That's why I ask this question. Suppose there is an adequate statistical model, but its results are acceptable only for selected time intervals and events. I understand that one then needs to go on and build another model. All of this is well explained for situations that have already occurred. But what about prediction?
Whatever we "know" depends on our experiences (observations), and it can only be related to the frame in which the observations were made. If this frame covers a short period of time, extrapolations in time may be grossly wrong, as may any other extrapolation that reaches (relatively far) beyond the frame or scope the observations are based on. Good models will account for the increasing uncertainty of such extrapolations. And yes, all knowledge based on observations is uncertain, and uncertainty is treated numerically as probability. Thus I'd say that any empirically based knowledge is "probabilistic". What caveats there might be, what things might have to be considered, and how helpful a model is, is not and cannot be answered by statistics. Statistics helps to find out how well data fit a given model, but statistics does not recommend a model, nor can it judge whether a model is good, helpful, or reasonable for its purpose.
@Vyacheslav, I do use statistical methods in the area of stochastic systems as well as optimal systems! There are no truly deterministic systems; we can speak only about stochastic ones!
It is a big discussion. We generally use statistical methods in the following two cases:
1) When our problem is of a deterministic nature, but we want to obtain estimates, predictions, confidence intervals, and other helpful quantities (a small sketch of this case follows below).
2) When our problem is purely stochastic, so we cannot explain it with traditional deterministic, causal methods. Then statistical methods are a must.
But, as with everything in life, our real problem at the beginning of the day belongs neither to case 1) nor to case 2) above, so the situation gets a little more complex...
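As a small illustration of case 1), here is a hedged sketch of a 95% confidence interval for a mean using SciPy; the measurement values are made up:

```python
import numpy as np
from scipy import stats

# Made-up repeated measurements of a (nominally deterministic) quantity with noise.
x = np.array([9.8, 10.1, 10.0, 9.9, 10.3, 10.2, 9.7, 10.0])

mean = x.mean()
sem = x.std(ddof=1) / np.sqrt(len(x))          # standard error of the mean
t_crit = stats.t.ppf(0.975, df=len(x) - 1)     # two-sided 95% critical value

print(f"estimate: {mean:.3f} +/- {t_crit * sem:.3f} (95% CI)")
```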
That's why statistics in general generates big debates among scientists, and I think this is an internal and very serious problem of the field.
Right now, I am working on a project where I use Markov matrices to simulate chemical kinetics. I also write blog posts for the Detroit ACS group on statistical applications in chemistry. If you take some classes in applied statistics, like Design of Experiments and Advanced Design of Experiments, you realize that most of the stats that appear in most (99%+) of journal articles, across all journals, are really poorly done. The topics in Design of Experiments and Advanced Design of Experiments were essentially MADE for scientists by chemists and biologists. Over the years they have been refined by statisticians.
When applied properly, Design of Experiments methods can help every scientist get better results faster, cheaper, and more robustly.
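Since the Markov-matrix approach to chemical kinetics mentioned above is quite compact, here is a minimal sketch of the idea in Python; the two-state reaction, rate values, and step count are hypothetical, not taken from the actual project:

```python
import numpy as np

# Hypothetical reversible reaction A <-> B treated as a Markov chain.
# Transition probabilities per time step (rows: from-state, columns: to-state).
k_ab, k_ba = 0.10, 0.03                 # assumed per-step transition probabilities
T = np.array([[1 - k_ab, k_ab],
              [k_ba,     1 - k_ba]])

p = np.array([1.0, 0.0])                # start with all molecules in state A
for step in range(200):                 # propagate the state distribution in time
    p = p @ T

print("fraction A, fraction B after 200 steps:", p)
# The fractions approach the stationary distribution k_ba/(k_ab+k_ba), k_ab/(k_ab+k_ba).
```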
Dear @Vyacheslav, statistics calculates numbers, and numbers cannot be wrong. However, in order to reach true conclusions, these calculations must be performed, applied, and interpreted properly.
Accepting that whole research careers might be devoted to statistics alone, and that journals devoted solely to statistics continuously produce new publications, how can a student, scientist, or teacher from any research domain become an expert in statistical analysis?
Example:
- If one asked 100 people with different scientific backgrounds to analyze the same data set to test hypothesis 'A' or describe pattern 'B', the details of the statistical procedures, and hence the results and conclusions, might differ among analysts.
- Starting from a single data set, how many statistical analyses will be tried out in the search for interesting patterns, and how many of these trials will end up in the methods sections of publications? Is there a significant mismatch between the unpublished and published statistical analyses of the same data set?
If every scientist took a class on regression analysis and a class on design of experiments, from an industrial engineering or statistics department, the entire world of scientific research would be far better off.
I have taken over a dozen stats classes. Between my first regression analysis class and my first DOE class, I have applied the ideas from those classes in all of my other stats classes.
Despite what many scientists were taught (myself included, in undergraduate physics and graduate environmental science and chemistry), one should change more than one thing at a time. Yes, my advisers were wrong.
Optimal Factorial Designs and Optimal Response Surfaces will speed up scientific research. With an Optimal Factorial Design, I can create a test for 6 factors with 2 levels each (just as a t-test compares two groups) in 12 runs. My 12-run design, a Plackett-Burman Design, has more power, accuracy, and information gain (up to 11 pieces of information) than the t-tests most scientists would use in their research. I need a total of 12 runs; that's it. The t-test approach with 3 replicates per group uses 36 samples and gives at most 6 pieces of information.
Under the same conditions, the t-tests have far less power than the Plackett-Burman Design. Power is a measure of the researcher's ability to reach the correct conclusion, that is, to see or find a true difference.
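For concreteness, here is a minimal sketch (NumPy assumed) of generating the classic 12-run Plackett-Burman design from its standard generating row; taking the first 6 columns for a 6-factor screen is an illustration, not a prescription:

```python
import numpy as np

# Standard 12-run Plackett-Burman generating row (Plackett & Burman, 1946):
# cyclic shifts of this row give rows 1-11; row 12 is all -1.
gen = np.array([+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1])

rows = [np.roll(gen, i) for i in range(11)]   # 11 cyclically shifted rows
rows.append(-np.ones(11, dtype=int))          # final row of all -1
design = np.vstack(rows)                      # 12 runs x 11 two-level columns

# For a 6-factor screening experiment, use the first 6 columns;
# the remaining 5 columns can serve as "dummy" columns for error estimation.
six_factor_design = design[:, :6]
print(six_factor_design)

# Sanity check: every column is balanced and the columns are mutually orthogonal.
assert np.all(design.sum(axis=0) == 0)
assert np.all(design.T @ design == 12 * np.eye(11))
```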
Data sets from well-planned (field) experiments on well-studied model species probably require simpler statistics, which also facilitates data interpretation.
Imagine two persons: person A is a top specialist in statistics but does not know model species X, whereas person B is a top specialist in statistics and also has a lot of background information about model species X. Will persons A and B use the same experimental design when they want to test hypothesis X in model species X? In how many research environments do scientists correspond to person A or to person B?
That is a major issue in statistics. I started a statistical consulting group at one of my universities to help my stats classmates get an idea of what you can do with stats and how to design good experiments. I focus much of my effort on designing better experiments rather than using t-test based methods.
I have a bachelor's degree in physics and minors in chemistry and biology. I started off in a graduate program in Environmental Science. At that time, I had taken 4 stat classes. With that minimal amount of knowledge, I was able to read through the experimental sections of the journal articles we read and find dozens of flaws. My profs got so fed up with me quoting stats textbooks when discussing how the analyses were improper, at best, and very wrong at worst, that they stopped discussing articles.
When it comes to designing an experiment from the perspective of someone like myself, who has a background outside of stats, I ask a lot more questions than my stats classmates. I know the level of statistical knowledge most scientists have. My classmates tend to overestimate scientists' statistical knowledge. As a result, my classmates will design an experiment to model the species you talk about under the assumption that the scientists have already looked at other possible models and already know what has an effect and what does not. I don't make those assumptions.
Something else to keep in mind: if you want to test effects beyond a control group and a single experimental group, i.e. beyond what a single t-test can compare, you need some type of factorially designed experiment. The analysis of such designs is not very simple; in general, it is not something you can do in Excel or on a calculator. You need statistical software.
If you use multiple t-tests, you run into issues with family-wise error rates and the possibility of bogus results. An article that got me in big trouble with my plant ecology prof dealt with the sex ratios of a certain type of plant in 8 different plots of land. The authors looked at N, P, K, light, precipitation, site placement, and year. They used a separate simple linear regression for each variable of interest and made claims based on these simple models. Many of those claims were later found to be false. When you apply multiple linear regression to the data, you find that the only significant variable is year; nothing else mattered. This was a widely held conclusion from other, larger, better-analysed studies, and it was also the only claim in the paper that was "true". Multiple linear regression is the statistical method used for analysing factorial designed experiments.
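As a hedged sketch of the multiple-regression approach described above, here is what the analysis might look like with statsmodels; the data are simulated stand-ins, not the data from the paper in question, and only 'year' is constructed to matter:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 80

# Simulated stand-ins for the predictors named above.
df = pd.DataFrame({
    "N": rng.normal(size=n),
    "P": rng.normal(size=n),
    "K": rng.normal(size=n),
    "light": rng.normal(size=n),
    "precip": rng.normal(size=n),
    "year": rng.integers(0, 5, size=n),
})
df["sex_ratio"] = 0.5 + 0.05 * df["year"] + rng.normal(scale=0.05, size=n)

# One multiple linear regression instead of seven separate simple regressions:
model = smf.ols("sex_ratio ~ N + P + K + light + precip + year", data=df).fit()
print(model.summary())   # only 'year' should come out clearly significant
```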
See, for example, Section 3.1, on exceptional cases.
A great, detailed overview of statistical methods in atmospheric sciences is given in
D. Wilks, Statistical Methods in the Atmospheric Sciences, 2nd Ed., 2006:
http://web.unbc.ca/~ytang/text_book.pdf
In this very helpful book, see, for example, Section 3.3 on Graphical Summary Techniques starting on page 28. You may find the stem-and-leaf display section particularly useful.
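Python has no built-in stem-and-leaf display, so here is a minimal sketch of one on made-up data, just to show the kind of summary the book describes:

```python
from collections import defaultdict

data = [28, 31, 33, 34, 41, 42, 42, 45, 47, 51, 56, 63]  # made-up sample

# Group values by their "stem" (tens digit); the "leaves" are the unit digits.
stems = defaultdict(list)
for x in sorted(data):
    stems[x // 10].append(x % 10)

for stem in sorted(stems):
    leaves = "".join(str(d) for d in stems[stem])
    print(f"{stem} | {leaves}")
# 2 | 8
# 3 | 134
# 4 | 12257
# 5 | 16
# 6 | 3
```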
If the experimental data are collected from well-designed and well-controlled experiments with larger n values, that usually gives us confidence in the analyzed data. Most often, statistical analysis tells us the truth, either supporting or contradicting our hypothesis; a failure to apply this knowledge successfully reflects either hidden problems in the experimental design or the possibility that other interacting parameters were underestimated or excluded from the experiments concerned.
Knowledge derived from statistically analyzed data obtained from animal-model experiments is always better, or closer to being replicable in human subjects, than knowledge obtained from cell or organ culture models.
Statistical methods are used to analyze the quantitative results of scientific research. Correctly applied statistical methods allow large sets of quantitative data to be processed objectively, and thereby research theses to be verified. The use of ICT for statistical analytics allows the automation, standardization, and objectification of the verification of quantitative data describing complex processes, and reduces the cost of the analyses. Computerized statistical processing of large quantitative data sets has enabled the development of analytics based on advanced Industry 4.0 data-processing technologies, including cloud computing, Big Data database systems, Business Intelligence analytical platforms, Internet of Things technology, machine learning, artificial intelligence, etc. Without statistical methods, the effective development of advanced analytics, Data Analytics, Data Science, etc., would not have been possible. Therefore, without the use of statistical methods in the verification and analysis of large quantitative data sets, the development of scientific research would proceed much more slowly than it does.
Statistical analysis experts help collect, study, and extract relevant information from vast and complex data. This information is then applied to validate and advance research, make sound business decisions, and drive public initiatives.
Here are the top 6 applications of statistical analysis: