Please share any approximations available in the literature for either the p-values or the critical values of non-parametric tests.
Estimators and Statistical Tests
TODO: give the structure of this chapter
motivation of statistical tests
notion of estimator, bias, MSE
...
MLE (should be in a section of its own)
Bayesian methods (in a section of their own)
(TODO)
TODO: list the more important tests (Student and Chi^2)
Student T test: compare a mean with a given number
compare the mean in two samples
There are generalizations for more than two samples
(analysis of variance) and for non-gaussian samples
(Wilcoxon).
One can devise similar tests to compare the variance of
a sample with a given number or to compare the variance
of two samples.
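As a rough illustration of the two Student tests listed above, here is a minimal Python sketch of the test statistics only. The function names are made up for this example, and the two-sample version assumes equal variances (pooled estimate). Turning the statistic into a p-value requires the Student t distribution, which the Python standard library does not provide, so the sketch stops at the statistic and its degrees of freedom.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (mean - mu0) / (sd / math.sqrt(n)), n - 1


def two_sample_t(x, y):
    """Pooled-variance t statistic for H0: both populations have the same mean.

    Assumes gaussian samples with a common variance.
    """
    nx, ny = len(x), len(y)
    pooled_var = (
        (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    ) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.fmean(x) - statistics.fmean(y)) / se, nx + ny - 2


if __name__ == "__main__":
    print(one_sample_t([1, 2, 3, 4, 5], 3))
    print(two_sample_t([1, 2, 3], [2, 3, 4]))
```

The statistic is then compared with a critical value of the t distribution with the returned number of degrees of freedom.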
Chi^2 test: to compare the distribution of a qualitative
variable with predetermined values, to compare the
distribution of a qualitative variable in two
samples. One can also use it to check if two qualitative
variables are independent. However, it is only an
approximation, valid for large samples (more than 100
observations, more than 10 observations per class).
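To make the independence use of the Chi^2 test concrete, here is a small Python sketch. The function names and the example table are invented for illustration. The p-value helper only covers one degree of freedom (a 2x2 table), where the Chi^2 tail probability reduces to the gaussian tail, P(Z^2 > x) = erfc(sqrt(x/2)); larger tables would need a general Chi^2 distribution function.

```python
import math


def chi2_independence(table):
    """Pearson Chi^2 statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def chi2_sf_df1(x):
    """Upper tail probability of a Chi^2 variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


if __name__ == "__main__":
    stat, df = chi2_independence([[10, 20], [20, 10]])  # made-up counts
    print(stat, df, chi2_sf_df1(stat))
```

Because the Chi^2 distribution is only a large-sample approximation here, the result should be read with the sample-size caveat above in mind.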
TODO: check if we can do without the Chi^2 test:
- binary variable: binom.test
- multinomial test: ???
- Independence Chi^2: fisher.test
- two variables: fisher.test
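The exact tests named in this list (binom.test and fisher.test are R functions) avoid the large-sample approximation by summing exact probabilities. The following Python sketch is a hand-written illustration of that idea, not a port of the R code. The function names are hypothetical, and the two-sided rule used here (sum the probabilities of all outcomes no more likely than the observed one) is one common convention among several.

```python
from math import comb

# Small relative tolerance so that outcomes tied with the observed one
# are not lost to floating-point rounding.
REL_TOL = 1 + 1e-7


def binom_test_two_sided(k, n, p=0.5):
    """Exact two-sided p-value for k successes out of n under H0: P(success) = p."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * REL_TOL
    return min(1.0, sum(q for q in pmf if q <= threshold))


def fisher_exact_two_sided(a, b, c, d):
    """Exact two-sided p-value for the 2x2 table [[a, b], [c, d]].

    Conditional on the margins, the top-left cell is hypergeometric.
    """
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    threshold = prob(a) * REL_TOL
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= threshold))


if __name__ == "__main__":
    print(binom_test_two_sided(0, 10))
    print(fisher_exact_two_sided(3, 1, 1, 3))
```

Both functions enumerate every possible outcome, so they are only practical for small to moderate counts, which is exactly where the Chi^2 approximation is weakest.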
* Introduction to statistical tests: TODO: REWRITE THIS SECTION
We want to answer a question of the kind "Does tobacco
increase the risk of cancer?", "Does the proximity of a
nuclear waste reprocessing plant increase the risk of
leukemia?", "Is the mean of the population from which this
sample was drawn zero, given that the sample mean is 0.02?"
Let us detail the problem "Do those two samples have the same
mean?" (it is a simplification of the problem "Do those two
samples come from the same population?").
Let us consider a first population, on which is defined a
statistical variable (with a gaussian distribution), from
which we get a sample. We do the same for a second
population, with the same population mean.
We can then consider the statistical variable
sample mean in the first sample - sample mean in the
second sample
and find its distribution.
If we measure a certain value of this difference, we can
compute the probability of obtaining a difference at least
as large.
If
P( difference > observed difference ) < alpha,
(for a given value of alpha, say 0.05), we reject the
hypothesis "the two means are equal", with a risk equal to
alpha.
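The decision rule above can be sketched in Python under one simplifying assumption that is not in the text: the common standard deviation sigma of the two gaussian populations is known. In that case the difference of the sample means is itself gaussian under the null hypothesis, with mean 0 and standard deviation sigma * sqrt(1/n1 + 1/n2). When sigma has to be estimated from the samples, the Student t distribution replaces the gaussian. The function name and the data are made up.

```python
import math
from statistics import NormalDist, fmean


def one_sided_p_value(x, y, sigma):
    """P(difference > observed difference) under H0: equal population means.

    Assumes gaussian samples with a known common standard deviation `sigma`.
    """
    observed = fmean(x) - fmean(y)
    se = sigma * math.sqrt(1 / len(x) + 1 / len(y))
    return 1 - NormalDist(0, se).cdf(observed)


if __name__ == "__main__":
    alpha = 0.05
    x = [0.5] * 8  # made-up sample with mean 0.5
    y = [0.0] * 8  # made-up sample with mean 0.0
    p = one_sided_p_value(x, y, sigma=1.0)
    print(p, "reject H0" if p < alpha else "fail to reject H0")
```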
But beware, this result is not certain at all. There can be
two kinds of error: either wrongly claim that they are
different (this happens with a probability alpha) or wrongly
claim that the two means are equal.
Beware again, those tests are only valid under certain
conditions (gaussian variables, same variance, etc.).
If we really wish to be rigorous, we do not consider a
single hypothesis, but two: for instance "the means are
equal" and "the means are different"; or "the means are
equal" and "the first mean is larger than the second". We
would use the second formulation if we can a priori rule out
the possibility that the first mean is lower than the second,
but this has to come from information independent of the
samples at hand.
The statistical tests will never tell "the hypothesis is
true": they will merely reject or fail to reject the
hypothesis stating "there is nothing significant". (This is
very similar to the development of science as explained by
K. Popper: we never prove that something is true, we merely
continuously try to prove it wrong and fail to do so.)
+ H0 (null hypothesis) and H1 (alternative hypothesis)
Let us consider two hypotheses: the null hypothesis H0,
"there is no noticeable effect" (for instance, "tobacco does
not increase the risk of cancer", "the proximity of a waste
recycling plant does not increase the risk of leukemia")
and the alternative hypothesis H1, "there is a noticeable
effect" (e.g., "tobacco increases the risk of cancer"). The
alternative hypothesis can be symmetric ("tobacco increases
or decreases the risk of cancer") or not ("tobacco increases
the risk of cancer"). To choose an asymmetric hypothesis
means that we reject, a priori, half of the possible
alternatives: it can be a prejudice, so you should think
carefully before choosing an asymmetric alternative
hypothesis.
H0 is sometimes called the "conservative hypothesis",
because it is the hypothesis we keep if the results of the
test are not conclusive.
+ Type I error
To wrongly reject the null hypothesis (i.e., to wrongly
conclude "there is an effect" or "there is a noticeable
difference").
For instance, if the variable X follows a gaussian
distribution, we expect to get values "in the middle" of the
bell-shaped curve. If we get extreme values, we shall
reject, sometimes wrongly, the null hypothesis (that the
mean is actually zero). The type I error corresponds to the
red part in the following plot.
(Plot not reproduced: bell-shaped gaussian curve with the
type I error region in red.)
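The type I error rate can also be checked by simulation. The Python sketch below draws many samples for which the null hypothesis (mean zero) really is true, and counts how often a two-sided test at the 5% level rejects it anyway. It assumes a gaussian variable with known standard deviation 1, and the sample size, number of trials, and seed are arbitrary choices for this example.

```python
import math
import random


def type_one_error_rate(n=20, trials=20000, seed=0):
    """Fraction of samples drawn under H0 for which H0 is (wrongly) rejected."""
    rng = random.Random(seed)
    # Two-sided 5% cutoff for the mean of n standard gaussian observations.
    cutoff = 1.959964 / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        mean = sum(rng.gauss(0, 1) for _ in range(n)) / n
        if abs(mean) > cutoff:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print(type_one_error_rate())
```

The observed fraction should be close to alpha = 0.05, which is what "a risk equal to alpha" means in practice.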
Dear Hamidi, I only wanted to know about approximation functions for the critical values of non-parametric tests, but you have shared good reading material. Thanks.