Please share any approximations available in the literature for either the p-values or the critical values of non-parametric tests.
Estimators and Statistical Tests
TODO: give the structure of this chapter
motivation of statistical tests
notion of estimator, bias, MSE
...
MLE (should be in a section of its own)
Bayesian methods (in a section of their own)
(TODO)
TODO: list the more important tests (Student and Chi^2)
Student T test: compare a mean with a given number
compare the mean in two samples
There are generalizations for more than two samples
(analysis of variance) and for non-gaussian samples
(Wilcoxon).
One can devise similar tests to compare the variance of
a sample with a given number or to compare the variance
of two samples.
Chi^2 test: to compare the distribution of a qualitative
variable with predetermined values, to compare the
distribution of a qualitative variable in two
samples. One can also use it to check if two qualitative
variables are independent. However, it is only an
approximation, valid for large samples (more than 100
observations, more than 10 observations per class).
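As a rough illustration of the Chi^2 comparison of a qualitative variable with predetermined proportions, here is a minimal Python sketch using only the standard library. The counts, the proportions and the function names are made up for the example. The p-value uses a closed form of the Chi^2 upper tail that only holds for an even number of degrees of freedom (here 3 classes, hence 2 degrees of freedom); a general implementation would need the incomplete gamma function.

```python
import math

def chi2_gof_statistic(observed, expected_probs):
    """Pearson Chi^2 statistic: sum of (observed - expected)^2 / expected."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p)
               for o, p in zip(observed, expected_probs))

def chi2_sf_even_df(x, df):
    """Upper tail P(X > x) of a Chi^2 variable, closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

# Hypothetical data: 200 observations in 3 classes (more than 10 per class).
observed = [90, 60, 50]
expected_probs = [0.5, 0.3, 0.2]

stat = chi2_gof_statistic(observed, expected_probs)
df = len(observed) - 1
p_value = chi2_sf_even_df(stat, df)
print(f"Chi^2 = {stat:.3f}, df = {df}, p-value = {p_value:.4f}")
```

The sample sizes in the example were chosen to satisfy the large-sample conditions stated above; with small counts the approximation should not be trusted.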
TODO: check if we can do without the Chi^2 test:
- binary variable: binom.test
- multinomial test: ???
- Independence Chi^2: fisher.test
- two variables: fisher.test
* Introduction to statistical tests: TODO: REWRITE THIS SECTION
We want to answer a question of the kind "Does tobacco
increase the risk of cancer?", "Does the proximity of a
nuclear waste reprocessing plant increase the risk of
leukemia?", "Is the mean of the population from which this
sample was drawn zero, given that the sample mean is 0.02?"
Let us detail the problem "Do those two samples have the
same mean?" (it is a simplification of the problem "Do those
two samples come from the same population?").
Let us consider a first population, on which is defined a
statistical variable (with a gaussian distribution), from
which we get a sample. We do the same for a second
population, with the same population mean.
We can then consider the statistical variable
sample mean in the first sample - sample mean in the
second sample
and find its distribution.
If we measure a certain value of this difference, we can
compute the probability of obtaining a difference at least
as large.
If
P( difference > observed difference ) < alpha,
(for a given value of alpha, say 0.05), we reject the
hypothesis "the two means are equal", with a risk equal to
alpha.
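The procedure above can be sketched in Python as follows. This is only an illustration under a simplifying assumption that is not in the text: the two populations are gaussian with a common standard deviation sigma that is known, so the difference of sample means is gaussian and the standard library is enough. The function name and the data are hypothetical, and the p-value is taken two-sided (differences at least as large in absolute value). With an unknown variance one would use the Student T test instead.

```python
import math
from statistics import NormalDist, mean

def two_sample_z_test(x, y, sigma, alpha=0.05):
    """Compare two sample means, assuming a known common standard deviation.

    Returns (observed difference, two-sided p-value, reject H0?).
    """
    diff = mean(x) - mean(y)
    # Standard deviation of the difference of sample means under H0.
    se = sigma * math.sqrt(1.0 / len(x) + 1.0 / len(y))
    z = diff / se
    # P(|difference| >= |observed difference|) when the means are equal.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return diff, p_value, p_value < alpha

# Hypothetical samples and an assumed known sigma.
x = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
y = [4.8, 5.0, 4.7, 5.2, 4.9, 5.1]
diff, p_value, reject = two_sample_z_test(x, y, sigma=0.4)
print(f"difference = {diff:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```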
But beware, this result is not certain at all. There can be
two kinds of error: either wrongly claim that they are
different (this happens with a probability alpha) or wrongly
claim that the two means are equal.
Beware again, those tests are only valid under certain
conditions (gaussian variables, same variance, etc.).
If we really wish to be rigorous, we do not consider a
single hypothesis, but two: for instance "the means are
equal" and "the means are different"; or "the means are
equal" and "the first mean is larger than the second". We
would use the second formulation if we can a priori rule out
the possibility that the first mean is lower than the
second, but this has to come from information independent of
the samples at hand.
The statistical tests will never tell "the hypothesis is
true": they will merely reject or fail to reject the
hypothesis stating "there is nothing significant". (This is
very similar to the development of science as explained by
K. Popper: we never prove that something is true, we merely
continuously try to prove it wrong and fail to do so.)
+ H0 (null hypothesis) and H1 (alternative hypothesis)
Let us consider two hypotheses: the null hypothesis H0,
"there is no noticeable effect" (for instance, "tobacco does
not increase the risk of cancer", "the proximity of a waste
recycling plant does not increase the risk of leukemia")
and the alternative hypothesis H1, "there is a noticeable
effect" (e.g., "tobacco increases the risk of cancer"). The
alternative hypothesis can be symmetric ("tobacco increases
or decreases the risk of cancer") or not ("tobacco increases
the risk of cancer"). To choose an asymmetric hypothesis
means that we reject, a priori, half of the possible
alternatives: it can be a prejudice, so you should think
carefully before choosing an asymmetric alternative
hypothesis.
H0 is sometimes called the "conservative hypothesis",
because it is the hypothesis we keep if the results of the
test are not conclusive.
+ Type I error
To wrongly reject the null hypothesis (i.e., to wrongly
conclude "there is an effect" or "there is a noticeable
difference").
For instance, if the variable X follows a gaussian
distribution, we expect to get values "in the middle" of the
bell-shaped curve. If we get extreme values, we shall
reject, sometimes wrongly, the null hypothesis (that the
mean is actually zero). The type I error corresponds to the
red part in the following plot.
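[Plot: gaussian density curve, with the extreme values
(rejection region, i.e. the type I error) coloured in red]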
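To make the type I error concrete, one can simulate many samples for which the null hypothesis really holds and count how often it is nevertheless rejected. The Python sketch below does this under assumptions chosen for the example only: gaussian data with true mean zero and known standard deviation 1, a two-sided rejection region, and hypothetical values for the sample size, the number of trials and the seed. The observed rejection frequency should be close to alpha.

```python
import math
import random
from statistics import NormalDist

def type_one_error_rate(n=20, alpha=0.05, trials=20000, seed=0):
    """Fraction of samples drawn under H0 (mean 0, sd 1) that reject H0."""
    rng = random.Random(seed)
    # Reject when the sample mean falls in either tail beyond this threshold.
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0) / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        if abs(sample_mean) > critical:
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(f"estimated type I error rate: {rate:.3f}")
```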
Dear Hamidi, I just wanted to know about approximation functions for the critical values of non-parametric tests, but you have shared good reading material. Thanks.