Are there any approximations of critical values for non-parametric tests (like sign test, K-S test, Friedman test, etc.)?

Fariborz Hamidi

Estimators and Statistical Tests

TODO: give the structure of this chapter

motivation of statistical tests

notion of estimator, bias, MSE

...

MLE (should be in a section of its own)

Bayesian methods (in a section of their own)

(TODO)

TODO: list the more important tests (Student and Chi^2)

Student T test: compare a mean with a given number, or
compare the means of two samples. There are generalizations
for more than two samples (analysis of variance) and for
non-gaussian samples (Wilcoxon).

One can devise similar tests to compare the variance of a
sample with a given number or to compare the variances of
two samples.

Chi^2 test: to compare the distribution of a qualitative
variable with predetermined values, or to compare the
distribution of a qualitative variable in two samples. One
can also use it to check if two qualitative variables are
independent. However, it is only an approximation, valid for
large samples (more than 100 observations, more than 10
observations per class).

TODO: check if we can do without the Chi^2 test:

- binary variable: binom.test

- multinomial test: ???

- Independence Chi^2: fisher.test

- two variables: fisher.test
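
To make the "only an approximation" remark concrete, here is a minimal sketch for the binary-variable case. It compares an exact two-sided binomial p-value with the Chi^2 approximation (1 degree of freedom). It is written in Python with the standard library only; the function names are hypothetical, and the two-sided rule used (sum the probabilities of all outcomes no more likely than the observed one) is one common convention, assumed here rather than taken from the notes.

```python
import math
from statistics import NormalDist


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_binomial_pvalue(k, n, p=0.5):
    """Exact two-sided p-value: total probability of the outcomes
    that are no more likely than the observed count k."""
    observed = binom_pmf(k, n, p)
    tol = 1e-7  # relative tolerance, guards against rounding ties
    total = sum(
        binom_pmf(i, n, p)
        for i in range(n + 1)
        if binom_pmf(i, n, p) <= observed * (1 + tol)
    )
    return min(1.0, total)


def chi2_approx_pvalue(k, n, p=0.5):
    """Chi^2 goodness-of-fit p-value for two classes (1 degree of
    freedom). A Chi^2 variable with 1 df is the square of a standard
    gaussian, so the tail can be read off the gaussian cdf."""
    expected = [n * p, n * (1 - p)]
    observed = [k, n - k]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return 2 * (1 - NormalDist().cdf(math.sqrt(stat)))


# Small sample: the two p-values differ noticeably.
print(exact_binomial_pvalue(2, 10), chi2_approx_pvalue(2, 10))
# Larger sample, same proportion: they are much closer.
print(exact_binomial_pvalue(40, 200), chi2_approx_pvalue(40, 200))
```

With only 10 observations the approximation is visibly off, which is consistent with the sample-size caveat above.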

* Introduction to statistical tests: TODO: REWRITE THIS SECTION

We want to answer a question of the kind "Does tobacco

increase the risk of cancer?", "Does the proximity of a

nuclear waste reprocessing plant increase the risk of

leukemia?", "Is the mean of the population from which this

sample was drawn zero, given that the sample mean is 0.02?"

Let us detail the problem "Do those two samples have the
same mean?" (it is a simplification of the problem "Do those
two samples come from the same population?").

Let us consider a first population, on which is defined a

statistical variable (with a gaussian distribution), from

which we get a sample. We do the same for a second

population, with the same population mean.

We can then consider the statistical variable

sample mean in the first sample - sample mean in the

second sample

and find its distribution.

If we measure a certain value of this difference, we can
compute the probability of obtaining a difference at least
as large. If

  P( difference > observed difference ) < alpha

(for a given value of alpha, say 0.05), we reject the
hypothesis "the two means are equal", with a risk equal to
alpha.
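
The procedure above can be sketched as follows. This is a minimal illustration, not the Student test itself: it assumes gaussian samples with a known, common standard deviation sigma (an assumption added here for simplicity; the Student T test estimates it from the data), and it uses the absolute value of the difference, i.e. a two-sided test. The function name and the simulated data are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean


def two_sample_pvalue(x, y, sigma):
    """Two-sided p-value for H0 "the two means are equal", assuming
    gaussian samples with known common standard deviation sigma."""
    diff = mean(x) - mean(y)
    # Under H0 the difference of sample means is gaussian with mean 0
    # and this standard deviation.
    se = sigma * math.sqrt(1 / len(x) + 1 / len(y))
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Simulated example: both samples come from the same population.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(50)]
y = [rng.gauss(0.0, 1.0) for _ in range(50)]

alpha = 0.05
p = two_sample_pvalue(x, y, sigma=1.0)
print("reject H0" if p < alpha else "fail to reject H0", round(p, 3))
```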

But beware, this result is not certain at all. There can be
two kinds of error: either wrongly claim that the means are
different (this happens with a probability alpha) or wrongly
claim that the two means are equal.

Beware again, those tests are only valid under certain

conditions (gaussian variables, same variance, etc.).

If we really wish to be rigorous, we do not consider a
single hypothesis, but two: for instance "the means are
equal" and "the means are different"; or "the means are
equal" and "the first mean is larger than the second". We
would use the second formulation if we can a priori reject
the possibility that the first mean is lower than the
second -- but this has to come from information independent
of the samples at hand.

Statistical tests will never say "the hypothesis is true":
they will merely reject or fail to reject the hypothesis
stating "there is nothing significant". (This is very
similar to the development of science as explained by
K. Popper: we never prove that something is true, we merely
continuously try to prove it wrong and fail to do so.)

+ H0 (null hypothesis) and H1 (alternative hypothesis)

Let us consider two hypotheses: the null hypothesis H0,
"there is no noticeable effect" (for instance, "tobacco does
not increase the risk of cancer", "the proximity of a waste
recycling plant does not increase the risk of leukemia")
and the alternative hypothesis H1, "there is a noticeable
effect" (e.g., "tobacco increases the risk of cancer"). The
alternative hypothesis can be symmetric ("tobacco increases
or decreases the risk of cancer") or not ("tobacco increases
the risk of cancer"). To choose an asymmetric hypothesis
means that we reject, a priori, half of the possible
alternatives: it can be a prejudice, so you should think
carefully before choosing an asymmetric alternative
hypothesis.
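
A small sketch of why this choice matters, assuming the test statistic z follows a standard gaussian distribution under H0 (the value z = 1.8 and the function name are hypothetical, chosen only for illustration). For such a statistic, the one-sided p-value of a positive z is half the two-sided one, so the same data can be "significant" at alpha = 0.05 under the asymmetric alternative and not under the symmetric one.

```python
from statistics import NormalDist


def p_values(z):
    """Return (two_sided, one_sided) p-values for a statistic z that
    is standard gaussian under H0. The one-sided alternative is
    "the effect is positive"."""
    std = NormalDist()
    one_sided = 1 - std.cdf(z)             # upper tail only
    two_sided = 2 * (1 - std.cdf(abs(z)))  # both tails
    return two_sided, one_sided


alpha = 0.05
two_sided, one_sided = p_values(1.8)
print("symmetric H1: ", "reject H0" if two_sided < alpha else "fail to reject H0")
print("asymmetric H1:", "reject H0" if one_sided < alpha else "fail to reject H0")
```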

H0 is sometimes called the "conservative hypothesis",

because it is the hypothesis we keep if the results of the

test are not conclusive.

+ Type I error

To wrongly reject the null hypothesis (i.e., to wrongly
conclude "there is an effect" or "there is a noticeable
difference").

For instance, if the variable X follows a gaussian

distribution, we expect to get values "in the middle" of the

bell-shaped curve. If we get extreme values, we shall

reject, sometimes wrongly, the null hypothesis (that the

mean is actually zero). The type I error corresponds to the

red part in the following plot.

[Plot: gaussian density curve, with the extreme values (type I error region) in red]
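
The type I error rate can also be checked by simulation. The sketch below draws many samples for which H0 is true (gaussian, mean zero), tests each one, and counts how often H0 is wrongly rejected; the proportion should be close to alpha. It assumes a known standard deviation of 1 and a two-sided test, and the function name, sample size, number of trials and seed are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist


def type_one_error_rate(n_trials=2000, n=30, alpha=0.05, seed=1):
    """Fraction of samples drawn under H0 (mean 0, sd 1) for which
    a two-sided gaussian test of "the mean is zero" rejects H0."""
    rng = random.Random(seed)
    # Values of |z| beyond this threshold fall in the rejection region.
    threshold = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Standardized sample mean: standard gaussian under H0.
        z = (sum(sample) / n) * math.sqrt(n)
        if abs(z) > threshold:
            rejections += 1
    return rejections / n_trials


rate = type_one_error_rate()
print("observed type I error rate:", rate)
```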
