Analysis of variance (ANOVA) tests whether the means of independent (unrelated) groups or treatments differ significantly. An experiment is described by its factors and their levels: for instance, an experiment with one treatment and a control has one factor (the treatment) and two levels (treatment and control). In this case, only one independent variable is manipulated.
ANOVA divides the observed variance into systematic and random variances (Zar, 2010):
Systematic variance is attributed to the experimental manipulation and is explained by the sum of squares between the groups, SS (between) or SSX.
Random variance is the variance unrelated to X, hence unaccounted for by the sum of squares related to X. It is attributed to the sum of squares within the groups, SS (within) or SSE.
Both systematic and random variances contribute to the total variation, SST = SSX + SSE. Random error reflects chance variation among observations and does not shift the group means in any consistent direction, whereas systematic variation reflects a consistent effect of the treatment on the group means.
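For reference, the partition of the total sum of squares can be written out as follows. The notation is an assumption introduced here, not taken from the post: k groups, n_j observations in group j, x_{ij} the i-th observation in group j, \bar{x}_j the mean of group j, and \bar{x} the grand mean. SSX is the between-group term and SSE the within-group term.

```latex
\begin{aligned}
SST &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left( x_{ij} - \bar{x} \right)^2 \\
SSX &= \sum_{j=1}^{k} n_j \left( \bar{x}_j - \bar{x} \right)^2 \\
SSE &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left( x_{ij} - \bar{x}_j \right)^2 \\
SST &= SSX + SSE
\end{aligned}
```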
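The F statistic is the ratio of the mean square between groups to the mean square within groups, not the ratio of the raw sums of squares. Each mean square is a sum of squares divided by its degrees of freedom: MSX = SSX / (k - 1) and MSE = SSE / (N - k), where k is the number of groups and N is the total number of observations, so F = MSX / MSE. A large F ratio yields a small p-value; when p < 0.05 (the conventional threshold), mean differences as large as those observed would be unlikely to arise by chance alone if the population means were equal. We then conclude that the samples come from populations with different means, that is, the treatment had a significant effect on the response variable.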
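As a rough illustration of the calculation described above, here is a minimal sketch of a one-factor ANOVA using only the Python standard library. The function name one_way_anova and the control and treatment values are made up for this example; they do not come from Zar or from any real experiment.

```python
from statistics import fmean


def one_way_anova(groups):
    """Partition the variation of a one-factor design and return the F ratio.

    `groups` is a list of lists, one list of observations per level.
    """
    all_obs = [x for g in groups for x in g]
    grand_mean = fmean(all_obs)
    k = len(groups)   # number of levels (groups)
    n = len(all_obs)  # total number of observations

    # Between-group sum of squares (SSX): group means vs. the grand mean
    ssx = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares (SSE): observations vs. their own group mean
    sse = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    # Total sum of squares (SST): observations vs. the grand mean
    sst = sum((x - grand_mean) ** 2 for x in all_obs)

    df_between = k - 1
    df_within = n - k
    msx = ssx / df_between  # mean square between groups
    mse = sse / df_within   # mean square within groups (error)

    return {
        "ssx": ssx,
        "sse": sse,
        "sst": sst,
        "df_between": df_between,
        "df_within": df_within,
        "f": msx / mse,
    }


# Hypothetical data: one factor with two levels (control and treatment)
control = [4, 5, 6]
treatment = [7, 8, 9]

result = one_way_anova([control, treatment])
print(result)
```

For these made-up numbers, SSX = 13.5, SSE = 4, and SST = 17.5, so F = 13.5 on (1, 4) degrees of freedom. The p-value is the upper-tail area of the F distribution at that value; if SciPy is available, scipy.stats.f.sf(f, df_between, df_within) is one way to obtain it.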
Nevertheless, ANOVA is also described as a method of statistical modeling, a use that researchers rarely exploit. Is there a creative and dynamic way to use this method to generate new data and make predictions for observations outside the dataset?
Reference
Zar, J. H. (2010). Biostatistical Analysis (5th ed.). ISBN: 0131008463