As we know, in common statistical practice and education, tests for 2 groups are taught separately from tests for 3 or more groups. For example, a t-test is used (and is taught to be used) when comparing two groups on a continuous outcome, while ANOVA is suggested when one needs to compare a similar outcome across more than 2 groups. (I know they are indeed different.) You can enumerate other examples in your mind.
This distinction seems unnecessary and frustrating to me (especially when teaching). Most (or all) of those standard tests which are taught to be used for 2+ groups are also validly applicable to 2 groups. So, why not just forget about all those 2 group tests and use those for 2+ groups, even when comparing 2 groups?
I think the problem starts where you talk about "(two or more) group tests". Tests assess how different (usually nested) models perform in explaining the data. These models can differ with respect to one or more coefficients. Testing a single coefficient allows us to see whether the data allow us to interpret the sign of that coefficient. This is not possible if the models differ in more than one coefficient. A test on a group of more than one coefficient is rarely meaningful or sensible scientifically anyway (I am still waiting to find a counter-example!).
The F-test in ANOVA is used for multiple mean comparison, to check whether at least one group mean differs significantly from the others. If the result is significant, t-tests are then performed to find which group means differ significantly from the others. Without the omnibus test, one would have to conduct all nC2 pairwise t-tests to check the significance of the differences. In the case of two groups, the significance of the difference between the two group means can be checked directly with a t-test. In that sense, two-group tests can be seen as a particular case of tests for more than two groups.
Jochen, I didn't get what you mean.
I simply mean that a test for 2+ groups can easily be applied to comparing 2 groups. So tests taught to compare 2 groups seem redundant, or in other words, we can live without them and our lives will be easier.
I tried to say that we shouldn't teach "group tests" in the first place. We should emphasize that we set up models that are able (to some degree) to explain some properties of the data, and that we may use tests to check the "statistical signal-to-noise ratio" in the observed data w.r.t. the restriction of one or more coefficients in that model.
I wonder what would be the response from reviewers, if instead of a t-test, I do an ANOVA in an article.
But a t-test is an ANOVA. The likely answer will be: "you cannot use ANOVA here because you have only two groups, so you should use a t-test". What should I say... one could ask the editor to involve a statistical reviewer, and request that reviewers in general not comment on things that are obviously outside their expertise... but most people would make their lives easy and simply substitute "ANOVA" with "t-test" in the manuscript.
I absolutely agree with using more-than-2-group tests in the case where we have 2 groups, especially if we have more than one variable, some of which have 2 levels while others have 3 or more levels, and we want to put these variables in one table. It's much easier in presentation and reporting, and gives exactly the same results.
The t-test, and ANOVA, are OLS regressions. The t-test is simply an OLS regression in which there is a single binary predictor. As such, the t-test is a dead end if done in the classical way. If you do it as a regression, then you can add covariates.
I actually detest ANOVA models. Who wants to test a hypothesis that there is "some kinda difference between these groups" where the number of groups is greater than two? The classic case is using ANOVA to test the effect of smoking, coded as never/ex/current. You are better off thinking of this as two binary regressors:
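A minimal sketch of that recoding, with entirely made-up outcome values (the variable names and numbers are illustrative assumptions, not from the original post):

```python
import numpy as np

# Hypothetical smoking data: 0 = never, 1 = ex, 2 = current
group = np.array([0]*5 + [1]*5 + [2]*5)
y = np.array([3.1, 2.9, 3.0, 3.2, 2.8,      # never
              3.6, 3.4, 3.7, 3.5, 3.3,      # ex
              4.1, 4.3, 4.0, 4.2, 4.4])     # current

# Two binary regressors instead of one 3-level "factor"
ex      = (group == 1).astype(float)
current = (group == 2).astype(float)
X = np.column_stack([np.ones_like(y), ex, current])

# OLS fit: coefficients are (mean of never, ex minus never, current minus never)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta ≈ [3.0, 0.5, 1.2]
```

Each coefficient now answers a direct question: what is the baseline mean, and how much do ex- and current smokers differ from it?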
Now we have a model we can interpret!
And, of course, this smoothly integrates with a logical teaching sequence. Logistic regression can be introduced as a clever response to the problem of a predicted variable that is bounded by zero and one. Poisson regression comes next in my course. But after that, you can make the ideas of link functions explicit, and there you are – GLM has arrived.
To my mind, the archaic terminology is a bewildering legacy from the early days. The concept of regression underlies everything, and is the logical one on which to base teaching.
Ronan, thanks for your excellent views. I actually follow and fully agree.
I need to clarify some points.
The reason I said in my question that "t-test and ANOVA are indeed different" was to forestall possible reactions. I meant they are different in mechanics, although they are actually parts of the same model, with ANOVA being a superset of the t-test.
Also, my question is not limited to the case of the t-test and ANOVA. It also includes, e.g., Mann-Whitney U vs. Kruskal-Wallis, or Wilcoxon signed-rank vs. Friedman tests.
I was also talking about teaching to general audiences, not to specialized ones. For general audiences, it is very hard to teach the internals of regression (or anything else), nor is it necessary in my opinion. Therefore I prefer not to teach it in detail unless I have to. I want to keep it simple.
Mehmet Sinan Iyisoy , great topic.
* * *
A question for you: Have you ever taught anova and t-test this way? That is, start off with the anova test, and then introduce the t-test as a special case?
* * *
One point in favor of two-sample tests: understanding effect sizes is easier in the two-sample case.
Cohen's d makes sense: It's just the difference in means divided by the pooled standard deviation. If the means differ by two standard deviations, Cohen's d is 2. For anova, understanding eta squared isn't too bad, though maybe not as intuitive as Cohen's d.
When it comes to the stochastic tests (Mann-Whitney and Kruskal-Wallis), the effect sizes for Mann-Whitney are understandable: usually either Cliff's delta or Vargha and Delaney's A. Either one is related to the probability of an observation in one group being greater than an observation in another group. For VDA, that's exactly the interpretation. For Cliff's delta, it's linearly related to this.
Effect sizes for Kruskal-Wallis include Freeman's theta and something often called epsilon-squared. There may be some easily understandable interpretations for these, but I don't know that I could come up with one.
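The two-sample effect sizes mentioned above are simple enough to compute from first principles. A minimal sketch with made-up samples (the numbers are purely illustrative):

```python
import numpy as np

# Hypothetical samples from two groups
a = np.array([4.2, 5.1, 3.8, 4.9, 5.3, 4.4])
b = np.array([3.1, 3.9, 2.8, 3.5, 3.3, 3.7])
na, nb = len(a), len(b)

# Cohen's d: difference in means divided by the pooled standard deviation
sp = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2))
d = (a.mean() - b.mean()) / sp

# Vargha and Delaney's A: P(X > Y) + 0.5 * P(X == Y) over all cross-group pairs
gt = sum(x > y for x in a for y in b)
eq = sum(x == y for x in a for y in b)
A = (gt + 0.5 * eq) / (na * nb)

# Cliff's delta is a linear function of A
delta = 2 * A - 1
```

Here A reads directly as "the probability that a random observation from the first group exceeds one from the second", which is the intuitive interpretation described above.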
Salvatore S. Mangiafico no, I didn't try it that way. I can't see what I would gain by doing so.
I am not sure if Cohen's d is conceptually more understandable to general audiences than eta squared or similar measures. The effect size may be quantified better if one adheres to Cohen's guidelines.
I told myself I wouldn't go down this rabbit hole...but here I am! (Thanks, Mehmet!)
My current view is that there (probably) are some benefits to starting with the two-group tests and then moving on to the more general procedures for 3+ groups, being sure to demonstrate that the former are special cases of the latter. For one thing, the two-group tests are often conceptually simpler, so easier to explain to students. For another, any of our students who read the literature are going to find numerous examples of those two-group tests (and other analyses that we might now think of as special cases of the GLM or GLzM). Therefore, I think our students need to know about those tests.
Don't get me wrong. I am very much in favour of showing students that test A is really a special case of a more general procedure B. E.g., I ask students to perform an unpaired t-test, and then to estimate an OLS linear regression model with the same data (i.e., with one dichotomous explanatory variable). I then ask them to look at the constant and slope from the regression model and tell me where they see those same values in the t-test output. I ask them to examine the t-test on the slope, and tell me where they see that same t-test in the t-test output. Etc.
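That classroom exercise can be sketched in a few lines (the data are made up; `scipy.stats.ttest_ind` with its default pooled-variance setting stands in for the classical unpaired t-test):

```python
import numpy as np
from scipy import stats

# Hypothetical outcome for two groups coded 0/1
y0 = np.array([10.2, 9.8, 11.1, 10.5, 9.9])   # group 0
y1 = np.array([12.0, 11.4, 12.6, 11.9, 12.1])  # group 1
y = np.concatenate([y0, y1])
x = np.array([0]*5 + [1]*5, dtype=float)
n = len(y)

# Classical unpaired t-test (equal variances assumed)
t_classic, p_classic = stats.ttest_ind(y1, y0)

# OLS by hand: slope = difference in group means, intercept = mean of group 0
b1 = np.cov(x, y, ddof=1)[0, 1] / x.var(ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
s2 = (resid @ resid) / (n - 2)                  # residual variance
se_b1 = np.sqrt(s2 / (x.var(ddof=1) * (n - 1)))  # standard error of slope
t_slope = b1 / se_b1

# The t-test on the regression slope IS the classical t statistic
```

Students can then check for themselves that `b0` and `b1` are the group-0 mean and the mean difference, and that `t_slope` matches the t-test output exactly.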
I suspect that most of us want our students to end up at the same place in terms of the knowledge they acquire. So I think that this discussion is more about the best route for getting there.
Finally, for those who do prefer to start with regression, the GLM, etc. here's a book you may find useful (depending on your discipline and the level of the course):
Cheers,
Bruce
I see two reasons:
1) Pedagogical. Tests on coefficients of regression models are t-tests, and it's always better to start with simple cases. The simplest regression model has a single binary predictor, and testing its coefficient is just what the t-test taught in first courses does (well, neglecting the comparison to a theoretical mean, but that's quite similar). Remember your first statistics course. Was it easy to understand the concepts of a test, rejection region, test criterion, hypotheses and so on? Now imagine if it had started with general linear model regression, because after all almost everything taught in introductory classes is a special case of general linear model regression... That's quite a huge step.
2) Unidirectional tests. ANOVA is technically a ratio of variances, so you cannot test whether a mean *increases*; you can only test whether a mean *changes*. ANOVA and the *two-sided* t-test are equivalent for two groups, but ANOVA and the *one-sided* t-test are not. The same goes for the test comparing two proportions and the chi-square test of independence on a 2×2 contingency table.
Obviously, such directional tests can be done on coefficients of the underlying models, but in that case we're back to point 1.
And for Ronán: there are at least two situations I can think of in which one is interested in a global change without investigating the details:
- testing for batch effects (for a one-way ANOVA, the additional choice between a fixed-effects or random-effects factor is not really relevant)
- testing for a genotype effect on a phenotype when there is no reason to expect a special model like dominant/recessive or a linear link between the number of mutations and the phenotype. There are probably more, and in fact for the chi-square test of independence, I'm quite sure investigation of the link is the exception rather than the rule.
In general, whenever a global test allows one to replace a set of individual hypotheses (hence tests) by just one, it is also interesting, at least as a first exploratory step, to limit issues of multiple testing.
Dear Emmanuel,
ad 1: I think this is the cause of the pedagogical misery we have: stats courses start with (or build up to) the least interesting aspect of a statistical analysis, significance tests (and still celebrate the illogical mishmash of hypothesis tests and significance tests). We were led away from thinking about models first. How do we formalize the connection between observations (w.r.t. the functional and the stochastic aspects)? How is information coded, obtained, and treated? How can observations build up "knowledge"? These are the most important questions that we should ask and work on in introductory stats classes. Within this framework it is a technical excursus to see how one could assess the significance of an observed set of data under any particular restriction of a model.
I would be happy if you could give me any practically relevant example for the cases you enumerate (one-sided tests, testing "batch effects", tests for independence). I am aware that there are many theoretical examples one can think of, but for years I have been looking for an example with real practical relevance. All the practical examples I know were ultimately conducted just because the authors followed "common practice" and never really used the information. Regarding one-sided tests: I don't see the difference between a one-sided test of size alpha and a two-sided one of size alpha/2. If I have to decide anyway what type-I error rate I am willing to accept, why do I have to distinguish one- and two-sided tests instead of adjusting the size according to the circumstances, requirements and aims of the test?
I also don't think that omnibus tests are helpful as a "first exploratory check". There is no limitation in computing power today. If you want to control the FWER or the FDR, then just do it. It rarely costs more than 0.0002 seconds on an average laptop (to provide a made-up statistic ;)). And there is nothing to be gained from omnibus tests: they don't help to keep an error rate if, and this refers to my previous paragraph, the ultimate aim is to get the significance of individual coefficients in the models.
Hi Jochen,
For point 1): don't let your Bayesian side take all the place ;) More seriously: I don't start my courses with tests, for some of the reasons you're invoking, but neither do I think that tests are irrelevant (the fact that they are misused/overused is another debate). So the question you raise is, I think, more about « when to introduce tests » than « do we need to introduce them ». And I really think that tests are complicated enough to warrant introducing them in a very simple case, like the t-test (or other simple cases), without any additional complexity in that first presentation. Doing tests correctly REQUIRES having understood the concept of modelling and so on, so in my mind they can come only after the points you cite. The fact that some courses do not present things this way is not a reason to assume this is unneeded, or that all courses forget this point...
For one-sided tests: the game you're playing with alpha and sides is typically a misuse of tests that does not help their correct usage. You should first set your hypothesis, and after that the risk. If the hypothesis you're interested in is an increase only, then your risk will be focused on that side. If it's a change, it will be split between the two sides. But the overall risk will be the same. So when you are really interested in a change in a given direction, using a one-sided test allows you to increase the power at a given Type I error. A typical application is non-inferiority tests, where you test that a new antibiotic does not perform significantly worse than the current one (which in reality should be formulated as « performs significantly better than 90 % of the efficacy of the reference one », with 90 % being just an example here). And its generalization in equivalence tests, used for all generic treatments for instance... The fact that one-sided test limits can be determined from two-sided limits by changing the risk is just a computational convenience (and not such a certain one if you're considering discrete test criteria with asymmetric distributions...), but playing with this, especially with students, is the best way to let them mix up concepts and think they can do anything with tests...
For batch effects: consider method validation processes where you want to check that results do not differ between methods and so on. If you want to use a test approach (which is debatable, but the discussion was not about whether to teach/use tests at all, so that's not the issue here), an omnibus test is relevant, and it is what was advised in some regulatory texts like the European Pharmacopoeia (I haven't read the latest edition, so I don't know if it's still the case).
For screening tests: first, correcting for multiplicity, whether with FDR or FWER, loses power, and you may not want to pay that price when it is unneeded (I never mentioned computation time limitations...). Second, in a screening test, you may just want to detect candidates for a more targeted experiment that will investigate the coefficients of the model in more depth, so you're only interested in detecting a signal. Both points are typically relevant for large-scale data, like SNP chips for instance.
Mehmet Sinan Iyisoy The t-test and ANOVA and regression are not different in 'mechanics'. They all involve linear prediction based on a 1-unit increase in the predictor variable. So the easiest way to learn the t-test is as a special case of linear regression. I teach regression first, then introduce the t-test as a special case where the predictor is binary.
Ronan, could you please explain how one can perform a continuous-by-continuous OLS regression using the very mechanics of a t-test or ANOVA? 'Not different in mechanics' should also mean one can do this, shouldn't it?
PS. What I mean by 'different in mechanics' is that the mathematical procedures they intrinsically use are different.
@ Mehmet: there are two ways to introduce one-way ANOVA (and the t-test as a special case of ANOVA). One is to decompose the total variance into a sum of two terms and compare them. In this approach, the underlying model is a little masked (but not much) and the model coefficients are (apparently) unused. It has the advantage of being quite easy to understand for non-statisticians and non-mathematicians, as it stays close to real designs.
The other is to say that ANOVA is in fact just a fit of a linear model with as many predictors as groups (or one less, depending on how it is written), typically (using R's default coding) « Y = µ0 + d2 I2 + d3 I3 + ... + dk Ik + epsilon », with Ik the indicator of being in group k. Just like OLS, of which it is a special case/extension (as you like). In this approach, the sums of squares are seen as a comparison of two nested models: one with only a grand mean and one with a different mean for each group. This is the one you want to answer your question: it is exactly the same as OLS!
[In fact, there is a third one using n-dimensional space geometry and projections, but it is again the same in both cases, and despite being intellectually very nice, I find it difficult to understand for non-mathematicians because n-dimensional spaces are a little abstract... so I do not present it here.]
Both approaches are mathematically equivalent (for [ease of] interpretation in complex cases, that's another debate).
So no, the mathematical procedures are not different at all; only the focus on how to apply them, and which part is considered "useful", differs (or may seem to).
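The nested-model view described above can be sketched numerically. With two groups (made-up numbers, purely illustrative), the F statistic from the sum-of-squares decomposition equals the square of the pooled two-sample t statistic:

```python
import numpy as np

# Hypothetical one-way layout with two groups (so F should equal t squared)
g1 = np.array([5.1, 4.8, 5.5, 5.0])
g2 = np.array([6.2, 6.0, 6.5, 5.9])
y = np.concatenate([g1, g2])

# Restricted model: one grand mean.  Full model: one mean per group.
rss_restricted = ((y - y.mean())**2).sum()
rss_full = ((g1 - g1.mean())**2).sum() + ((g2 - g2.mean())**2).sum()

# F statistic as a nested-model comparison
df_num, df_den = 1, len(y) - 2
F = ((rss_restricted - rss_full) / df_num) / (rss_full / df_den)

# Pooled two-sample t statistic on the same data
sp2 = rss_full / df_den
t = (g2.mean() - g1.mean()) / np.sqrt(sp2 * (1/len(g1) + 1/len(g2)))
# F == t**2 up to floating-point error
```

The "variance decomposition" view and the "model comparison" view are literally the same arithmetic here, which is Emmanuel's point.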
@Emmanuel, thanks for your reply. I am aware that ANOVA and the t-test are regressions and that they are equivalent. Or more precisely, the GLM is a more general framework that also includes the t-test and ANOVA.
But being equivalent does not imply being the same. The t-test and ANOVA have their own tools which cannot (AFAIK) be used for the more general case of OLS regression (e.g. a simple linear regression with a continuous predictor). Technically they have different emphases, they use different tools, they are different frameworks.
Consider the independent-samples t-test by Gosset (1908). Why did Gosset invent his t distribution instead of just taking the path of OLS, which dates back to the 1800s? Similarly, Fisher introduced the term "variance" (according to Wikipedia) at around the same time and perhaps laid the foundations of ANOVA, again without apparently looking at the older method (OLS).
If they had known OLS worked for their cases, they wouldn't have tried to introduce these new methods. In my opinion, they didn't even know that what they found could be explained using OLS. If they had known that what they were actually doing was an OLS regression, I am sure they would have been fascinated.
@ Mehmet: Yes, they started from a different point of view, but with increasing knowledge of the underlying processes, we now know that these methods are, technically, variants of a more general method, especially OLS (with multiple regressors for ANOVA).
Knowing that the technique is the same obviously does not mean that 1) the history of their construction is the same, 2) that they are exchangeable, 3) that they answer the same practical question, or 4) that their practical interpretation is the same.
What tool « specific to the t-test and ANOVA » are you thinking of that does not apply to linear regression on continuous data? Things like variance comparison between the groups to "ensure" homoskedasticity, or the fact that the model is by definition always correct, for instance?
The following passages from the book Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences by Cohen, Cohen, West and Aiken might offer some clues.
"Historically, MRC arose in the biological and behavioral sciences around 1900 in the study of the natural covariation of observed characteristics of samples of subjects, including Galton's studies of the relationship between the heights of fathers and sons and Pearson's and Yule's work on educational issues (Yule, 1911). Somewhat later, ANOVA/ANCOVA grew out of the analysis of agricultural data produced by the controlled variation of treatment conditions in manipulative experiments.
It is noteworthy that Fisher's initial statistical work in this area emphasized the multiple regression framework because of its generality (see Tatsuoka, 1993). However, multiple regression was often computationally intractable in the precomputer era: computations that take milliseconds by computer required weeks or even months to do by hand. This led Fisher to develop the computationally simpler, equal (or proportional) sample size ANOVA/ANCOVA model, which is particularly applicable to planned experiments. Thus multiple regression and ANOVA/ANCOVA approaches developed in parallel and, from the perspective of the substantive researchers who used them, largely independently."
(I don't have access to the reference Tatsuoka, 1993, therefore I couldn't have a look).
I believe that regression analyses done in those earlier days differed, in terms of the number of tests, comparisons, etc., from those we do now. Today's regression analyses are crowded with lots of tests and comparisons which are actually t- and F-tests (should I call them tools of the ANOVA framework, or not?). This makes it difficult to see where the real, old regression starts and ends. If somebody can enlighten me on this, I would be grateful. (My idea was: the old regression did not include any t- and F-tests, but only sums of squares.)
Emmanuel Curis, the question is whether t-test and ANOVA tools are applicable to regression on continuous data. I was referring to the case where you apply the t-test and ANOVA to a simple linear regression where both the IV and the DV are continuous.
BTW, my original post was not solely about t-tests and ANOVA, although RG's automatic system labelled it as such and I didn't change it. I think that's why most answers I received were about the t-test and ANOVA.
@ Mehmet: I think the answer is more a problem of terms and definitions than of methods. The t-test is applicable to the coefficients of a linear regression, especially the slope, and when the predictor is binary, this is exactly the "historical" t-test. So it depends on what you are calling a t-test.
Similarly, the decomposition of the total sum of squares into two terms works perfectly for linear regression, and the resulting analysis-of-variance table gives the test for the slope. So this is ANOVA, but not exactly the one-way, fixed-effects ANOVA, because the latter involves several predictors. So here again, it depends on what definition you give to ANOVA.
If by ANOVA you mean « variance [sum of squares] decomposition into a sum of terms » and the corresponding ANOVA table with F-tests on the sums of squares, then definitely yes, ANOVA can be performed on data where all variables are continuous.
If by ANOVA you mean the very special cases of one-way ANOVA, two-way ANOVA, and more generally cases where all « IVs » (I prefer « predictors ») are qualitative factors, then by definition it does not apply to cases where at least one predictor is quantitative, a fortiori continuous, where you enter the ANCOVA field.
That terminology point aside, I still don't understand what would be special about the ANOVA cases, except the examples I gave previously (tests for homoskedasticity typically, like Bartlett's test or Levene's test), because having replicates for all combinations of predictors allows one to go further; but that's more an experimental-design feature than an ANOVA-vs-linear-regression feature in my mind.
Ronan, t-tests are special cases of ANOVA (two groups vs 2+groups). And ANOVA is difficult to interpret without doing pairwise t-tests to determine just where the significant signal is.
Furthermore, both tests can be done with OLS (least squares), but in any case they are better done with maximum likelihood tests or generalized linear models, so they most certainly are not archaic and fit in nicely with all other generalized linear models. They are simply special cases of a large group of generalized linear models.
The proper models and links depend on the data type. t-tests and ANOVA have continuous response variables (y) and categorical explanatory (independent) variables (x), with the response usually assumed normally distributed.
Logistic regression has categorical responses (dichotomous or polytomous) with continuous explanatory variables and a logit link. Log-linear models have categorical response and categorical explanatory variables with a Poisson, binomial or multinomial distribution. Mixed models may have categorical and continuous variables, and the distribution and link will depend on the nature of the response variable.
All are generalized linear models with different distributions and different link functions, and there are more (e.g. gamma, beta and chi-square models) which are also GLMs.
The methods for inferring y using maximum likelihood are parallel, differing only in the distribution and link function.
And getting back to Mehmet's original question, he is right that t-tests and ANOVA have no special distinction among statisticians. They are both generalized linear models with continuous response variables and categorical explanatory variables. (ANCOVA adds continuous explanatory variables.)
Lately I noticed a provoking title in the stats textbook "Statistical Methods in Medical Research" by Armitage, Berry and Matthews. The first section of chapter 11 is entitled "Analysis of variance applied to regression"! They discuss regression first, without any emphasis on ANOVA. Later they talk about ANOVA, and finally they apply ANOVA to regression. Anyway, this is not related to the core of my question in this series of posts.
Interesting that you find this "provoking".
ANOVA can only be applied to regression. The underlying functional structure is a regression model: modelling the expected values of a response depending on the values of one or more predictors. If these predictors are categorical, the modelled expected values represent group means. That's what it is all about. Now you may analyze the residual variance under certain restrictions of the model (=ANOVA).
And even this is a special case of the analysis of deviance, which works on the likelihoods directly. In the case where the probability model used to get the likelihoods is the normal distribution, the deviance equals the variance (in this special case the likelihoods can be expressed as a sum of squares). The tests used to assess the statistical significance of the difference in deviance between (nested) models require knowing the sampling distribution of the likelihood ratios. These are usually not known, but Wilks' theorem says that it is approximately Chi² with as many df as there are restricted coefficients (-> general approximate likelihood-ratio tests). Again, the special case of the normal probability model even allows us to identify an exact sampling distribution, which is the F-distribution (the F-tests ["ANOVA"] are exact likelihood-ratio tests, given the normal probability model). For one numerator df, this can be expressed via the square of the t-distribution (t-tests are F-tests with 1 numerator df).
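The chain "t-test = F-test with 1 numerator df ≈ Chi² likelihood-ratio test" can be checked numerically. A small sketch (the t value and df are arbitrary assumptions for illustration):

```python
from scipy import stats

t = 2.3    # an arbitrary t statistic
df = 14    # arbitrary residual degrees of freedom

# Two-sided p-value from the t distribution
p_t = 2 * stats.t.sf(abs(t), df)

# Identical p-value from the F distribution with 1 numerator df, applied to t**2
p_f = stats.f.sf(t**2, 1, df)

# Wilks' large-sample approximation: Chi² with 1 df applied to t**2
p_chi2 = stats.chi2.sf(t**2, 1)   # close, but not equal, for finite df
```

`p_t` and `p_f` agree exactly (T² with df denominator degrees of freedom is F(1, df)), while the Chi² value only approximates them, improving as df grows.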
What is worrying in all this discussion is the emphasis on "tests", to the exclusion of a more complete practical statistical analysis. You should not be reducing everything to an AOV table. There are important questions of model-checking, etc., often best approached graphically, that may be best understood from a regression-based viewpoint. Then there is the question of what to do if the standard assumptions don't apply. You may have to develop an entirely new analysis if the standard non-parametric/distribution-free scenarios can't be applied... you are more likely to be able to do this starting from regression-based thinking.
Jochen, I said "provoking" to emphasize and support my earlier opinion that old regression was different from ANOVA (they used to have different focuses and tools). You can consider this a defense against the opposing ideas claimed above (i.e. that regression and ANOVA are the same).
Coming to model comparison and other issues, I am not sure whether Fisher or Galton had any notion of model comparison when they invented ANOVA and regression. There are some modern interpretations, as you and others indicated, and I follow them. But they are not directly relevant to my main point, and I consider them modern and post hoc.
I think that the jargon for a 2-group test will persist despite the fact that it really is just another ANOVA, which is, in turn, another regression analysis. It is embedded in the literature and especially in software.
As a statistician, I am quite familiar with medicine having worked on coronary artery disease and taught the subject to physicians. I note that along with other areas of biology, some researchers have a rather too elevated opinion of their statistical expertise and do not take advice from a mere statistician with much good humor. Hence there is a lot of false hubris out there.
By the way, Fisher did indeed understand model comparison, having introduced maximum likelihood techniques early in the 20th century. I was the student of a student of Fisher's (Simon Tavare). When I was a student in the 1980s, the multidimensional linear algebra needed for statistical comparison of models was not practical on a computer. Later we could do our model comparison with Matlab and interactive iteration. Soon along came GLIM, which was a godsend: we only needed to input our data, and GLIM provided a large choice of generalized linear models and link functions, making it very clear to all of us that they were all regression analyses.
Annoyingly, these methods have been around for a very long time, so they are not modern, and they are used in all modern statistical software. Only the old labels persist.
I agree with you all. The general is more efficient than the particular. No discussion.
My question is: how much time did you (we) spend to arrive at this conclusion? How much time did we spend to understand so many statistical concepts?
Is it possible for sociology or health sciences students, and many others, to reach an understanding of all this in a 16-week term? Before walking and running, it is good to crawl.
Many efforts must be made in this direction. How can we achieve it?
Current software can help: less mathematics, more statistical concepts, and more software use in class.
Jorge Ortiz
My experience with students/grad students/postdocs/faculty who learn only the software, with no understanding of the underlying theory, is that they cannot choose the proper test for their data, they cannot readily interpret the results, and they cannot judge the goodness of fit of the analysis to their data.
They have two options after admitting that 16 weeks gives only a cursory understanding of statistics: they look to statisticians in their fields to advise them and guide them through the process, or they bite the bullet and learn mathematical statistics, so that they understand the various distributions available for different data and learn diagnostics.
"My experience with students/grad students/postdocs/faculty learning only the software with no understanding of the underlying theory is that they cannot choose the proper test for their data, [...]"
Sure. But even worse is the fact that they cannot even plan and design their experiments in a proper way! The hard work with statistics is in the planning phase. Data analysis is the smallest problem, especially if you already designed the experiment well.
Exactly, Jochen. Do the statistics first. The problem I have had with statisticians is that many are not interested in planning. There are exceptions, but the teaching of experimentation has become too compartmentalized and in some disciplines, too cookbook.
Does the scope of the question include multiple group comparisons (either planned or post hoc)? What about small sample sizes? Data that isn't Normal (e.g., counts)? Or what about comparing something other than the mean (i.e., quantiles)? I think in this broader scope "two group" tests aren't obsolete.
I'll state (echoing Jochen Wilhelm and Joseph L Alvarez) here that statistical machismo is not a substitute for good experimental/study design.
Even then, interesting comparisons, and understanding thereof, may best be captured by plots and pairwise comparisons. And with today's computational and open-source programming advantages, why not just dive right into pairwise comparisons and adjust the p-values (e.g., Bonferroni, False Discovery Rate)?
Small sample sizes? Permute. Normality assumptions violated? Use nonparametric, permutation, randomization, etc. What if it's interesting to compare the 25th percentile (or any other that's interesting)? Harrell-Davis estimator.
I'm certain there are other techniques and methodologies one could use, my point is to tease at the original question a bit. My opinion is that two group comparison tests aren't obsolete, for the reasons I give and perhaps others I haven't considered.
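To make the permutation idea concrete, here is a minimal pure-Python sketch of a two-sided permutation test on the difference in group means. The numbers are made up for illustration, and a real analysis would typically use more permutations or enumerate them exactly:

```python
import random

def perm_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                     # random relabeling of groups
        pa, pb = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(pa) / len(pa) - sum(pb) / len(pb))
        if diff >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)           # add-one correction

a = [4.1, 5.0, 6.2, 5.5, 4.8]   # made-up group 1
b = [6.0, 7.1, 6.5, 7.4, 6.9]   # made-up group 2
print(perm_test(a, b))
```

No distributional assumption is needed beyond exchangeability under the null, which is why this works even for small samples.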
The t and Welch's statistics are there, and researchers are free to use them when appropriate, as they have optimality properties. The question is whether it is convenient to include them in the syllabus or not. In my opinion, they offer the opportunity to teach, in a simple situation, the problems and concepts related to variance heterogeneity.
Jorge Ortiz
I will refer you to the following topic.
https://www.researchgate.net/post/Three_means_comparison_by_t_test_or_ANOVA
I'm going back to Mehmet's question about why we teach t-tests when they are not really different from larger tests. For one, the t-test is a nice, spare test that easily demonstrates all one needs to know about statistical thinking: what a confidence level really is and why it is mathematically equivalent to finding a p-value; how, with relatively small samples, the sampling distribution converges to normality (Central Limit Theorem); how to compute the power of the test; and why iid (independent and identically distributed) data, sample size, and variance are crucial to getting results that reflect reality.
Beyond that, ANOVA is an easy extension of the t-test, and the t-test is a special case of ANOVA. A t-test uses Student's t statistic (the ratio of a normal random variable to the square root of a chi-squared random variable divided by its degrees of freedom), and its square is the F statistic with degrees of freedom (1, v).
And beyond that these two are special cases of the whole range of generalized linear models (linear regression).
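As a quick numerical sanity check of the t/F relationship, here is a plain-Python sketch with made-up numbers (no statistical library needed), computing the pooled two-sample t and the one-way ANOVA F for the same two groups:

```python
def pooled_t(a, b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((x - mb) ** 2 for x in b)
    sp2 = (ssa + ssb) / (na + nb - 2)           # pooled variance
    return (ma - mb) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def anova_f(groups):
    """One-way ANOVA F statistic for any number of groups."""
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_b, df_w = len(groups) - 1, n - len(groups)
    return (ss_between / df_b) / (ss_within / df_w)

a = [4.1, 5.0, 6.2, 5.5, 4.8]   # made-up group 1
b = [6.0, 7.1, 6.5, 7.4, 6.9]   # made-up group 2
t = pooled_t(a, b)
f = anova_f([a, b])
print(t * t, f)                  # the two values agree: t^2 == F
```

With two groups, squaring the pooled t statistic reproduces the one-way F exactly; the p-values from t(v) and F(1, v) are likewise identical for a two-sided test.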
The trouble is that keeping these in mind is difficult without training in the truly elegant field of mathematical statistics and an advanced degree. So again I suggest that those who do not wish to delve that deeply into the subject are bound to misunderstand distributions and statistical tests. So teachers, relying on intuition rather than knowledge, pass on misinformation to students.
The solution is for students, researchers and faculty to consult with a statistician when in doubt about experimental design, data and analysis. But that is not likely to happen in my lifetime. So analyses will continue to be faulty.
Excellent question and suggestion. However, even if a t-test is an ANOVA, you cannot use ANOVA in place of a t-test where you have only 2 groups; otherwise you stand to be criticized by an examiner or reviewer. A t-test is a t-test, and ANOVA is ANOVA. The t-test, as a 2-group test, is not obsolete. It is just as common as ANOVA, the multi-group statistical test.
@Kamoru: if any examiner or reviewer criticizes the use of an ANOVA instead of a Student's t-test for two groups, assuming you are not interested in a one-sided test, then this examiner/reviewer has no right to make any comment on any statistical issue.
Of course, for a one-sided test the situation is different, and the t-test is the only one that can be done logically, since ANOVA loses all sign information.
Indeed there is no reason to do an ANOVA test with only 2 groups. A t-test is simpler. I cannot imagine that anyone would do an ANOVA when the simpler test is adequate. And I doubt that a statistician who is a reviewer would mind which you use as I wouldn't. But I might be puzzled why one would bother to do so.
No, 2-group tests do make sense, since the biostatistical variables, prerequisites, and targets differ from those needed for (and obtained from) analyses of more than 2 groups.
(I haven't read all the above responses, but...)
I find t-tests (or the corresponding regression coefficient) useful for two reasons. In practice, I often have a directional hypothesis and as far as I know I can't test that with an ANOVA (whether it's a regular ANOVA, or an F-test on delta R squared in a regression model comparison). And for teaching, I find that t-tests seem easier for students to understand and calculate by hand than ANOVA, which can make them be a useful starting point.
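For what it's worth, a directional hypothesis is also easy to test by simulation. Here is a minimal one-sided permutation test in pure Python, with made-up numbers, that counts only mean differences in the hypothesized direction (something the sign-free F statistic cannot do):

```python
import random

def one_sided_perm(a, b, n_perm=10000, seed=1):
    """Permutation p-value for the directional hypothesis mean(b) > mean(a)."""
    rng = random.Random(seed)
    observed = sum(b) / len(b) - sum(a) / len(a)    # hypothesized to be positive
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                          # random relabeling
        diff = sum(pooled[len(a):]) / len(b) - sum(pooled[:len(a)]) / len(a)
        if diff >= observed:                         # only the hypothesized direction
            count += 1
    return (count + 1) / (n_perm + 1)

a = [4.1, 5.0, 6.2, 5.5, 4.8]   # made-up "control" values
b = [6.0, 7.1, 6.5, 7.4, 6.9]   # made-up "treatment" values
print(one_sided_perm(a, b))
```

Swapping the two groups reverses the direction being tested, which is exactly the information an F test throws away.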
ANOVA is used for 3 or more samples; the t-test is applied only to 2 samples.