I'm trying to compare the means of two groups, but the data does not follow a normal distribution, and the sample sizes are unequal. What test should I use instead of an independent samples t test?
The two-sample t-test is one of the most widely used statistical procedures. It assumes that the variable in question is normally distributed in each of the two groups. When this assumption is in doubt, the non-parametric Wilcoxon-Mann-Whitney (or rank-sum) test is often suggested as an alternative, because it does not rely on a particular distributional form.
The Wilcoxon-Mann-Whitney (WMW) test consists of taking all the observations from the two groups and ranking them in order of size (ignoring group membership). The ranks of the observations from the first group (it doesn't matter which group you choose) are then summed, and the test statistic is formed as
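U = R1 - n1(n1 + 1)/2,
where R1 is the sum of the ranks in the first group and n1 is the number of observations in that group.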
Under the null hypothesis that the distribution of the variable in question is identical (in the population) in the two groups, the sampling distribution of U can be determined exactly (or a normal approximation can be used), and from it a p-value is calculated. The test is available in most (if not all) statistical packages.
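To make the ranking procedure above concrete, here is a minimal Python sketch using only the standard library. The function names average_ranks and wmw_u are my own, chosen for illustration, and the p-value uses the plain normal approximation without a tie or continuity correction, so treat it as a rough check rather than what a statistical package would report (in practice something like scipy.stats.mannwhitneyu would be the usual choice).

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wmw_u(group1, group2):
    """Return (U, z, two-sided p) with U = R1 - n1(n1 + 1)/2.

    Assumption for brevity: normal approximation with no tie or
    continuity correction, so p is only approximate for small samples.
    """
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))  # rank pooled data
    r1 = sum(ranks[:n1])                                # rank sum of group 1
    u = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))                # two-sided p-value
    return u, z, p

if __name__ == "__main__":
    # Made-up numbers with unequal group sizes, purely for illustration.
    print(wmw_u([1.2, 3.4, 2.2, 5.1], [4.8, 6.0, 7.3, 5.5, 9.1, 6.6]))
```

U ranges from 0 (every observation in the first group is below every observation in the second) to n1*n2 (the reverse), which is a quick sanity check on any implementation.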
As already pointed out, the assumption is not about the normal distribution of the whole dependent variable (DV), but about the normal distribution WITHIN each group. These are not the same thing and can lead to completely different conclusions: if the two groups differ, I would expect a bimodal distribution for the overall DV, which is clearly not normal, even though the distribution within each group can be perfectly normal.
If the normality assumption is not met, you should check why. Is it because of some outlying values, or is the distribution skewed, etc.? Outlying values can be handled with robust tests, such as t-test versions for trimmed means (look for the books by Rand R. Wilcox on robust estimators), bootstrapping, or Bayesian t-tests (e.g. the BEST R package by John Kruschke). Different distributional forms can also be handled with bootstrapping methods or robust Bayesian alternatives to the t-test. All these methods have the advantage of not losing much information, since you still use the metric scale of your DV (and robust methods have a power advantage over the standard test when its assumptions are violated). The rank-based WMW U test would only be my last resort. It is still a viable way to analyze the data, but strictly speaking it does not compare mean values; it compares the distributions of the variable in the two groups, as pointed out above.
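As one illustration of the bootstrapping idea mentioned above, the following is a minimal sketch of a percentile bootstrap confidence interval for the difference in means, using only the Python standard library. The function name bootstrap_mean_diff_ci and its parameters (n_boot, alpha, seed) are hypothetical choices of mine, and the plain percentile interval is only one of several bootstrap variants.

```python
import random

def bootstrap_mean_diff_ci(group1, group2, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(group1) - mean(group2).

    Each group is resampled with replacement at its own size, so
    unequal sample sizes need no special handling.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    diffs = []
    for _ in range(n_boot):
        s1 = rng.choices(group1, k=len(group1))
        s2 = rng.choices(group2, k=len(group2))
        diffs.append(sum(s1) / len(s1) - sum(s2) / len(s2))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]          # lower percentile
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]  # upper percentile
    return lo, hi

if __name__ == "__main__":
    # Made-up, skewed data with unequal group sizes, purely for illustration.
    a = [2.1, 2.4, 2.2, 3.0, 9.5, 2.8]
    b = [1.0, 1.3, 0.9, 1.1, 1.2, 1.4, 1.0, 5.2]
    print(bootstrap_mean_diff_ci(a, b))
```

If the interval excludes 0, that is evidence of a difference in means at roughly the chosen alpha level. With very small samples the percentile interval can be too narrow, so I would not lean on it too heavily there.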
If your data are measured on a metric (interval or ratio) scale, the independent samples t-test can often still be used even when the distribution is not exactly normal, especially with reasonably large samples. Note, however, that the t-test comes in two versions: equal variances assumed and unequal variances assumed. Unequal variances assumed means that the sample sizes in the two sub-groups being compared and their standard deviations are NOT taken to be equal. This is the version of the t-test that should be used here.
When the analysis is run in SPSS, the output presents results under both assumptions: equal variances assumed AND unequal variances assumed. In this case, one should keep to and report the results under the latter.
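For readers who want to see what the "unequal variances" version computes, here is a minimal Python sketch of Welch's t statistic and its Welch-Satterthwaite degrees of freedom, using only the standard library. The function name welch_t is my own. The sketch stops short of the p-value, which would come from a t distribution with the returned degrees of freedom (for example via scipy.stats.ttest_ind with equal_var=False).

```python
import math
import statistics

def welch_t(group1, group2):
    """Return (t, df) for the unequal-variances (Welch) t-test."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    # Sample variances (n - 1 in the denominator), estimated per group.
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    se2_1, se2_2 = v1 / n1, v2 / n2  # squared standard errors of each mean
    t = (m1 - m2) / math.sqrt(se2_1 + se2_2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_1 + se2_2) ** 2 / (
        se2_1 ** 2 / (n1 - 1) + se2_2 ** 2 / (n2 - 1)
    )
    return t, df

if __name__ == "__main__":
    # Made-up data with unequal sizes and spreads, purely for illustration.
    print(welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10, 12, 14]))
```

When the two groups have the same size and the same variance, df reduces to n1 + n2 - 2, i.e. the equal-variances result, which is why reporting the unequal-variances row costs little when the variances do happen to be equal.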