Good day,

I am currently writing my bachelor thesis which involves quite a bit of research and statistics.

Could anyone please help me understand how highly skewed data and outliers/extreme values affect the performance of the following two tests:

- the Shapiro-Wilk and Kolmogorov-Smirnov tests of normality;

- Levene's test for assessing homogeneity of variance across two or more groups.
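To make the question concrete, here is a small simulation sketch of the situation I have in mind. The numbers and group sizes are placeholders, and the data are simulated, not my real data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Two groups drawn from the same normal distribution (placeholder data)
g1 = rng.normal(50.0, 10.0, 80)
g2 = rng.normal(50.0, 10.0, 80)

# Contaminate the second group with a few extreme outliers
g2_out = np.concatenate([g2, [150.0, 160.0, 170.0]])

# Shapiro-Wilk: the clean group should look normal, the contaminated one not
_, p_clean = stats.shapiro(g1)
_, p_out = stats.shapiro(g2_out)

# Levene's test with center='mean' (the original Levene test) versus
# center='median' (the Brown-Forsythe variant, reputedly more robust)
_, p_lev_mean = stats.levene(g1, g2_out, center='mean')
_, p_lev_median = stats.levene(g1, g2_out, center='median')

print(f"Shapiro-Wilk: clean p={p_clean:.3f}, with outliers p={p_out:.2e}")
print(f"Levene (mean) p={p_lev_mean:.4f}, Levene (median) p={p_lev_median:.4f}")
```

My question is essentially whether comparisons like these behave the same way on real, skewed organizational data.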

Would removing the outliers/extreme values and applying a logarithmic transformation before running these tests make them more or less accurate? Or could a logarithmic transformation increase the chance of a Type I error?
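The log-transformation part of the question could be sketched like this (again with simulated data; a log-normal sample is an artificial stand-in for my skewed measurements, since its logarithm is exactly normal by construction):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated right-skewed data: log-normal, as a stand-in for real measurements
x = rng.lognormal(mean=0.0, sigma=1.0, size=200)

# Shapiro-Wilk on the raw, skewed data: normality should be strongly rejected
w_raw, p_raw = stats.shapiro(x)

# After the log transform the data are normal by construction,
# so the test should be far less likely to reject
w_log, p_log = stats.shapiro(np.log(x))

print(f"raw: W={w_raw:.3f}, p={p_raw:.2e}")
print(f"log: W={w_log:.3f}, p={p_log:.3f}")
```

What I am unsure about is how this carries over to real data, where the transformed values are only approximately normal.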

Throughout my previous statistics courses, the problems were almost always simple and we could assume the data were normally distributed. For the t-test, we could simply switch to Welch's test, with its adjusted degrees of freedom, whenever the variances could not be assumed equal. Now that I am in a real-world situation, things are quite different. My analysis focuses on the current performance across multiple entities of the same organization: detecting where the biggest differences are, how big they are (confidence intervals for the difference between means), and which factors explain them.
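For reference, the Welch approach I learned looks roughly like this in code, including the Welch-Satterthwaite degrees of freedom and a 95% CI for the mean difference (entity names, means, and sample sizes are all placeholders):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical performance scores for two entities (placeholder parameters)
a = rng.normal(loc=52.0, scale=12.0, size=40)
b = rng.normal(loc=45.0, scale=5.0, size=60)

# Welch's t-test: does not assume equal variances
t, p = stats.ttest_ind(a, b, equal_var=False)

# Welch-Satterthwaite degrees of freedom
va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))

# 95% CI for the difference between means
diff = a.mean() - b.mean()
half_width = stats.t.ppf(0.975, df) * np.sqrt(va + vb)

print(f"t={t:.2f}, p={p:.4f}, df={df:.1f}")
print(f"95% CI for mean difference: [{diff - half_width:.2f}, {diff + half_width:.2f}]")
```

My problem is that this whole approach leans on approximate normality, which my real data may not satisfy.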

For this, of course, I need to look into non-parametric alternatives to ANOVA and the t-test.

For ANOVA, I have found that the Kruskal-Wallis test is the most widely used non-parametric alternative. For post-hoc comparisons I have found the Dunn-Bonferroni test, the Mann-Whitney U test, and the Games-Howell test. The Mann-Whitney U test can also serve as a non-parametric alternative to the t-test.
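The workflow I am considering would look roughly like this: a Kruskal-Wallis omnibus test, followed by pairwise Mann-Whitney U tests with a Bonferroni correction (a simple stand-in here for the Dunn test, which scipy itself does not provide). Entity names and distributions are placeholders:

```python
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical skewed performance data for three entities (placeholder groups)
groups = {
    "entity_A": rng.lognormal(0.0, 1.0, 50),
    "entity_B": rng.lognormal(0.0, 1.0, 50),
    "entity_C": rng.lognormal(0.8, 1.0, 50),  # deliberately shifted group
}

# Omnibus test: Kruskal-Wallis across all groups
h, p = stats.kruskal(*groups.values())
print(f"Kruskal-Wallis: H={h:.2f}, p={p:.4f}")

# Post hoc: pairwise Mann-Whitney U with a Bonferroni correction
pairs = list(itertools.combinations(groups, 2))
for g1, g2 in pairs:
    u, p_pair = stats.mannwhitneyu(groups[g1], groups[g2])
    p_adj = min(p_pair * len(pairs), 1.0)  # Bonferroni-adjusted p-value
    print(f"{g1} vs {g2}: U={u:.0f}, adjusted p={p_adj:.4f}")
```

I would be grateful for any comments on whether this is a sensible pipeline, or whether Games-Howell (or something else entirely) would be more appropriate here.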

Looking forward to all the responses and suggestions.

Best regards,

Janis Frisfelds
