The first group, which is not normally distributed, consists of 128 participants, and the second, normally distributed group consists of 15 participants. So I am in effect comparing a median with a mean to get a p-value. Which test do you recommend?
I am not clear on what you are measuring. Are you trying to see whether there is a correlation between two continuous variables, or are you running a t-test? Please clarify what the samples are, what you are trying to estimate, and what your hypothesis is. That will help determine what sort of test you should be using, and whether a transformation is warranted.
Thank you for your involvement in this question. I am trying to find out whether there is a statistically significant difference in age between two groups. I have a group of 128 participants whose age is non-normally distributed according to a K-S test (median 70 years), and another group of 15 participants whose age is normally distributed (also by K-S test; mean 59.6 years). I also compared the variances of the two groups with a two-sample F-test and they are unequal, so I ran the two-sample t-test assuming unequal variances. Is this a viable solution, given that the t-test compares two means and, as I mentioned above, the first group of 128 is non-normally distributed and is therefore summarized by its median?
Can you recommend the most appropriate test for my problem?
I suggest that you stick with a non-parametric solution such as a rank-sum test; an even more robust non-parametric test to consider is the Hodges-Lehmann aligned-ranks test. All software packages implement the rank-sum test, but fewer offer the aligned-ranks method. I have written a program in Stata that performs this analysis, called alignedranks (from within Stata type: ssc install alignedranks). There is a program in R that does this analysis as well.
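For readers who want to see what the rank-sum test computes, here is a minimal sketch in plain Python. It uses the normal approximation with mid-ranks for ties, and it also reports the Hodges-Lehmann shift estimate (the median of all pairwise differences), which is related to but not the same thing as the aligned-ranks test. The function names and the data are assumptions made up for illustration; the ages are synthetic, not the poster's. In practice a library routine such as scipy.stats.mannwhitneyu would be the usual choice.

```python
import random
from statistics import NormalDist, median


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic of x, two-sided p-value). Tied values get mid-ranks
    and the variance is tie-corrected; no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2  # average of the ranks i+1, ..., j
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation identical: no evidence of a shift
        return u, 1.0
    z = (u - n1 * n2 / 2) / var_u ** 0.5
    return u, 2 * (1 - NormalDist().cdf(abs(z)))


def hodges_lehmann_shift(x, y):
    """Hodges-Lehmann estimate of the shift: median of all differences x_i - y_j."""
    return median(xi - yj for xi in x for yj in y)


# Synthetic ages for illustration only (NOT the data from this thread).
random.seed(1)
group_large = [80 - random.expovariate(0.1) for _ in range(128)]  # left-skewed
group_small = [random.gauss(59.6, 8) for _ in range(15)]          # roughly normal

u, p = rank_sum_test(group_large, group_small)
shift = hodges_lehmann_shift(group_large, group_small)
print(f"U = {u:.1f}, two-sided p = {p:.4f}")
print(f"Hodges-Lehmann shift estimate = {shift:.1f} years")
```

Note that the rank-sum test addresses a shift in location between the two distributions, not a difference in means as such, so the hypothesis should be stated accordingly.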
In case you want to use a parametric alternative, I recommend the t'-Welch. Montilla and Kromrey (2010) found that: "the t'-Welch test is more robust than the t-Student test and this one, more than the Yuen test under the conditions of normality, absence of normality and heteroscedasticity".
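As a rough illustration of what the Welch test computes, the sketch below works out the Welch t statistic and the Welch-Satterthwaite degrees of freedom in plain Python. The function name welch_t and the data are assumptions for illustration; the ages are synthetic, not the poster's. The p-value then comes from a t distribution with the returned degrees of freedom; if SciPy is available, scipy.stats.ttest_ind(x, y, equal_var=False) does the whole calculation.

```python
import random
from statistics import mean, variance


def welch_t(x, y):
    """Welch's unequal-variance t statistic and its approximate degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1 = variance(x) / n1  # squared standard error of mean(x)
    v2 = variance(y) / n2  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / (v1 + v2) ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Synthetic ages for illustration only (NOT the data from this thread).
random.seed(2)
group_large = [80 - random.expovariate(0.1) for _ in range(128)]
group_small = [random.gauss(59.6, 8) for _ in range(15)]

t_stat, dof = welch_t(group_large, group_small)
print(f"Welch t = {t_stat:.3f} on about {dof:.1f} degrees of freedom")
```

With group sizes as unbalanced as 128 and 15, the degrees of freedom are driven largely by the small group, which is one reason the small sample deserves as much scrutiny as the large one.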
2) How do you conclude that the distribution in the small group is normal?
3) Is the deviation from a normal distribution in the large group *relevant* to your problem?
And besides all these questions a general note:
You seem to be unable to formulate a precise statistical hypothesis, and you also seem to be unable to specify a precise alternative hypothesis, so there is little basis for doing hypothesis tests (parametric or not); the results won't help you reach any sensible or reasonable conclusion. Assuming you are interested in the expected difference in age, then you should test this (rather than testing something else just because you know a test for something else!). If there is no test dealing with the assumptions you can or cannot make, then you can still bootstrap confidence intervals and p-values.
My suggestion is that you plot the data and that you try to draw your conclusions based on what you see. If you are dying to get a confidence interval or p-value for the relevant statistic (like the expected difference, for instance), then bootstrap it. But don't use a test for some arbitrary hypothesis only because this is the only thing you seem to be able to do, and also do not base your conclusions simply on a p-value (no matter how it was calculated and whether or not assumptions are met).
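To make the bootstrap suggestion concrete, here is a minimal sketch in plain Python for the difference in mean age. It gives a percentile confidence interval by resampling each group with replacement, and a p-value by resampling from the two groups after centring each at its own mean (one simple way to impose the null hypothesis of equal means; other bootstrap tests exist). The function name, the defaults and the data are assumptions for illustration; the ages are synthetic, not the poster's.

```python
import random
from statistics import fmean


def bootstrap_mean_diff(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI and a bootstrap p-value for mean(x) - mean(y)."""
    rng = random.Random(seed)
    nx, ny = len(x), len(y)
    observed = fmean(x) - fmean(y)

    # Confidence interval: resample each group separately, with replacement.
    diffs = sorted(
        fmean(rng.choices(x, k=nx)) - fmean(rng.choices(y, k=ny))
        for _ in range(n_boot)
    )
    lo_idx = int(n_boot * alpha / 2)
    ci = (diffs[lo_idx], diffs[n_boot - 1 - lo_idx])

    # p-value: centre each group at zero so the null (equal means) holds,
    # then count how often a resampled difference is as extreme as observed.
    mx, my = fmean(x), fmean(y)
    x0 = [v - mx for v in x]
    y0 = [v - my for v in y]
    extreme = sum(
        abs(fmean(rng.choices(x0, k=nx)) - fmean(rng.choices(y0, k=ny)))
        >= abs(observed)
        for _ in range(n_boot)
    )
    p_value = (extreme + 1) / (n_boot + 1)
    return observed, ci, p_value


# Synthetic ages for illustration only (NOT the data from this thread).
random.seed(3)
group_large = [80 - random.expovariate(0.1) for _ in range(128)]
group_small = [random.gauss(59.6, 8) for _ in range(15)]

diff, (low, high), p = bootstrap_mean_diff(group_large, group_small)
print(f"difference in means = {diff:.1f} years")
print(f"95% percentile CI = ({low:.1f}, {high:.1f}), bootstrap p = {p:.4f}")
```

With only 15 observations in one group, the bootstrap distribution for that group is coarse, so the interval should be read alongside a plot of the data rather than in place of one.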
PEREIRA, B. de B. Separate Families of Hypotheses. In: Peter Armitage; Theodore Colton (Org.). Encyclopedia of Biostatistics. 2 ed. London: Wiley, 2005, v. 7, p. 4881-4886.
or
PEREIRA, B. de B. Tests for discriminating separate or non-nested models. In: Miodrag Lovric (Org.). International Encyclopedia of Statistical Sciences. 1 ed. New York: Springer, 2010, v. 1, p. 1592-15956.
Unlike you, I would prefer a non-parametric approach: comparing robust kernel estimates of the probability distributions of the two data samples. This would avoid relying on a priori assumptions about the statistical model and could give you more information on the differences between the variables. It may be that you are not used to this type of analysis. In that case you could send me your data sets (in Excel format), and I would try to show you "what the data say for themselves".
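As a rough idea of what comparing kernel estimates looks like, the sketch below builds an ordinary Gaussian kernel density estimate for each group and evaluates both at the same ages. This is a plain Gaussian KDE with a Silverman-style rule-of-thumb bandwidth, not necessarily the robust estimator meant above; the function name and the data are assumptions for illustration, and the ages are synthetic, not the poster's.

```python
import random
from math import exp, pi, sqrt
from statistics import quantiles, stdev


def make_gaussian_kde(sample, bandwidth=None):
    """Return a function t -> estimated density at t (Gaussian kernel)."""
    n = len(sample)
    if bandwidth is None:
        # Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)
        q1, _, q3 = quantiles(sample, n=4)
        s = stdev(sample)
        spread = min(s, (q3 - q1) / 1.34) if q3 > q1 else s
        bandwidth = 0.9 * spread * n ** (-0.2)
    norm = n * bandwidth * sqrt(2 * pi)

    def density(t):
        return sum(exp(-0.5 * ((t - v) / bandwidth) ** 2) for v in sample) / norm

    return density


# Synthetic ages for illustration only (NOT the data from this thread).
random.seed(4)
group_large = [80 - random.expovariate(0.1) for _ in range(128)]
group_small = [random.gauss(59.6, 8) for _ in range(15)]

kde_large = make_gaussian_kde(group_large)
kde_small = make_gaussian_kde(group_small)
for age in (50, 60, 70, 80):
    print(f"age {age}: large group {kde_large(age):.4f}, small group {kde_small(age):.4f}")
```

Plotting the two curves over a grid of ages shows where the groups overlap and where they differ, which is more informative than a single p-value.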