I have a query regarding data transformation, if anyone can provide some guidance, please.
I was wondering whether, in general, it is possible to transform a variable's raw data twice, using two different methods, for the purpose of two different tests? I will give a little background to my study first. I have a variable for 'Adverse Childhood Experiences' (ACE) containing one score per participant. N = 113; however, 65 of these are 0 values and 3 are missing, which I believe is distorting the distribution considerably. I understand that it is not advised to simply remove the cases that read 0 just because there are many (however, if you recommend otherwise, please let me know, and why).
Note that this variable has a skewness of 1.943, which is why I decided to transform it.
I am carrying out a path analysis with one IV, one DV and two mediators. Before that, to decide whether age and gender need to be included in the path analysis as covariates, I am first carrying out a t-test (IV: gender, DV: ACE score) and then a linear regression (IV: age, DV: ACE score). To meet the assumptions of the t-test (namely, a normal distribution within both levels of the IV: male and female), I transformed the raw ACE data using Tukey's formula, which brought the skewness to < 1 for each IV level - great. But when I carry out the linear regression on the Tukey-transformed ACE data, the assumption of approximately normally distributed residuals is not met. I have tried a number of other transformations on the raw ACE data, and the only one that gives normally distributed residuals for the regression is a Log10 transformation.
My question is this: am I able to carry out the t-test with the Tukey-transformed data, and then the linear regression with the Log10-transformed data? Or do I need to use the same transformed data for each stage of the analysis (i.e. either Tukey or Log10 for both the t-test and the linear regression, and then the same for the onward path analyses)?
If I do need to go back and carry out the gender t-test with the Log10 ACE data, note that I have already tried this, and the descriptives table for the Log10-transformed ACE data across gender looks very strange. For example, N for males goes down from 15 to 6, N for females goes down from 115 to 59, and there are outliers where there were none in the Tukey-transformed descriptives, so it is confusing me a little.
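A possible reason for the drop in N, sketched here as an assumption rather than a diagnosis: log10(0) is undefined, so software that computes a plain Log10 transform will typically treat every 0 score as missing. The Python sketch below uses made-up scores (not the study data) and hypothetical helper names to show how the valid N shrinks under a plain log10, and how it is preserved under log10(x + 1), one commonly used workaround whose suitability here is not established.

```python
import math

def log10_or_missing(score):
    """Return log10(score), or None when the log is undefined (score missing or <= 0)."""
    if score is None or score <= 0:
        return None
    return math.log10(score)

def log10_plus_one(score):
    """Return log10(score + 1), which is defined for a score of 0; None stays missing."""
    if score is None:
        return None
    return math.log10(score + 1)

# Hypothetical ACE scores for illustration only: many zeros, one missing value.
ace_scores = [0, 0, 0, 1, 2, 0, 4, None, 0, 7]

plain = [log10_or_missing(s) for s in ace_scores]
shifted = [log10_plus_one(s) for s in ace_scores]

n_raw = sum(s is not None for s in ace_scores)    # valid cases before transforming
n_plain = sum(v is not None for v in plain)       # valid cases after plain log10
n_shifted = sum(v is not None for v in shifted)   # valid cases after log10(x + 1)

print(n_raw, n_plain, n_shifted)
```

If the valid N after the plain transform equals the number of non-zero scores in each gender group, that would be consistent with the zeros having been dropped.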
Any guidance welcome!
Thank you