Are non-parametric tools sufficient to handle this kind of data properly? Is there any transformation I could apply in order to "normalize" the distribution?
I agree with Dr. Herraiz's explanation. Another approach we have used is to add 1 to every value of that variable so that a log transformation can be applied. For example, if your data are 0, 0, 2, 3, 5, they become 1, 1, 3, 4, 6. Alternatively, you can use non-parametric tests.
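A minimal Python sketch of the add-1-then-log approach, using the example values from the post. `math.log1p(x)` computes log(1 + x), so it does the shift and the log in one call.

```python
import math

# Example values from the post: a variable containing two zeros
data = [0, 0, 2, 3, 5]

# Shift every value by 1 so that zeros become 1 (and log(1) = 0)
shifted = [x + 1 for x in data]

# log(x + 1): math.log1p performs the shift and the logarithm together
logged = [math.log1p(x) for x in data]

print(shifted)
print([round(v, 3) for v in logged])
```

The constant 1 is a convention, not a rule. The added constant changes the shape of the transformed distribution, and any results are on the log(x + 1) scale rather than the log(x) scale.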
I do not know of any statistical method of analysis that requires variables not to take zero as a value. As far as statistical methods for the analysis of continuous random variables are concerned, zero is no different from any other number.
Regarding variables with several zeros, they do not necessarily need non-parametric tests. It depends only on whether the distribution is normal or not. The first thing to do is to perform a Shapiro-Wilk test on these variables to check whether they are normally distributed. Then you can choose between parametric and non-parametric tests accordingly.
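Below is a minimal Python sketch of this workflow. It assumes SciPy is available; `scipy.stats.shapiro` returns the W statistic and a p-value. The helper `choose_test_family`, the 0.05 threshold, and the example data are illustrative assumptions. With large samples the test tends to reject normality even for small departures, and with small samples it has little power. For that reason, a histogram or Q-Q plot is often inspected as well.

```python
import statistics

from scipy import stats


def choose_test_family(values, alpha=0.05):
    """Hypothetical helper: pick a test family from a Shapiro-Wilk test.

    Returns "parametric" when normality is not rejected at level alpha,
    otherwise "non-parametric".
    """
    w_stat, p_value = stats.shapiro(values)
    return "parametric" if p_value >= alpha else "non-parametric"


# Roughly normal example: evenly spaced quantiles of a normal distribution
normal_like = [statistics.NormalDist(50, 10).inv_cdf((i - 0.5) / 40) for i in range(1, 41)]

# Zero-heavy example: many zeros plus a long right tail
zero_heavy = [0] * 25 + [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

print(choose_test_family(normal_like))
print(choose_test_family(zero_heavy))
```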
To normalize a distribution, it is often useful to log-transform the variable, which helps improve its degree of normality. However, in my experience, I have often encountered variables that were impossible to normalize, even with a logarithmic transformation. In that case, there is no alternative to non-parametric tests.
The logarithmic function is the inverse of the exponential function (and vice versa). Since there is no exponent to which a positive base can be raised to give zero, the logarithm of zero is undefined. Thus you cannot "linearize" non-normally distributed data that include zeros with a logarithmic transformation. In this case, a fourth-root transformation could work instead, and it may yield normally distributed data suitable for parametric tests.
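The following Python illustration shows both points. The log of zero fails. The fourth root (x ** 0.25) maps zero to zero and compresses large values. The data are made up for illustration. Whether the transformed variable is close to normal still has to be checked afterwards.

```python
import math

# Made-up data containing zeros
data = [0, 0, 1, 16, 81, 256]

# log(0) is undefined: math.log raises ValueError for zero
try:
    math.log(0)
    log_of_zero_defined = True
except ValueError:
    log_of_zero_defined = False

# The fourth root x ** 0.25 is defined at zero, so no observations are lost
fourth_root = [x ** 0.25 for x in data]

print(log_of_zero_defined)
print(fourth_root)
```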
Otherwise, both parametric and non-parametric tests accept zero values.
Actually, I always perform a Kolmogorov-Smirnov test to check a continuous variable for departure from normality, and, of course, a log transformation of zero cannot be performed.
I was wondering whether, in the case of a non-normal distribution, non-parametric tests could be performed on data with (possibly many) zeros.
Thanks to all for helping me clarify this issue.
If the variable is going to be the dependent variable of a model, then you don't have to transform it. It all depends on the nature of what you are measuring. If you are measuring counts, you can use a Poisson or negative binomial model.
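A quick descriptive check can help choose between the two count models. A Poisson variable has variance equal to its mean, so a variance-to-mean ratio well above 1 suggests overdispersion. A negative binomial model can accommodate overdispersion. The sketch below is a rule of thumb, not a formal test, and the counts are hypothetical. Note that excess zeros alone also push this ratio above 1.

```python
import statistics


def dispersion_index(counts):
    """Variance-to-mean ratio of a count variable (a descriptive check, not a test).

    A Poisson variable has variance equal to its mean, so the ratio is near 1;
    values well above 1 indicate overdispersion.
    """
    return statistics.variance(counts) / statistics.mean(counts)


# Hypothetical counts with many zeros and a long right tail
counts = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3, 7, 12]
print(round(dispersion_index(counts), 2))
```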
On the other hand, if it's an independent variable and its distribution is continuous, a logarithmic or square-root transformation is fine (depending on the degree of skewness).
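To see which transformation reduces skewness more for a given variable, one can compare the sample skewness before and after transforming. The sketch below makes a few assumptions:

- It uses the moment-based skewness estimator.
- The data are hypothetical, right-skewed, and contain zeros.
- It uses log(x + 1) rather than log(x) because of the zeros.

Skewness is only one aspect of normality, and the ranking depends on the data.

```python
import math
import statistics


def skewness(values):
    """Moment-based sample skewness (biased Fisher-Pearson estimator)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed, non-negative variable that includes zeros
x = [0, 0, 1, 1, 2, 2, 3, 4, 6, 9, 15, 30, 60]

candidates = {
    "raw": x,
    "sqrt": [math.sqrt(v) for v in x],
    "log(x + 1)": [math.log1p(v) for v in x],
}
for name, values in candidates.items():
    print(f"{name:>10}: skewness = {skewness(values):.2f}")
```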
But can you tell us what the variable actually measures, and what the hypothesis is?
Then the above answers regarding log transformation are appropriate.
Use Mann-Whitney for two groups and Kruskal-Wallis for three or more. Non-parametric tests work on ranks, so zeros are not a problem.
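The following stdlib-only Python sketch shows why ties are not a problem for rank tests. Tied values such as repeated zeros get their average ("mid") rank, and the statistics below include the standard tie corrections. The Mann-Whitney p-value uses the large-sample normal approximation without continuity correction. That is an assumption, and for small samples library results may differ. In practice you would use an established routine instead of these illustrative functions, such as `scipy.stats.mannwhitneyu` / `scipy.stats.kruskal` in Python or `wilcox.test` / `kruskal.test` in R.

```python
import math
from itertools import chain


def midranks(values):
    """Rank values 1..n; tied values (e.g. many zeros) share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def tie_term(values):
    """Sum of t**3 - t over groups of tied values (used in tie corrections)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(t ** 3 - t for t in counts.values())


def mann_whitney_u(x, y):
    """U for x and a two-sided normal-approximation p-value (tie-corrected).

    Assumes not all pooled values are identical (otherwise sigma is 0).
    """
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term(pooled) / (n * (n - 1))))
    z = (u1 - n1 * n2 / 2) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h / (1 - tie_term(pooled) / (n ** 3 - n))


print(mann_whitney_u([0, 0, 0, 1], [0, 2, 3, 4]))
print(kruskal_wallis_h([0, 0, 0, 1], [0, 2, 3, 4], [0, 5, 6, 9]))
```

With two groups, the tie-corrected H equals the square of the Mann-Whitney z statistic, so the two tests agree in that case.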
An alternative, similar to ranks, is to create groups such as < 10, 10-100, 100-1000, etc., and then use the chi-square test. The drawback of this approach is that you may end up with a different N in each class. Percentile-based groups (e.g. Stata's xtile) could be a more advisable solution; you then compare quartiles or tertiles.
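A stdlib-only Python sketch of the binning approach: classify each value, build a groups-by-classes table, and compute the Pearson chi-square statistic and its degrees of freedom. The following are assumptions:

- Half-open class boundaries ([10, 100), [100, 1000)).
- An open-ended top class ">=1000".
- The example data.

```python
from collections import Counter

# Class boundaries from the post; half-open intervals and the open-ended
# top class ">=1000" are assumptions
EDGES = [10, 100, 1000]
LABELS = ["<10", "10-100", "100-1000", ">=1000"]


def to_class(value):
    """Map a value to its class label."""
    for edge, label in zip(EDGES, LABELS):
        if value < edge:
            return label
    return LABELS[-1]


def chi_square_statistic(groups):
    """Pearson chi-square statistic and degrees of freedom for a groups x classes table."""
    table = [Counter(to_class(v) for v in g) for g in groups]
    # Drop classes that are empty in every group (they would have zero expected counts)
    cols = [lab for lab in LABELS if any(row[lab] for row in table)]
    row_totals = [sum(row[c] for c in cols) for row in table]
    col_totals = [sum(row[c] for row in table) for c in cols]
    n = sum(row_totals)
    stat = 0.0
    for row, r_tot in zip(table, row_totals):
        for c, c_tot in zip(cols, col_totals):
            expected = r_tot * c_tot / n
            stat += (row[c] - expected) ** 2 / expected
    df = (len(table) - 1) * (len(cols) - 1)
    return stat, df


# Hypothetical data for two groups with many zeros
group_a = [0, 0, 0, 0, 3, 12, 45, 150]
group_b = [0, 0, 8, 25, 60, 120, 400, 2500]
print(chi_square_statistic([group_a, group_b]))
```

The statistic is compared with a chi-square distribution with `df` degrees of freedom, for example via `scipy.stats.chi2.sf`. The approximation is usually considered reliable only when expected counts are not too small (a common rule of thumb is at least 5), which this tiny example does not meet. With many zeros, several percentile cut points can coincide, so equal-sized quantile groups may not be achievable either.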
I did not realize at first that your values were exactly "0". I thought they contained many zeros after the decimal point (e.g. 0.0000009). Obviously, for values of exactly "0", a log transformation cannot be applied.
Hi Angelo, if it is a count variable, you can use zero-inflated Poisson regression, since this type of Poisson regression is used to model count data that have an excess of zero counts.
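The core idea of a zero-inflated Poisson model is a mixture. With probability `pi` an observation is a "structural" zero, and otherwise it comes from a Poisson distribution with mean `lam`. The Python sketch below fits only an intercept-only version (no covariates) by the EM algorithm, to show that mixture. A real ZIP regression lets both parts depend on predictors and would normally be fitted with established software such as `pscl::zeroinfl` in R or `ZeroInflatedPoisson` in `statsmodels`. The function name and the example counts are hypothetical.

```python
import math


def fit_zip(counts, max_iter=10_000, tol=1e-12):
    """Fit an intercept-only zero-inflated Poisson model by EM.

    Returns (pi, lam): pi is the probability of a structural zero,
    lam is the mean of the Poisson component.
    """
    n = len(counts)
    pi, lam = 0.5, max(sum(counts) / n, 1e-6)  # crude starting values
    for _ in range(max_iter):
        # E-step: posterior probability that an observed zero is structural
        p_struct = pi / (pi + (1 - pi) * math.exp(-lam))
        z = [p_struct if y == 0 else 0.0 for y in counts]
        # M-step: update the mixing weight and the Poisson mean
        new_pi = sum(z) / n
        new_lam = sum((1 - zi) * y for zi, y in zip(z, counts)) / sum(1 - zi for zi in z)
        converged = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam


# Hypothetical counts with an excess of zeros
counts = [0] * 20 + [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]
pi, lam = fit_zip(counts)
print(round(pi, 3), round(lam, 3))
```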
As an alternative, a value of zero may be considered essentially equal to a very low value. For example, an increase in body weight of 0 kg is essentially equivalent to an increase of 1 gram. Depending on the parameter, this minimal, irrelevant value may be substituted for zero, allowing a logarithmic transformation.
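A minimal Python sketch of this substitution, using the body-weight example from the post. The data are hypothetical, and the 1 g (0.001 kg) floor is the assumed minimal irrelevant value; only exact zeros are replaced.

```python
import math

# Hypothetical weight gains in kg; 0 means "no measurable gain"
weight_gain_kg = [0.0, 0.0, 0.35, 1.2, 2.5]

# Assumed minimal, irrelevant gain for this parameter: 1 g = 0.001 kg
MIN_RELEVANT_KG = 0.001

# Replace only exact zeros; positive values are kept unchanged
substituted = [MIN_RELEVANT_KG if v == 0 else v for v in weight_gain_kg]
log_gain = [math.log10(v) for v in substituted]
print([round(v, 2) for v in log_gain])
```

Because the log stretches small values, the substituted value matters. Using 0.0001 kg instead of 0.001 kg moves those observations by a full unit on the log10 scale, so it is worth checking that conclusions do not depend on this choice.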
I suggest a nice book that also covers this problem: https://www.springer.com/la/book/9780387874579
You are dealing with zero-inflated data (meaning the response variable contains many zeros). The book discusses four models that can deal with an excessive number of zeros: 1) zero-inflated Poisson (ZIP); 2) zero-inflated negative binomial (ZINB); 3) zero-altered Poisson (ZAP); 4) zero-altered negative binomial (ZANB).
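The two families in that list differ in where the zeros come from. In a zero-inflated model (ZIP, ZINB), zeros come from two sources: a structural-zero process and the count distribution itself. In a zero-altered (hurdle) model (ZAP, ZANB), all zeros come from a binary process, and positive counts follow a zero-truncated count distribution. The Python sketch below writes out the Poisson versions of both probability functions. The parameter names `pi` and `p0` are illustrative. Replacing the Poisson pmf with a negative binomial pmf would give ZINB and ZANB.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing k counts."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson: a zero is structural (prob. pi) or a Poisson zero."""
    if k == 0:
        return pi + (1 - pi) * poisson_pmf(0, lam)
    return (1 - pi) * poisson_pmf(k, lam)


def zap_pmf(k, p0, lam):
    """Zero-altered (hurdle) Poisson: all zeros come from the binary part (prob. p0);
    positive counts follow a zero-truncated Poisson."""
    if k == 0:
        return p0
    return (1 - p0) * poisson_pmf(k, lam) / (1 - math.exp(-lam))


for k in range(4):
    print(k, round(zip_pmf(k, 0.3, 2.5), 3), round(zap_pmf(k, 0.4, 2.5), 3))
```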
I strongly suggest the book because it is well written and also provides R code for the analysis. If you can't access the book, I am sure you can find information on the internet about these models and their applications. Transforming data should be the last resort.