How can I analyze variables with several zeros?

Priyathama Vellanki Popular answer

I agree with the explanation by Dr. Herraiz. Another alternative we have used is to add 1 to all the data points of that variable so that you can perform a log transformation. For example, if your data are 0, 0, 2, 3, 5, they become 1, 1, 3, 4, 6. You can also perform non-parametric tests.
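A minimal sketch of the add-1 approach described above, using the same five example values. It relies only on the Python standard library; the variable names are illustrative and not taken from any particular analysis.

```python
import math

data = [0, 0, 2, 3, 5]  # example values from the answer above

# Add 1 so that every value is strictly positive.
shifted = [x + 1 for x in data]

# The natural logarithm is now defined for every data point.
logged = [math.log(x) for x in shifted]

# math.log1p(x) computes log(1 + x) in one step and gives the same values.
logged_direct = [math.log1p(x) for x in data]

print(shifted)
print([round(v, 3) for v in logged])
```

Note that the zeros all map to log(1) = 0, so they remain a spike at a single value after the transformation.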

Noel Artiles-Leon

I do not know of any statistical method of analysis that requires that the variables do not take zeros as a value. As far as statistical methods for the analysis of continuous random variables, zero is no different than any other number.

Giacomo Tirabassi

Hi Angelo!

Variables with several zeros do not necessarily need non-parametric tests. It depends only on whether their distribution is normal or non-normal. The first thing to do is to perform a Shapiro-Wilk test on these variables and check whether they are normally distributed. You can then choose parametric tests if they are, and non-parametric tests if they are not.

To normalize the data distribution, it is usually useful to log-transform the variables, which improves their degree of normality. However, in my experience, I have often encountered variables that were impossible to normalize, even by logarithmic transformation. In that case, there is no alternative but to analyze them with non-parametric tests.

See you soon

Giacomo

Katarína Sebeková

Hi,

The logarithmic function is the inverse of the exponential function (and vice versa). Since no power of a positive base equals zero, the logarithm of zero is not defined. Thus you cannot "linearize" non-normally distributed data that include zeros using a logarithmic transformation. In this case, linearization using a fourth-root transformation could probably work. This may yield normally distributed data suitable for parametric tests.
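A short sketch of the point above, assuming made-up values: the logarithm fails at zero, while the fourth root is defined there. Whether the transformed data end up close to normal still has to be checked on the real variable.

```python
import math

# Hypothetical values containing zeros (not from the original data set).
data = [0.0, 0.0, 16.0, 81.0, 10000.0]

# The logarithm of zero is undefined: Python raises ValueError.
try:
    math.log(data[0])
    log_defined_at_zero = True
except ValueError:
    log_defined_at_zero = False

# The fourth root is defined at zero and compresses large values.
fourth_root = [x ** 0.25 for x in data]

print(log_defined_at_zero)
print([round(v, 3) for v in fourth_root])
```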

Otherwise, both parametric and non-parametric tests accept zero values.

katka

Angelo Cignarelli

Actually, I always perform a Kolmogorov-Smirnov test to check whether a continuous variable is normally distributed and, of course, a log transformation of zero cannot be performed.

I was wondering whether, in the case of a non-normal distribution, non-parametric tests can be performed when the variable contains (possibly several) zeros.

Thanks to all for helping me clarify this issue.

Giovanni Bader

If the variable is going to be the dependent component of a model, then you don't have to transform it. It all depends on the nature of what you are measuring. If you are measuring counts, then you can use a Poisson or negative binomial model.

On the other hand, if it is an independent variable and the distribution is continuous, the logarithmic or square root transformation is OK (depending on where the skewness lies).

But can you tell us what the variable is actually measuring, and what the hypothesis is?

It's a continuous variable (energy expenditure) which is often zero (almost 70% of cases) and otherwise around 1000 - 10000 kcal.

Then the above answers regarding log transformation are appropriate.

Mann-Whitney for two groups and Kruskal-Wallis for three or more. The non-parametric tests work on ranks, so zeros are OK.
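To illustrate why rank-based tests cope with many zeros, here is a minimal sketch that computes the Mann-Whitney U statistics by hand. The zeros are tied, so they all receive the same average rank (midrank). The two groups are invented values, and the helper names `midranks` and `mann_whitney_u` are hypothetical; for a p-value you would normally rely on a statistics package rather than this sketch.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U_a, U_b) computed from the pooled midranks."""
    ranks = midranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


# Invented energy-expenditure values (kcal) with several zeros.
group_a = [0, 0, 0, 1200, 3500]
group_b = [0, 2500, 4000, 8000, 9500]

u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b)
```

With a large share of ties, the tie-corrected version of the test is the one to use when computing the p-value.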

An alternative, similar to ranks, is to generate groups, such as < 10, 10-100, 100-1000, etc., and then use the chi-square test. The drawback of this approach is that you may end up with a different N in each class. Quantile groups (xtile, percentiles) could be a more advisable solution. Then you compare quartiles or tertiles.
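The grouping idea can be sketched as follows. The class boundaries (zero, below 5000, 5000 or more), the two groups, and the helper names are all assumptions made for illustration; only the chi-square statistic is computed here, not the p-value.

```python
def to_class(x):
    """Assign a value to one of three hypothetical classes."""
    if x == 0:
        return 0          # exactly zero
    if x < 5000:
        return 1          # positive but below 5000
    return 2              # 5000 or more


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Invented values with many zeros.
group_a = [0, 0, 0, 0, 0, 0, 0, 1200, 3500, 6000]
group_b = [0, 0, 0, 2500, 4000, 5200, 7000, 8000, 9000, 9500]

# Count how many observations of each group fall in each class.
table = [
    [sum(1 for x in group if to_class(x) == c) for c in range(3)]
    for group in (group_a, group_b)
]

stat = chi_square_statistic(table)
dof = (len(table) - 1) * (len(table[0]) - 1)
print(table, round(stat, 3), dof)
```

The example also shows the drawback mentioned above: the classes end up with unequal counts, and some expected counts fall below 5, where the chi-square approximation becomes unreliable.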

I did not realize at first that your values were exactly "0". I thought they contained many zeros as decimals (e.g. 0,0000009). Obviously, in the case of "0", a log transformation cannot be applied.

Bye

Monica Mazariegos

Hi Angelo, if it is a count variable, you can use zero-inflated Poisson regression, as this type of Poisson regression is used to model count data that have an excess of zero counts.
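As a rough illustration of what "zero-inflated" means, the sketch below writes out the zero-inflated Poisson probability mass function: with probability `pi` the observation is a structural zero, otherwise it comes from an ordinary Poisson distribution with mean `lam`. The parameter values are arbitrary and the function names are hypothetical; this is not a regression fit, only the distribution that such a regression assumes.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability P(Y = k)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson probability P(Y = k).

    pi is the probability of a structural (extra) zero.
    """
    base = (1 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


lam, pi = 3.0, 0.4  # arbitrary illustrative parameters

p_zero_plain = poisson_pmf(0, lam)   # share of zeros under a plain Poisson
p_zero_zip = zip_pmf(0, lam, pi)     # share of zeros under the ZIP model
total = sum(zip_pmf(k, lam, pi) for k in range(100))

print(round(p_zero_plain, 4), round(p_zero_zip, 4))
```

The ZIP model predicts far more zeros than a plain Poisson with the same `lam`, which is what makes it suitable for counts with an excess of zeros.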

Carlos Guijarro Herraiz

As an alternative, a value of zero may be considered essentially equal to a very low value. E.g. an increase in body weight of 0 kg is essentially equivalent to an increase of 1 gram. Depending on the parameter, this minimal, irrelevant value may be substituted for zero, allowing for logarithmic transformation.
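A minimal sketch of this substitution, assuming weight gains recorded in kg and 1 gram (0.001 kg) as the "irrelevant" replacement value from the example above. The data values and the name `floor_value` are hypothetical.

```python
import math

# Hypothetical weight gains in kg, including exact zeros.
weight_gain_kg = [0.0, 0.0, 0.5, 2.0, 4.0]

floor_value = 0.001  # 1 gram expressed in kg, substituted for zero

# Replace only the zeros; positive values are left untouched.
adjusted = [x if x > 0 else floor_value for x in weight_gain_kg]

# Base-10 logarithm is now defined for every observation.
logged = [math.log10(x) for x in adjusted]

print([round(v, 3) for v in logged])
```

On the log scale the substituted zeros sit far below the smallest real value, so the choice of replacement value can influence the results and is worth a sensitivity check.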

Mehmet Guven Gunver

NEVER try to "convert" it to normal. Check our new paper.

Article TO DETERMINE SKEWNESS, MEAN AND DEVIATION WITH A NEW APPROAC...

How do you analyze variables with several ones?

Read our paper. There is one data stack that contains several zeros in the proof-of-concept section.

Fabio Favoretto

I suggest a nice book that also covers this problem: https://www.springer.com/la/book/9780387874579

You are dealing with zero-inflated data (meaning the response variable contains many zeros). The book discusses four models that can deal with excessive numbers of zeros: 1) zero-inflated Poisson (ZIP); 2) zero-inflated negative binomial (ZINB); 3) zero-altered Poisson (ZAP); 4) zero-altered negative binomial (ZANB).
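To show how the two model families differ, here is a small sketch of the zero-altered Poisson (ZAP) distribution, which is commonly described as a hurdle model: all zeros come from one binary process, and positive counts come from a Poisson distribution truncated at zero. In a zero-inflated model, by contrast, zeros can arise from both the extra-zero process and the count process. The parameter values and function names are assumptions for illustration, not code from the book.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability P(Y = k)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zap_pmf(k, lam, p_zero):
    """Zero-altered (hurdle) Poisson probability P(Y = k).

    p_zero is the probability of observing a zero; positive counts
    follow a Poisson distribution truncated to k >= 1.
    """
    if k == 0:
        return p_zero
    truncated = poisson_pmf(k, lam) / (1 - math.exp(-lam))
    return (1 - p_zero) * truncated


lam, p_zero = 3.0, 0.7  # arbitrary illustrative parameters

total = sum(zap_pmf(k, lam, p_zero) for k in range(100))
print(zap_pmf(0, lam, p_zero), round(zap_pmf(1, lam, p_zero), 4))
```

Because the zero probability is modelled separately, a hurdle model treats the question "zero or not" and the question "how large, given non-zero" as two distinct parts.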

I strongly suggest the book because it is well written and also provides R code to do the analysis. If you can't get access to the book, I am sure you can find information on the internet about these models and their applications. Transforming data should be the last resort.