Yes, you're quite right. It's a discrete variable, a binomial variable. Why not use a generalized linear model (GLM), which extends the linear (Gaussian) model to other exponential-family distributions (of which the binomial is a member)? There are mixed GLMs to account for repeated measurements (if that actually is a problem), or GEEs. But I'd suggest a GLM with a binomial response (and logit link); you then use the inverse logit transform to get the mean (expected proportion). If a random effect is really necessary, fit a mixed model. See http://www.openbugs.info/Examples/Seeds.html and http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3112198/
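To make the inverse-logit step concrete, here is a minimal Python sketch. The coefficient values are made up for illustration; a real analysis would take them from a fitted binomial GLM.

```python
import math

def inv_logit(eta):
    """Inverse logistic (logit) link: maps a value on the log-odds
    scale back to a probability/expected proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical fitted coefficients on the logit scale:
# intercept (control group) and a treatment effect.
eta_control = -0.4
eta_treated = -0.4 + 1.1

p_control = inv_logit(eta_control)  # expected proportion, control group
p_treated = inv_logit(eta_treated)  # expected proportion, treated group
```

The point is simply that the model's estimates live on the log-odds scale, and the inverse link converts them back to interpretable proportions.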
Your response is binomial, so you should use a binomial model (a GLM of the binomial family). Beta regression is not recommended here because you know the denominators of the fractions (the number of trials), and you should use that knowledge.
I agree with the previous posts. If your expected results are "Success" = 1 / "Failure" = 0, you can use a generalized linear model, a robust procedure for studying a binomial response variable like this. In this case, you set the response variable as success (1, 0) and the predictors (regressors) as your treatments, then apply a logistic regression to test the significance of the regressors in explaining the behavior of the response variable. Furthermore, you can study the relative importance of the regressors by means of hierarchical partitioning analysis, e.g. with the MuMIn package in R (free software).
Alternatively (if you have several trials for each individual), you can use the percentage of success and apply an ANOVA (parametric) or a Kruskal-Wallis test (if you decide on a non-parametric approach).
Transformation is not an ideal approach to such data, though it used to be common practice. Nowadays, generalized linear models are effective and really suitable for dealing with such data.
Yes, it is a bad idea to transform the data. However, generalized linear models also work with a transformation, but they transform the expected value.
In order to use generalized linear models you need to know which link function to use. Your data are not binomial if you use the percentage of successful responses, but they are binomial if you use the response of each trial as your response variable. In that case you will have to include the subject as a random (repeated) factor in the model, along with the other explanatory variables, whichever they are. This would be the most appropriate approach. However, take a look at this paper: https://www.researchgate.net/publication/236081213_Spatial_reversal_learning_is_impaired_by_age_in_pet_dogs in which I used ANOVA and my variable was quite similar to yours (errors committed in 15 trials by the subjects).
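To illustrate the trial-level coding described above, here is a small Python sketch (the subject IDs and counts are invented) that expands per-subject success counts into one 0/1 row per trial, which is the data layout a binomial mixed model with subject as a random factor would use:

```python
# Hypothetical per-subject summaries: (subject id, successes, total trials).
subjects = [("dog01", 11, 15), ("dog02", 7, 15), ("dog03", 14, 15)]

# Expand to trial-level rows: one (subject, outcome) pair per trial,
# so the response is binary and the subject is the grouping factor.
rows = []
for sid, k, n in subjects:
    rows.extend((sid, 1) for _ in range(k))      # correct responses
    rows.extend((sid, 0) for _ in range(n - k))  # errors
```

Each subject contributes as many rows as it has trials, which is what lets the model account for the repeated measurements.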
Paolo, I'm quite comfortable with binomial GLM, but could you clarify your comment? I think this might confuse future readers when you say that the binomial response is not a percentage of successful responses. It still can be coded as # successful versus # failures, assuming a reference level (as in the case of conducting a meta-analysis of annual survival rate or something like that). I'm most likely misunderstanding you, but maybe it would be clearer to restate?
I agree that you want to treat each trial within each individual as "independent" (i.e. as an individual row of data) and then nest those trials (or 0 and 1 values) under each subject. No argument there. Another thing about specifying the correct error family and link function is to make sure that the boundary conditions of the original distributions are met (positive continuous, 0-1, negative skew).
Speaking of link functions, you may want to experiment with both logit and complementary log-log link functions for the binomial error family. I've had great luck with the latter, especially in cases where the successes far outnumbered the failures.
Also, I agree with everyone that transformation is a very bad idea. Don't do it!
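For anyone who wants to see the difference between the two links mentioned above, here is a minimal sketch of their inverse functions (pure Python, no fitted model involved):

```python
import math

def inv_logit(eta):
    """Inverse of the logit link; symmetric around p = 0.5."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_cloglog(eta):
    """Inverse of the complementary log-log link; asymmetric, which is
    why it can fit better when successes far outnumber failures."""
    return 1.0 - math.exp(-math.exp(eta))

# At a linear predictor of 0 the two links already disagree:
p_logit = inv_logit(0.0)      # 0.5
p_cloglog = inv_cloglog(0.0)  # 1 - exp(-1), about 0.632
```

The asymmetry of the complementary log-log curve is the property being exploited when the response proportions sit close to 1.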
Hi Patrick, my point is that if one uses the subject's response in every single trial as a response variable (i.e. one trial = one case) then it is clearly binomial (1 = correct, 0 = wrong response). But if one uses as response variable the percentage of correct responses over total responses (i.e. one subject = one case) then it is not binomial - it's a ratio - or am I getting confused?
I generally hold the same viewpoint as Paolo. Also, I've been using mixed models for a few years now and I've discovered that it is not the solution to all our statistical woes. In fact, I have some concerns about logit mixed models with repeated measures. Mixed models have pretty stringent assumptions with repeated measures designs and it is better to start with an unstructured variance-covariance matrix. This is the default in HLM 7, but I believe that it has to be specified with other programs.
Yes, you can. You may want to apply some transformations (e.g. log(x+1)) to meet assumptions (e.g. normality, especially if you have a small n). Or you can use logistic and/or binomial regression if you have the original data (but if you only have percentages, then ANOVA is the way to go).
On that note, I'm struggling to find a good test to evaluate my WB data. Overall, I have four treatment groups (one control and three treatments), one protein of interest and one loading control per blot, and 4 different blots in total, as I had 32 samples to run. After measuring the density of each band, I normalize to the control protein (ratio of POI/RP). I want to analyze the differences in relative protein levels between my treatment groups. My thought is that if I have to evaluate relative values, I can't use parametric tests like ANOVA, as I can't assume a normal distribution (am I right?). If this is correct, what other tests can I use?
Jochen Wilhelm I count on your valuable input, your explanations have saved me for my PCR data analysis!
If you use the logarithms of the POI/RP ratios, you should be fine.* The log density ratios in WBs are what dCt values are in rtPCR. You can use Dunnett's MCP to compare each treatment to the control. If you want to make all pairwise comparisons, you can use Tukey's HSD.
---
*If you, for some reason, think there are considerable systematic differences in the (log) ratios between different blots, it might be worthwhile to use a mixed model for the analysis (with "blot" as a random effect).
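A minimal sketch of the log-ratio step (the band densities are invented for illustration); the resulting values are what would go into the ANOVA / Dunnett / Tukey analysis:

```python
import math

# Hypothetical band densities: protein of interest (POI) and the
# reference/loading-control protein (RP) for four samples.
poi = [1520.0, 980.0, 2210.0, 1340.0]
rp = [1010.0, 950.0, 1190.0, 1005.0]

# Analyze the log of the ratio: log(POI/RP) = log(POI) - log(RP),
# analogous to a dCt value in RT-PCR.
log_ratios = [math.log(p / r) for p, r in zip(poi, rp)]
```

On this scale, differences between treatments are differences of log-ratios, i.e. fold changes, which is usually the biologically meaningful quantity.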
I have some minor follow up questions that would be very helpful if you could answer:
- Do the logarithms of the ratios follow a normal distribution because the individual values themselves are lognormally distributed (just for my own understanding)?
- I was thinking of using a two-way ANOVA with treatment and blot as factors, but if the blots are treated the same and don't influence the results, then I can use a one-way ANOVA with only treatment as a factor, and test for multiple comparisons in either case. Would you agree?
- Would it be a correct approach to test my ratio values for normality, check whether this assumption is violated or confirmed, and then use a parametric test if it is confirmed?
1) If X and Y are lognormally distributed, then ln(X) and ln(Y) are normally distributed, and so is ln(X) - ln(Y). Note that ln(X) - ln(Y) = ln(X/Y).
2) I think yes. If "blot" does not impact the results, then there is no need to consider "blot" in the analysis. Actually, the impact of "blot" should (more or less) cancel out when normalizing to a reference within the same blot. But still, sometimes the antibody does not work equally well on all blots, which would cause issues. If you have all proteins and samples on each blot, using blot as a fixed factor can correct for this. Otherwise blot can be modelled as a random factor.
3) Actually, a p-value for the treatment effect is invalidated if you use the data first to determine which test to use (common practice seems to thoroughly ignore this). So if you have no prior experience or theoretical reasons for which distributional assumptions would make sense, you can check them using one data set, but then calculate the p-value using a different data set. And by "check" I don't mean formal significance tests like Shapiro-Wilk, Kolmogorov-Smirnov, Levene etc. I rather mean really looking at the data and at residual diagnostic plots, and getting an idea of whether the observed frequency distribution really (strongly, considerably) flouts your assumptions about an idealized probability distribution model.
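Point 1) can be checked numerically; a tiny Python sketch with made-up lognormal draws:

```python
import math
import random

random.seed(1)

# X and Y lognormal: exponentials of normal draws (parameters arbitrary).
x = math.exp(random.gauss(0.0, 1.0))
y = math.exp(random.gauss(0.5, 0.7))

# ln(X) and ln(Y) are then normal, and the identity ln(X) - ln(Y) = ln(X/Y)
# is why analyzing log-ratios is the same as analyzing differences of logs.
lhs = math.log(x) - math.log(y)
rhs = math.log(x / y)
```

The two quantities agree up to floating-point error, so working with log-ratios loses nothing relative to working with the logs of numerator and denominator.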
I fully agree with Jochen Wilhelm , don't do any distributional tests; they are ill-posed for the question at hand. A fairly easy alternative would be to do Kruskal-Wallis on the baseline-corrected values. If this does not give "similar" answers to your analysis then your distributional assumptions obviously have a strong impact on the analysis. If it does, fine!
Transformation of the data is one option to bring it closer to satisfying the normality assumption. Nonparametric tests are good under certain assumptions/conditions; moreover, the robustness of the test always has to be looked into under a nonparametric setup.
Regarding the WB data I referred to in my previous post: after QQ-plotting them I saw that they follow a bimodal/multimodal distribution, and so do their logarithms. Does that mean that a logarithmic transformation towards normality is only applicable to skewed data?
With that in mind, maybe the best way to analyze them would be either a non-parametric test or general linear models?
A non-unimodal distribution indicates that you are mixing different things. It could be that your WB data come from different groups that do have different means (different distributions). If this is the case, you should look at the residuals of a model used to describe the differences between the groups. My guess is that the points of a normal-QQ plot of the residuals from a model on the log-transformed data will be pretty close to a straight line. Google "residual analysis".
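A minimal sketch of what "look at the residuals" means here (group labels and values are invented): subtract each group's mean from its values and pool what is left over. Those pooled residuals, not the raw data, are what the normality assumption and the QQ plot apply to:

```python
# Hypothetical log-transformed POI/RP ratios, grouped by treatment.
groups = {
    "control": [0.12, 0.05, -0.03, 0.08],
    "treat_a": [0.55, 0.61, 0.48, 0.70],
    "treat_b": [-0.20, -0.33, -0.25, -0.15],
}

# Residuals from the group-means model: the raw data may look
# multimodal simply because the group means differ; the residuals
# should not show that multimodality.
residuals = []
for values in groups.values():
    m = sum(values) / len(values)
    residuals.extend(v - m for v in values)
```

Plotting `residuals` in a normal-QQ plot is the diagnostic being suggested above.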
Thanks Jochen, I indeed separated the data per treatment and the raw data follow normal distribution. Ratios do not (as expected), but also logarithms of the ratios do not follow normal distribution either. I attach a residual plot for the log-transformed data (if this is what you were referring to).
Should I also separate per blot? Then the data points will be only 3-4 per group, which I assume is a very small N to visually check for normality...
The distribution of the residuals is reasonably symmetric. For sure it's hard to tell from such a small sample, but there is no obvious sign in that data that the assumption of normally distributed residuals is grossly wrong. So in my opinion it's OK to go on with that assumption.
So would you suggest to go on with parametric tests (ANOVA in that case) on the logarithms of the ratios? I also attach a QQ plot for the ln data here.
My backup plan was the Friedman test with Dunn's post hoc. I feel that this might be more on the safe side?
That's not the normal-QQ plot of the data you showed previously. I got a different plot (see attachment).
Your backup plan is not well suited to such small sample sizes, and it is about different hypotheses. Make up your mind about which hypotheses you actually want to test, and then look for a way to test them. If there is no ideal way, you may use an approximation, and if this bears the risk of unsuitable assumptions, that can be taken into account during the interpretation/discussion.
If "being on the safe side" means "I won't risk getting nasty questions from reviewers (and neither I nor they are really interested in, or have a clue about, which hypotheses are actually tested and how they are interpreted)", then your backup plan is likely OK.
Thanks Jochen, I might have sent the wrong data group for the QQ plot, possibly one of the four treatment groups, as I was testing multiple data sets in several different windows.
I have no intention to please myself or the reviewers, I would like to analyze the data as it should be, that is why so many questions :) Regarding the non-parametric tests, I've just seen it widely used when data don't follow normal distribution, especially in WB or similar. My hypothesis is always the same, that the intensity of my band signals is different between the four treatments and I just would like to test this appropriately!