What statistical tests to use for enzymatic assay data (non-normal distribution, over-dispersed; non-equal variance, heteroskedasticity)?

20 September 2023

I have a question about what statistical tests I can do in R that would be most suited to analyse my enzymatic assay results.

My data

Dependent variable: Absorbance (it's a colorimetric assay)

Factor 1: Protein (type of protein, wt and several mutants)

Factor 2: Substrate (several substrates)

For each assay (e.g. Prot1 with Substr1) I have 3–6 data points (repeats), and it is not feasible to obtain more.

An example of my data would look something like Fig. 1.

What I want to do:

1. Test whether the different mutations affect the enzymes' activity differently with the different substrates (basically, test the dependence of Absorbance on the interaction between Protein and Substrate: Absorbance ~ Protein*Substrate)

Visually, if I plot my data on a bar chart (mean ± SD), this appears to be the case, but I need to verify that what I see is significant.

Normally, I would do this with a two-way ANOVA, however:

my data are not normally distributed (according to a Q-Q plot and a skewness test, the residuals are over-dispersed, i.e. heavy-tailed and Laplace-like, without any skew);

the variance is not homogeneous (the plot of standardized residuals vs fitted values shows heteroskedasticity).
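For concreteness, a minimal base-R sketch of the two-way ANOVA and the two diagnostic plots described above. The data frame `Data`, its column names, the protein/substrate labels and the simulated values are all hypothetical stand-ins for the real assay data; the errors are drawn as the difference of two exponentials, which gives symmetric, heavy-tailed (Laplace) residuals like those described.

```r
set.seed(42)
# Hypothetical design: 3 proteins x 2 substrates, 4 repeats per combination
Data <- expand.grid(rep = 1:4,
                    Protein = c("wt", "mut1", "mut2"),
                    Substrate = c("S1", "S2"))
Data$Protein <- factor(Data$Protein, levels = c("wt", "mut1", "mut2"))

# Laplace errors: the difference of two iid exponentials is symmetric and heavy-tailed
n <- nrow(Data)
noise <- 0.05 * (rexp(n) - rexp(n))
# Hypothetical effect: only wt is active with S2
Data$Absorbance <- 0.4 + 0.5 * (Data$Protein == "wt" & Data$Substrate == "S2") + noise

# Two-way ANOVA with interaction, i.e. Absorbance ~ Protein * Substrate
fit <- aov(Absorbance ~ Protein * Substrate, data = Data)
print(summary(fit))

# Diagnostics: normal Q-Q plot of the residuals, and standardized residuals
# against fitted values (the vertical spread should look roughly constant)
op <- par(mfrow = c(1, 2))
qqnorm(residuals(fit)); qqline(residuals(fit))
plot(fitted(fit), rstandard(fit),
     xlab = "Fitted values", ylab = "Standardized residuals")
par(op)
```

Heavy tails show up as points bending away from the line at both ends of the Q-Q plot, while a funnel shape in the second panel indicates heteroskedasticity.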

What sort of model could I use instead? Is there a way to transform my data so that a parametric test becomes valid (the only transformations I found were for skewed data, and my data are not skewed)?

2. For each substrate, test which mutations make the activity differ significantly from the wt (e.g. whether activity with Substr2 differs significantly for Prot2 and Prot3 compared with Prot1)

To account for the multiple pairwise comparisons, I would normally use pairwise.t.test with Holm's family-wise error rate correction.
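A minimal sketch of that approach for a single substrate, using base R's `pairwise.t.test()` with `p.adjust.method = "holm"`. The data frame `d` and its values are hypothetical. The second call, with `pool.sd = FALSE`, is an extra option not discussed in the thread: it runs separate-variance (Welch-type) two-sample t-tests instead of using one pooled SD, which is one possible response to unequal group variances.

```r
set.seed(1)
# Hypothetical data for one substrate: wt and two mutants, 4 repeats each
d <- data.frame(
  Protein = factor(rep(c("wt", "mut1", "mut2"), each = 4),
                   levels = c("wt", "mut1", "mut2")),
  Absorbance = c(rnorm(4, 0.90, 0.05), rnorm(4, 0.40, 0.05), rnorm(4, 0.85, 0.05))
)

# All pairwise t-tests using a pooled SD, with Holm-adjusted p-values
res <- pairwise.t.test(d$Absorbance, d$Protein, p.adjust.method = "holm")
print(res)

# Separate-variance (Welch-type) pairwise tests, still Holm-adjusted
res_welch <- pairwise.t.test(d$Absorbance, d$Protein,
                             p.adjust.method = "holm", pool.sd = FALSE)
print(res_welch$p.value)
```

The p-values come back as a lower-triangular matrix, so the wt comparisons are in the `wt` column.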

For non-normally distributed data with unequal variances, I saw that a pairwise Wilcoxon rank-sum test is recommended.

When I group my data by substrate, the data for one of the substrates (e.g. Substr2) are normally distributed with equal variances. A pairwise t-test on them gave results consistent with what is seen on the graph. However, a pairwise Wilcoxon rank-sum test (Holm correction) showed no significant difference between any of the proteins, which makes no sense (e.g. no significant difference between Prot1 and Prot2, although one is active with the substrate and the other is not). So it looks as if the non-parametric test is not powerful enough to be useful here.
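One likely reason, assuming the exact (no-ties) rank-sum test that R's `wilcox.test()` uses for small samples: with only n repeats per group, the smallest achievable two-sided p-value is 2 / choose(2n, n), however large the real difference is. The sketch below computes it for the most extreme data possible (every value in one group above every value in the other).

```r
# Most extreme possible data: all n values of one group above all n of the other
min_p <- sapply(3:6, function(n) wilcox.test(seq_len(n), n + seq_len(n))$p.value)
names(min_p) <- paste0("n = ", 3:6, " per group")
print(round(min_p, 4))

# Holm adjustment of three such comparisons with 4 repeats per group
print(p.adjust(rep(min_p[2], 3), method = "holm"))
```

With 3 repeats per protein the minimum is 0.1, so no pairwise comparison can ever reach 0.05, and after a Holm correction even 4 repeats per group are not enough. A non-significant result here reflects the test's discreteness at tiny n, not evidence of equal activity.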

For the other substrates, either the distribution is not normal (again over-dispersed, Laplace-like, with no skew), or it is normal but the variances are unequal (heteroskedasticity).

Just to note: one-way ANOVA or Kruskal-Wallis rank-sum tests (where applicable) of Absorbance ~ Protein for each substrate individually showed significant variation in Absorbance depending on Protein.
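A minimal sketch of running both per-substrate tests in one pass with base R (`split()`, `aov()`, `kruskal.test()`). The data frame `Data` and its simulated values are hypothetical; only the model Absorbance ~ Protein is taken from the question.

```r
set.seed(7)
# Hypothetical data: 3 proteins x 2 substrates, 4 repeats each
Data <- expand.grid(rep = 1:4,
                    Protein = c("wt", "mut1", "mut2"),
                    Substrate = c("S1", "S2"))
Data$Protein <- factor(Data$Protein, levels = c("wt", "mut1", "mut2"))
Data$Absorbance <- 0.4 + 0.5 * (Data$Protein == "wt" & Data$Substrate == "S2") +
  rnorm(nrow(Data), sd = 0.05)

# Within each substrate: one-way ANOVA and Kruskal-Wallis test of Absorbance ~ Protein
per_substrate <- lapply(split(Data, Data$Substrate), function(d) {
  c(anova_p   = summary(aov(Absorbance ~ Protein, data = d))[[1]][["Pr(>F)"]][1],
    kruskal_p = kruskal.test(Absorbance ~ Protein, data = d)$p.value)
})
results <- do.call(rbind, per_substrate)
print(results)
```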

What sort of pairwise comparison tests can I do in these cases? Or, again, how can I transform my data to be able to use a pairwise.t.test?

Thank you in advance!

I really appreciate any advice!

Sal Mangiafico

If you're using Kruskal-Wallis, an appropriate post-hoc test is Dunn's (1964) test. In R, this is available in the FSA package and in the PMCMRplus package, with options for adjusting for multiple tests. The PMCMRplus package also has other post-hoc tests (functions beginning with "kwAllPairs") as well as functions to compare treatments to a control.
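To show what Dunn's test computes, here is a base-R sketch of its z-tests on ranks pooled across all groups, with the usual tie correction and p-values adjusted via `p.adjust()`. The helper `dunn_sketch` and its `control` argument are hypothetical names used only here, not the FSA or PMCMRplus API; for real analyses the packaged functions (e.g. `FSA::dunnTest()`) are the safer choice. Passing `control = "wt"` limits the comparisons to each mutant against wt, which matches question 2 and means fewer tests are adjusted.

```r
# Hypothetical helper: Dunn's (1964) z-tests on pooled ranks after Kruskal-Wallis
dunn_sketch <- function(x, g, control = NULL, p.adjust.method = "holm") {
  g <- factor(g)
  N <- length(x)
  r <- rank(x)                        # mid-ranks computed across ALL groups
  mean_r <- tapply(r, g, mean)
  n_i <- tapply(r, g, length)
  t_k <- table(r)                     # sizes of groups of tied values
  sigma2 <- N * (N + 1) / 12 - sum(t_k^3 - t_k) / (12 * (N - 1))
  # Either all pairs of levels, or each level against the control level only
  pairs <- if (is.null(control)) {
    combn(levels(g), 2)
  } else {
    rbind(control, setdiff(levels(g), control))
  }
  z <- vapply(seq_len(ncol(pairs)), function(k) {
    a <- pairs[1, k]; b <- pairs[2, k]
    unname((mean_r[a] - mean_r[b]) / sqrt(sigma2 * (1 / n_i[a] + 1 / n_i[b])))
  }, numeric(1))
  p <- 2 * pnorm(-abs(z))             # two-sided normal p-values
  data.frame(comparison = paste(pairs[1, ], "-", pairs[2, ]), z = z,
             p.unadj = p, p.adj = p.adjust(p, method = p.adjust.method))
}

# Hypothetical absorbances for one substrate, 4 repeats per protein
x <- c(0.91, 0.88, 0.95, 0.90, 0.41, 0.38, 0.45, 0.40, 0.86, 0.92, 0.84, 0.89)
g <- rep(c("wt", "mut1", "mut2"), each = 4)
print(kruskal.test(x, factor(g)))
print(dunn_sketch(x, g, control = "wt"))
```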

I have some examples here: https://rcompanion.org/handbook/F_08.html

If you are using anova, fit the model with lm() and use emmeans for post-hoc tests. This is a powerful and flexible approach.
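A minimal sketch of that workflow on hypothetical simulated data: fit the two-way model with `lm()`, read the interaction F test from `anova()`, then compare each mutant with wt within each substrate using emmeans. The emmeans calls are guarded with `requireNamespace()` in case the package is not installed; `ref = 1` assumes wt is the first factor level, as set here.

```r
set.seed(3)
# Hypothetical data: 3 proteins x 2 substrates, 4 repeats each
Data <- expand.grid(rep = 1:4,
                    Protein = c("wt", "mut1", "mut2"),
                    Substrate = c("S1", "S2"))
Data$Protein <- factor(Data$Protein, levels = c("wt", "mut1", "mut2"))
Data$Absorbance <- 0.4 + 0.5 * (Data$Protein == "wt" & Data$Substrate == "S2") +
  rnorm(nrow(Data), sd = 0.05)

fit <- lm(Absorbance ~ Protein * Substrate, data = Data)
print(anova(fit))   # sequential F tests, including Protein:Substrate

# Post hoc: each mutant vs wt (first level) within each substrate
if (requireNamespace("emmeans", quietly = TRUE)) {
  emm <- emmeans::emmeans(fit, ~ Protein | Substrate)
  print(emmeans::contrast(emm, method = "trt.vs.ctrl", ref = 1))
}
```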

If you have many groups, you may prefer not to adjust the p-values for multiple tests at all. It's really up to you how important it is to protect the family-wise error rate versus not missing potentially significant comparisons.
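The trade-off can be seen with base R's `p.adjust()` on three hypothetical unadjusted p-values: all three are below 0.05 without adjustment, but only one remains so after Holm's correction.

```r
p_raw <- c(0.01, 0.03, 0.04)   # hypothetical unadjusted p-values
adj <- rbind(none = p.adjust(p_raw, method = "none"),
             holm = p.adjust(p_raw, method = "holm"))
print(adj)
```

Holm multiplies the sorted p-values by 3, 2 and 1 and then enforces monotonicity, giving 0.03, 0.06 and 0.06.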

Jochen Wilhelm

glm(Absorbance ~ Protein*Substrate, data = Data, family = quasipoisson())

lm(log(Absorbance) ~ Protein*Substrate, data = Data)
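A runnable sketch of both suggested models on hypothetical simulated data, with `Absorbance` as the response and `Data` as the data frame. The quasi-Poisson GLM uses a log link and estimates a dispersion parameter, so its interaction term is tested with F tests; the log-linear model requires all absorbances to be strictly positive (the quasi-Poisson model only needs them to be non-negative).

```r
set.seed(11)
# Hypothetical data: 3 proteins x 2 substrates, 4 repeats each
Data <- expand.grid(rep = 1:4,
                    Protein = c("wt", "mut1", "mut2"),
                    Substrate = c("S1", "S2"))
Data$Protein <- factor(Data$Protein, levels = c("wt", "mut1", "mut2"))
mu <- 0.4 + 0.5 * (Data$Protein == "wt" & Data$Substrate == "S2")
Data$Absorbance <- mu * exp(rnorm(nrow(Data), sd = 0.1))  # multiplicative noise, > 0

# Quasi-Poisson GLM (log link); dispersion is estimated, so use F tests
fit_qp <- glm(Absorbance ~ Protein * Substrate, data = Data, family = quasipoisson())
print(anova(fit_qp, test = "F"))

# Linear model on log-transformed absorbance
fit_log <- lm(log(Absorbance) ~ Protein * Substrate, data = Data)
print(anova(fit_log))
```

Both models describe effects on a multiplicative scale, so a significant Protein:Substrate term means the relative effect of a mutation depends on the substrate.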

Katrine Aleksenko

Thank you very much Sal Mangiafico and Jochen Wilhelm for your recommendations! I am currently trying them out.

I have a question though, Jochen Wilhelm

The quasi-Poisson family is used for overdispersed count variables, but my variable is not a count (it's decimal numbers). Can I still use it?

Yes, it works for rational (non-integer) variables, too.

The drawback is that you don't get a proper likelihood, so model comparisons via information criteria (such as AIC) are compromised, but that is probably not relevant for you.
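A small illustration of both points with six hypothetical non-integer absorbances in two groups: `glm()` with `quasipoisson()` fits without complaint, the estimated dispersion is reported by `summary()`, and no AIC is available because the quasi family defines no likelihood.

```r
y <- c(0.12, 0.35, 0.41, 0.78, 0.93, 1.10)   # hypothetical non-integer absorbances
grp <- factor(rep(c("A", "B"), each = 3))

fit <- glm(y ~ grp, family = quasipoisson())
print(exp(coef(fit)))            # group A mean, and ratio of B mean to A mean
print(summary(fit)$dispersion)   # estimated dispersion parameter
print(fit$aic)                   # NA: no likelihood, hence no AIC
```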

Roberto Molteni

Hi Katrine Aleksenko

You can try the models suggested above; I think the most useful one is probably the one with the log transformation.

Good luck with … colorimetric assay.
