Why is this GLM output giving p = 1?

01 January 1970 3 5K Report

The boxplot attached shows the proportion of presence-absences counts of species "GP" in different dune treatments (habitat types) with 3 replicates in each treatment, and between 2-4 years repeated measures (unfortunately I have an unbalanced design of AC = 4 years, CC = 3 years, DC = 2 years repeated measures).

One count is a dune-year, aggregated to dune treat across years and dunes.

From the boxplot it seems quite clear to me that the presence of GP should be significantly different for AC:CC and AC:DC if not for CC:DC as well, given they are 100%, 7% and 0% present in AC, CC, DC samples respectively.... .

I conducted binomial GLM on the presence absence (p/a) data using the formula in R:

glm(formula = p/a ~ dune.treat,

family = "binomial",

data = dat[dat$species == "GP",])

THE OUTPUT WAS:

Deviance Residuals:

Min 1Q Median 3Q Max

-0.78 -0.788 0.00005 0.00005 1.626

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept) 20.57 3964.63 0.005 0.996

dune.typeC -21.58 3964.63 -0.005 0.996

dune.typeD -41.13 8253.04 -0.005 0.996

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 55.637 on 40 degrees of freedom

Residual deviance: 17.397 on 38 degrees of freedom

AIC: 23.397

This output seems very strange to me. I also tried to run a mixed effect model to include year as a random effect to take into account the unbalanced repeated measures.

glmmPQL(

p/a ~ dune.treat,

random = ~ 1 | year,

family= "binomial",

data = dat[dat$variable == "GP", ]

)

Output:

Linear mixed-effects model fit by maximum likelihood

AIC BIC logLik

NA NA NA

Random effects:

Formula: ~1 | year

(Intercept) Residual

StdDev: 1.460122 0.5028493

Variance function:

Structure: fixed weights

Formula: ~invwt

Fixed effects: value ~ dune.type

Value Std.Error DF t-value p-value

(Intercept) 30.87 274897.7 34 0.00011 0.9999

dune.typeC -32.23 274897.7 34 -0.00012 0.9999

dune.typeD -61.94 505281.0 34 -0.00012 0.9999

Correlation:

(Intr) dn.tyC

dune.typeC -1.000

dune.typeD -0.544 0.544

Standardized Within-Group Residuals:

Min Q1 Med Q3 Max

-2.42e+00 -4.89e-01 -7.890e-13 6.215e-12 2.96e+00

Number of Observations: 41

Number of Groups: 5

I realise there is a huge standard error, and large DF, which is probable leading to p-value close to 1. But is this the 'true' result?- I don't really understand why this is happening or what I can do about it?

Is there an error in the models I am running?

What would be a more appropriate model for this data to test for differences between dune.treats?

Thanks

Tania

Stephen Politzer-Ahles

You have two conditions with perfect separation (100% or 0%); those make it hard for the model to fit (it's impossible to estimate the variance within that condition when there is no variance in it). I'm not sure if there is another way to get around that (there may well be, I'm just not aware of it). My feeling would be, if you can already see that the values in those cells are 0% and 100%, what's the point of statistically testing them? You can't get any more information out of a p-value than you have from already observing that these samples are 100%. It would be possible instead to just test "CC" (if you want to see, e.g., whether it is significantly above or below 50%).

Tanweer Alam

This discussion might be helpful

http://stats.stackexchange.com/questions/183320/logistic-glm-with-good-predictors-is-giving-p-values-1

Salvatore S. Mangiafico

1. I have the suspicion that the p/a expression in your formula isn't doing what you think it is. See the attached file on logistic regression for an example using counts of Present and Absent.

2. The attached file on logistic regression mimics the simplified data in your plot. Note that the using anova shows a significant result for the effect of dune type.

3. If you want to stick with your simplified data, you could use Fisher exact test. The attached file on Fisher has the code for Fisher test and a post-hoc test. I think because you have fixed the number of samples for each dune type, you meet the fixed marginals assumption of the Fisher test.

How to download free version of BiofilmQ ?

Can I permeabilize cells when staining for lipids in the membrane?

Where would an undergrad start learning about bioinformatics?

Recrystallization of polymer failed ?

FT-IR: no OH peak visible ?

How can I confirm the formation of my polymers?

Some advices for purifying a polymer ?

What does Uniformity of drug (tablets or Capsule) means? And what can cause make the uniformity of tablet to be Out of approved specification?

Which model of hypoxia/reoxygenation in vitro to use ?

Which macrophages marker to use for IF ?

How to learn more about SPSS and its Application?

Has anyone applied Python in the field of textile engineering for data analysis, automation, or smart textiles?

How to normalize and take the significance of the MTT OD values with 3 replicates for the same cell-line?

If we are using snowball sampling technique, how do we justify the true representativeness of the sample statistically? is there any statistical test?

How to report results of Generalised Linear Mixed Models in a journal article?

Why 3 replicates for most biological assays? Is it enough to examine the data fits normal distribution?

Posthoc test lettering in JAMOVI?

How to back transform the results generated from analyses using log transformed with In(X+1) data?

Which statistical test should we use?

All math can be explained by iterator of code?