Is it possible to do Chi-square test for replicates?

Salvatore S. Mangiafico

The traditional test of association for a stratified contingency table is the Cochran–Mantel–Haenszel test. In my opinion, there are a few drawbacks with this test. First, depending on the software implementation, it can be somewhat tricky to be sure your three-way contingency table is set up correctly to test the effect you want. Second, the test assumes homogeneity of odds ratios across strata, though this assumption is easy to test. Finally, there are various tests including some combination of the names Cochran, Mantel, and Haenszel, and in different software it can be difficult to know which test is which. (You think I’m being silly, but I’m not.)
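To make the table setup concrete, here is a minimal sketch in base R using mantelhaen.test. The counts and the variable names (Stratum, Treatment, Outcome) are hypothetical, invented only to show the layout; they are not from the question. The point to notice is that the stratifying variable goes last in the xtabs formula, so it becomes the third dimension of the table.

```r
# Hypothetical counts, invented only to illustrate the table layout
Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="
Stratum Treatment Outcome Count
A       Control   No      20
A       Control   Yes     10
A       Treated   No      12
A       Treated   Yes     18
B       Control   No      25
B       Control   Yes      8
B       Treated   No      15
B       Treated   Yes     16
")

# The stratifying variable must be the third (last) dimension
Table = xtabs(Count ~ Treatment + Outcome + Stratum, data=Data)

# Flattened view, useful for checking that the layout is what you intended
ftable(Table)

# Cochran-Mantel-Haenszel test of Treatment x Outcome, stratified by Stratum
result = mantelhaen.test(Table)
result
```

Printing ftable(Table) before running the test is a cheap way to confirm that the two variables of interest and the strata ended up where you expect.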

A more flexible approach is to use logistic regression. For this approach, you’ll need to make one variable explicitly the dependent variable. If the dependent variable has only two levels, then the logistic regression is pretty straightforward. For a stratified sample, you can think of the model as analogous to an anova with blocks. You can get a p-value for the effect of interest and one for the blocking variable. You can also add the interaction of these into the model.
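As a rough sketch of the blocked logistic regression described here, the code below fits a main-effects model and a model with the interaction, using only base R. The data and variable names are hypothetical, invented for illustration. Comparing the two models is one way to look at whether the effect differs across strata.

```r
# Hypothetical counts, invented only for illustration
Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="
Stratum Treatment Outcome Count
A       Control   No      20
A       Control   Yes     10
A       Treated   No      12
A       Treated   Yes     18
B       Control   No      25
B       Control   Yes      8
B       Treated   No      15
B       Treated   Yes     16
")

# Main-effects model: effect of interest plus the blocking variable
model.main = glm(Outcome ~ Treatment + Stratum,
                 data=Data, weights=Count, family=binomial())

# Likelihood ratio test for each term, dropping one at a time
drop1(model.main, test="Chisq")

# Same model with the Treatment x Stratum interaction added
model.int = glm(Outcome ~ Treatment * Stratum,
                data=Data, weights=Count, family=binomial())

# Test of the interaction: does the Treatment effect differ by Stratum?
comparison = anova(model.main, model.int, test="Chisq")
comparison
```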

All that being said, to be sure any answer is appropriate for your situation, you'd have to explain the experimental setup with more specificity. For example, Lee Curley thought you had some kind of repeated measures or paired situation, which I didn't even consider... It's not clear, to me anyway, what is meant by "biological replicate" in your question.

Oyekola Oluyimika Oloyede

Hi, I would recommend:

1. Principles of Biostatistics (Authors: Marcello Pagano and Kimberlee Gauvreau)

2. Medical Statistics at a Glance (Authors: Aviva Petrie and Caroline Sabin)

Best,

Oloyede

Lee Curley

Hi Mohamad, I would recommend the McNemar test. It is basically a chi-square test for replicated (paired) data, although that is an oversimplification. Here is a link to a YouTube video explaining the test and how to run it in SPSS: https://www.bing.com/videos/search?q=mcnemar+test&view=detail&mid=58BDF4904BC0E520B28E58BDF4904BC0E520B28E&FORM=VIRE
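For anyone working in R rather than SPSS, a minimal sketch of the same test uses mcnemar.test from base R. The counts below are hypothetical and assume truly paired data, where each unit is classified twice (here labelled Before and After).

```r
# Hypothetical paired counts: each unit is classified twice
Paired = matrix(c(30,  5,
                  12, 25),
                nrow=2, byrow=TRUE,
                dimnames=list(Before=c("Yes", "No"),
                              After =c("Yes", "No")))

# McNemar's test uses only the discordant cells (5 and 12 here)
result = mcnemar.test(Paired)
result
```

Only the off-diagonal cells, the units that changed category, contribute to the test statistic.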

David Morse

Hello Mohamad,

It depends what question(s) you'd like to answer with these data. Not knowing that (or the variables involved), it's hard to give you a useful response. Perhaps there's someone local that you could converse with in greater detail to help sort out both the nature of your questions and of your data.

Chi-square contingency table analysis presumes: (a) the data are frequencies (counts, not measured scores); and (b) the cells are mutually exclusive (a case appears in one and only one cell); and (c) you have sufficiently large expected cell frequencies in each cell (to have confidence that the computed chi-square follows the theoretical distribution of chi-square well). The second condition (b) rules out replications. That doesn't mean, however, that you couldn't reconfigure the data to conform to this condition.
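Condition (c) can be checked directly in R, since chisq.test returns the expected counts. The sketch below uses the counts posted later in this thread, pooled over dishes. Pooling assumes every read can be treated as an independent case, which is an assumption and not something established here. The cutoff of 5 is a common rule of thumb, not a hard requirement.

```r
# Counts from later in this thread, pooled over dishes within each condition
Pooled = matrix(c(125,  8,
                  118, 53),
                nrow=2, byrow=TRUE,
                dimnames=list(Condition=c("Bile", "LB"),
                              Result   =c("M", "S")))

test = chisq.test(Pooled)

# Expected cell frequencies under independence
test$expected

# Rule-of-thumb check that no expected count is small
all(test$expected >= 5)
```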

Good luck with your work.

Salvatore S. Mangiafico

Mohamad Al Kadi

Thank you all for your answers. I didn't explain because it is a little complicated but basically this is RNA-seq data comparing two conditions (two replicates for each). The first row is the number of matches in a specific site and the second row is the number of mismatches at the same site.

Lee Curley

Okay, this helps. But what is your hypothesis? Knowing it will help determine which test is more appropriate.

Salvatore S. Mangiafico

You'd still have to explain more about what is meant by replicate. Is replicate 1 in bile somehow the same replicate as replicate 1 in lb? Or would it be just as good to call them rep1, rep2, rep3, rep4?

Mohamad Al Kadi

Thank you so much for your follow-up. The mismatches can happen randomly because of low accuracy, but they can also be a result of "base modification". The hypothesis is that the difference between the two conditions is real and not due to low accuracy. More replicates (at least three) would be better, but for reasons beyond my control, I had to settle for two. Replicates are independent. They are from four bacterial cultures grown in two conditions (two with and two without bile).

Salvatore S. Mangiafico

So, Rep1 Bile isn't in any way the same as Rep1 Without. They are from separate dishes. If that's the case, my recommendation would be to label them Dish1, Dish2, Dish3, Dish4 to avoid confusion. And then you can pretty much ignore Rep. If this is the case, the CMH test I mentioned won't be applicable, but a logistic regression will be. Each dish is a separate observation, but you don't need Rep or Dish in your model. Later I'll try to add what I get for results below.

Salvatore S. Mangiafico

Below is my approach to the problem. The code is in R, and can be run in R or at https://rdrr.io/snippets/ . Unfortunately ResearchGate loses some of the formatting. The results here are indicated in comments with #. In the logistic regression output, some useful information is the p value for Condition, the proportions and confidence intervals for those proportions, and the odds ratio. Below that output, there is just a summary table of counts and proportions for reference.

if(!require(car)){install.packages("car")}

if(!require(emmeans)){install.packages("emmeans")}

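# stringsAsFactors=TRUE makes Result a factor, which glm needs in R >= 4.0

Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="
Dish Condition Result Count
1 Bile M 55
1 Bile S 5
2 LB M 30
2 LB S 20
3 Bile M 70
3 Bile S 3
4 LB M 88
4 LB S 33
")

# Result has levels M and S; glm models the probability of the second level, S

model = glm(Result ~ Condition, data=Data, weights=Count, family=binomial())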

library(car)

Anova(model, test="Wald")

### Analysis of Deviance Table (Type II tests)

###

### Df Chisq Pr(>Chisq)

### Condition 1 23.68 1.138e-06 ***

library(emmeans)

marginals = emmeans(model, ~ Condition, type="response")

marginals

### Condition prob SE df asymp.LCL asymp.UCL

### Bile 0.0602 0.0206 Inf 0.0304 0.116

### LB 0.3099 0.0354 Inf 0.2452 0.383

###

### Confidence level used: 0.95

### Intervals are back-transformed from the logit scale

pairs(marginals)

### contrast odds.ratio SE df z.ratio p.value

### Bile / LB 0.142 0.0571 Inf -4.866
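As a cross-check on the output above, the proportions and the odds ratio can be recomputed directly from the counts. This is a minimal sketch in base R; the object names Table, Prop, Odds and OddsRatio are my own. The results should agree with the emmeans estimates (0.0602 and 0.3099 for the proportions, 0.142 for the odds ratio).

```r
Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="
Dish Condition Result Count
1 Bile M 55
1 Bile S 5
2 LB M 30
2 LB S 20
3 Bile M 70
3 Bile S 3
4 LB M 88
4 LB S 33
")

# Counts pooled over dishes within each condition
Table = xtabs(Count ~ Condition + Result, data=Data)
Table

# Proportion of S within each condition
Prop = prop.table(Table, margin=1)[, "S"]
round(Prop, 4)

# Odds of S within each condition, then the odds ratio Bile / LB
Odds = Table[, "S"] / Table[, "M"]
OddsRatio = unname(Odds["Bile"] / Odds["LB"])
round(OddsRatio, 3)
```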
