What is the best way to calculate confidence intervals for proportions in a 4*8 crosstab?

As always, there are many possibilities. Here is a solution for cell counts in R, using the function glm (generalized linear model) to fit a log-linear model, i.e. a Poisson model with a log link (the default link for family = poisson). In this example I have fitted a saturated model, i.e. one with both main effects and their interaction, so the fitted values reproduce the observed counts exactly.

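> ## dummy data
> ## (y was originally simulated after set.seed(123); the resulting counts,
> ##  as listed in the printout, are entered directly here)
> dfr <- expand.grid(x1 = letters[1:4], x2 = LETTERS[1:8])
> dfr$y <- c(9, 11, 18, 14, 21, 30, 25, 17, 21, 38, 35, 36, 41, 38, 51, 49,
+            50, 46, 45, 51, 55, 70, 61, 54, 61, 69, 70, 72, 88, 80, 86, 111)
> dfr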

   x1 x2   y
1   a  A   9
2   b  A  11
3   c  A  18
4   d  A  14
5   a  B  21
6   b  B  30
7   c  B  25
8   d  B  17
9   a  C  21
10  b  C  38
11  c  C  35
12  d  C  36
13  a  D  41
14  b  D  38
15  c  D  51
16  d  D  49
17  a  E  50
18  b  E  46
19  c  E  45
20  d  E  51
21  a  F  55
22  b  F  70
23  c  F  61
24  d  F  54
25  a  G  61
26  b  G  69
27  c  G  70
28  d  G  72
29  a  H  88
30  b  H  80
31  c  H  86
32  d  H 111

> ## contingency table

> (xtab <- xtabs(y ~ x1 + x2, data = dfr))

> ## log-linear model with glm

> fm1 <- glm(y ~ x1 * x2, family = poisson, data = dfr)
> summary(fm1)

Call:
glm(formula = y ~ x1 * x2, family = poisson, data = dfr)

Deviance Residuals:
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.19722    0.33333   6.592 4.35e-11 ***
x1b          0.20067    0.44947   0.446   0.6553
x1c          0.69315    0.40825   1.698   0.0895 .
x1d          0.44183    0.42725   1.034   0.3011
x2B          0.84730    0.39841   2.127   0.0334 *
x2C          0.84730    0.39841   2.127   0.0334 *
x2D          1.51635    0.36811   4.119 3.80e-05 ***
x2E          1.71480    0.36209   4.736 2.18e-06 ***
x2F          1.81011    0.35957   5.034 4.80e-07 ***
x2G          1.91365    0.35708   5.359 8.36e-08 ***
x2H          2.28011    0.34996   6.515 7.26e-11 ***
x1b:x2B      0.15600    0.53195   0.293   0.7693
x1c:x2B     -0.51879    0.50427  -1.029   0.3036
x1d:x2B     -0.65314    0.53757  -1.215   0.2244
x1b:x2C      0.39239    0.52531   0.747   0.4551
x1c:x2C     -0.18232    0.49281  -0.370   0.7114
x1d:x2C      0.09716    0.50787   0.191   0.8483
x1b:x2D     -0.27666    0.50272  -0.550   0.5821
x1c:x2D     -0.47489    0.45898  -1.035   0.3008
x1d:x2D     -0.26358    0.47680  -0.553   0.5804
x1b:x2E     -0.28405    0.49372  -0.575   0.5651
x1c:x2E     -0.79851    0.45704  -1.747   0.0806 .
x1d:x2E     -0.42203    0.47133  -0.895   0.3706
x1b:x2F      0.04049    0.48424   0.084   0.9334
x1c:x2F     -0.58961    0.44860  -1.314   0.1887
x1d:x2F     -0.46018    0.46823  -0.983   0.3257
x1b:x2G     -0.07744    0.48260  -0.160   0.8725
x1c:x2G     -0.55553    0.44424  -1.251   0.2111
x1d:x2G     -0.27604    0.46133  -0.598   0.5496
x1b:x2H     -0.29598    0.47527  -0.623   0.5334
x1c:x2H     -0.71614    0.43550  -1.644   0.1001
x1d:x2H     -0.20964    0.45046  -0.465   0.6417
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance:  4.2058e+02  on 31  degrees of freedom
Residual deviance: -9.9920e-15  on  0  degrees of freedom
AIC: 241.32

Number of Fisher Scoring iterations: 3
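To see what "saturated" means in practice, here is a minimal sketch that refits the same kind of model on only four cells of the dummy data, the 2 x 2 subtable for x1 in {a, b} and x2 in {A, B}. The names tiny and fm_tiny are made up for this illustration. With one parameter per cell, the fitted counts should equal the observed counts and the residual deviance should be zero up to rounding, as in the summary above.

```r
# 2 x 2 subtable of the dummy data: cells (a,A), (b,A), (a,B), (b,B)
tiny <- expand.grid(x1 = c("a", "b"), x2 = c("A", "B"))
tiny$y <- c(9, 11, 21, 30)

# saturated log-linear model: both main effects plus their interaction
fm_tiny <- glm(y ~ x1 * x2, family = poisson, data = tiny)

# the fitted counts reproduce the observed counts (up to numerical tolerance)
saturated_ok <- isTRUE(all.equal(unname(fitted(fm_tiny)), tiny$y,
                                 tolerance = 1e-6))

# the intercept is the log count of the reference cell (a, A)
ref_count <- exp(coef(fm_tiny)[["(Intercept)"]])

print(saturated_ok)
print(ref_count)
print(deviance(fm_tiny))
```

The same logic explains the intercept of 2.19722 in the full model: it is log(9), the log count of cell (a, A).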

> ## predicted values and 95% CI

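> pred <- predict(fm1, se.fit = TRUE)
> dfr$fit <- exp(pred$fit)
> dfr$lo <- exp(pred$fit - 1.96 * pred$se.fit)
> dfr$hi <- exp(pred$fit + 1.96 * pred$se.fit)
> dfr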

   x1 x2   y fit        lo        hi
1   a  A   9   9  4.682777  17.29743
2   b  A  11  11  6.091736  19.86298
3   c  A  18  18 11.340669  28.56974
4   d  A  14  14  8.291454  23.63880
5   a  B  21  21 13.692050  32.20847
6   b  B  30  30 20.975435  42.90733
7   c  B  25  25 16.892603  36.99844
8   d  B  17  17 10.568137  27.34635
9   a  C  21  21 13.692050  32.20847
10  b  C  38  38 27.650178  52.22389
11  c  C  35  35 25.129629  48.74724
12  d  C  36  36 25.967669  49.90821
13  a  D  41  41 30.188815  55.68287
14  b  D  38  38 27.650178  52.22389
15  c  D  51  51 38.759300  67.10647
16  d  D  49  49 37.033403  64.83336
17  a  E  50  50 37.895681  65.97058
18  b  E  46  46 34.455036  61.41337
19  c  E  45  45 33.598551  60.27046
20  d  E  51  51 38.759300  67.10647
21  a  F  55  55 42.226452  71.63756
22  b  F  70  70 55.380660  88.47854
23  c  F  61  61 47.461629  78.40017
24  d  F  54  54 41.357833  70.50660
25  a  G  61  61 47.461629  78.40017
26  b  G  69  69 54.497250  87.36221
27  c  G  70  70 55.380660  88.47854
28  d  G  72  72 57.149915  90.70880
29  a  H  88  88 71.407301 108.44830
30  b  H  80  80 64.257084  99.59991
31  c  H  86  86 69.616044 106.23988
32  d  H 111 111 92.157147 133.69555
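Because the model is saturated, these limits can be checked by hand. The sketch below assumes the intervals are Wald intervals built on the log scale with the multiplier 1.96, an assumption that reproduces the printed values. In a saturated Poisson model the standard error of each fitted log count is 1/sqrt(y), which agrees with the intercept's standard error of 0.33333 = 1/sqrt(9) in the summary.

```r
# first four cell counts from the table above
y <- c(9, 11, 18, 14)

# standard error of the fitted log count in a saturated Poisson model
se_log <- 1 / sqrt(y)

# Wald limits on the log scale, back-transformed to the count scale
# (1.96 is the assumed normal multiplier for a 95% interval)
lo <- y * exp(-1.96 * se_log)
hi <- y * exp( 1.96 * se_log)

print(cbind(y, lo, hi))
```

The printed limits should agree with rows 1 to 4 of the table. Note that the intervals are asymmetric around the count, which is the usual consequence of back-transforming from the log scale.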

Renaud Lancelot

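If you mean proportions of the grand total, simply divide the predictions (and their confidence limits) by the grand total N. Alternatively, you can include an offset term log(N) in the Poisson model and predict for new data with N = 1, so that the offset is log(1) = 0 and the predictions are directly on the proportion scale.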

> t(xtabs(fit ~ x1 + x2, data=dfr) / sum(dfr$y))

   x1
x2            a           b           c           d
  A 0.005909389 0.007222587 0.011818779 0.009192383
  B 0.013788575 0.019697965 0.016414970 0.011162180
  C 0.013788575 0.024950755 0.022980959 0.023637557
  D 0.026920552 0.024950755 0.033486540 0.032173342
  E 0.032829941 0.030203546 0.029546947 0.033486540
  F 0.036112935 0.045961917 0.040052528 0.035456336
  G 0.040052528 0.045305318 0.045961917 0.047275115
  H 0.057780696 0.052527905 0.056467498 0.072882469

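Or, using an offset:

> dfr$N <- sum(dfr$y)
> fm2 <- glm(y ~ x1 * x2 + offset(log(N)), family = poisson, data = dfr)
> New <- dfr
> New$N <- 1
> pred <- predict(fm2, newdata = New, se.fit = TRUE)
> dfr$fit <- exp(pred$fit)
> dfr$lo <- exp(pred$fit - 1.96 * pred$se.fit)
> dfr$hi <- exp(pred$fit + 1.96 * pred$se.fit)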

> t(xtabs(fit ~ x1 + x2, data=dfr))

   x1
x2            a           b           c           d
  A 0.005909389 0.007222587 0.011818779 0.009192383
  B 0.013788575 0.019697965 0.016414970 0.011162180
  C 0.013788575 0.024950755 0.022980959 0.023637557
  D 0.026920552 0.024950755 0.033486540 0.032173342
  E 0.032829941 0.030203546 0.029546947 0.033486540
  F 0.036112935 0.045961917 0.040052528 0.035456336
  G 0.040052528 0.045305318 0.045961917 0.047275115
  H 0.057780696 0.052527905 0.056467498 0.072882469

> ## lower limit

> t(xtabs(lo ~ x1 + x2, data=dfr))

   x1
x2            a           b           c           d
  A 0.003074706 0.003999826 0.007446270 0.005444159
  B 0.008990184 0.013772446 0.011091663 0.006939026
  C 0.008990184 0.018155074 0.016500085 0.017050341
  D 0.019821940 0.018155074 0.025449311 0.024316089
  E 0.024882259 0.022623136 0.022060769 0.025449311
  F 0.027725838 0.036362876 0.031163250 0.027155504
  G 0.031163250 0.035782830 0.036362876 0.037524567
  H 0.046885949 0.042191125 0.045709812 0.060510273

> ## upper limit

> t(xtabs(hi ~ x1 + x2, data=dfr))

Each entry is the corresponding hi value from the table of counts divided by N = 1523; for cell (a, A), for example, 17.29743 / 1523 ≈ 0.01136.
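The two routes to proportions (dividing by N afterwards, or fitting with an offset) should give the same estimates and limits. Here is a minimal sketch of that check on a 2 x 2 subtable of the dummy data. The object names (tiny, fm_a, fm_b, wald, route1, route2) are made up for this illustration, and N is the total of the four cells used here (71), not the 1523 of the full table.

```r
# 2 x 2 subtable of the dummy data: cells (a,A), (b,A), (a,B), (b,B)
tiny <- expand.grid(x1 = c("a", "b"), x2 = c("A", "B"))
tiny$y <- c(9, 11, 21, 30)
tiny$N <- sum(tiny$y)

# helper: back-transformed Wald limits from a predict(..., se.fit = TRUE) result
wald <- function(p) {
  cbind(fit = exp(p$fit),
        lo  = exp(p$fit - 1.96 * p$se.fit),
        hi  = exp(p$fit + 1.96 * p$se.fit))
}

# Route 1: model the counts, then divide predictions and limits by N
fm_a <- glm(y ~ x1 * x2, family = poisson, data = tiny)
route1 <- wald(predict(fm_a, se.fit = TRUE)) / tiny$N

# Route 2: include log(N) as an offset and predict with N = 1,
# so that the offset contributes log(1) = 0
fm_b <- glm(y ~ x1 * x2 + offset(log(N)), family = poisson, data = tiny)
route2 <- wald(predict(fm_b, newdata = transform(tiny, N = 1), se.fit = TRUE))

same <- isTRUE(all.equal(route1, route2, tolerance = 1e-6,
                         check.attributes = FALSE))
print(same)
print(round(route2, 4))
```

The offset route is convenient when the denominator differs between rows (for example, different exposure per cell); with a single grand total, dividing afterwards is simpler and equivalent.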
