Hello,

I have a dependent binary variable, let's call it Y. It is a yes/no outcome.

I have an independent categorical variable, let's call it X. It comprises 8 categories, numbered 0 to 7. Each category is one treatment in my experiment, and I expect the treatment, X, to have an effect on the decision, Y.

Each subject of my experiment was randomly assigned to one (and only one) of these 8 categories. The sample sizes are unequal.

I need a test that lets me see which treatment maximizes the proportion of "yes". I would also need to make a series of between-treatment comparisons (for instance, maybe treatment 4 is more effective than treatment 7 but no more effective than any of the other treatments).

How would I show that one treatment yields a significantly higher proportion of "yes" than another treatment?

So far, people have advised me to conduct a Chi-square test, which I did, but to my knowledge Chi² only tells me whether there is any difference at all among the treatments; it does not tell me which treatment is better.

I could say, for instance, that there is a significant difference in the proportion of Y = 1 between treatment 0 and treatment 1, but I am not allowed to say "since there is a significant difference, and the proportion of Y = 1 is higher for treatment 1 than for treatment 0, then treatment 1 is significantly more effective than treatment 0 at maximizing the proportion of Y = 1". Is that correct?

If so, what test would help me? I believe t-tests are not an option since my DV is binary and my IV is categorical (so neither follows a normal distribution).

I am working in Stata, which offers "prtest", but it does not allow more than 2 groups (whereas I have 8). Even if that is a solution, I have no idea how to program a k-group prtest (and I don't even know whether that would be correct from a statistical point of view).
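To make the k-group idea concrete: one common approach (an assumption here, not something established in this post) is to run a two-proportion z-test on every pair of treatments and adjust the p-values for the number of comparisons. Below is a minimal sketch in Python rather than Stata, using only the standard library. The function names, the pooled z-test, and the Bonferroni correction are illustrative choices, and `counts` is a hypothetical mapping from treatment to (number of "yes", group size).

```python
from itertools import combinations
from math import erfc, sqrt


def two_prop_ztest(yes1, n1, yes2, n2):
    """Two-sided pooled z-test for equality of two proportions."""
    p1, p2 = yes1 / n1, yes2 / n2
    pooled = (yes1 + yes2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:  # all "yes" or all "no" in both groups: no evidence of a difference
        return 0.0, 1.0
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


def pairwise_prtests(counts):
    """Compare every pair of treatments.

    counts: dict mapping treatment -> (yes, n)  (hypothetical input format).
    Returns a list of (a, b, diff, z, p_adj), where diff = p_a - p_b and
    p_adj is the Bonferroni-adjusted p-value (raw p times number of pairs,
    capped at 1).
    """
    pairs = list(combinations(sorted(counts), 2))
    m = len(pairs)  # 28 pairs when there are 8 treatments
    results = []
    for a, b in pairs:
        yes_a, n_a = counts[a]
        yes_b, n_b = counts[b]
        z, p = two_prop_ztest(yes_a, n_a, yes_b, n_b)
        diff = yes_a / n_a - yes_b / n_b
        results.append((a, b, diff, z, min(1.0, p * m)))
    return results
```

The sign of `diff` (or of `z`) gives the direction of each difference, and the adjusted p-value indicates whether that difference survives the correction. Bonferroni is conservative; other corrections would slot into the same loop.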

Last but not least, I have a very large sample (4000 subjects in total), and I have noticed that the Chi-square test gives me very low p-values even when, intuitively, I would say there is no meaningful difference. Does the large size of my sample bias the results in some way? Should I use special tests designed specifically for this type of sample?
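One way to look at the large-sample concern: for fixed proportions, the Chi-square statistic grows in proportion to the sample size, so a small difference can reach significance with thousands of subjects. An effect size such as Cramér's V does not grow with the sample size. The sketch below, in standard-library Python, computes both for a table of observed counts. The function name and the choice of Cramér's V as the effect size are assumptions for illustration.

```python
from math import sqrt


def chi2_and_cramers_v(table):
    """Pearson chi-square statistic and Cramér's V for a contingency table.

    table: list of rows of observed counts, e.g. one row per treatment
    and two columns ("yes", "no"). All row and column totals must be > 0.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count expected under independence
            chi2 += (obs - expected) ** 2 / expected
    k = min(len(table), len(table[0]))  # smaller table dimension
    v = sqrt(chi2 / (n * (k - 1)))      # Cramér's V, between 0 and 1
    return chi2, v
```

For example, the made-up table `[[60, 40], [40, 60]]` gives a statistic of 8 and V = 0.2; multiplying every count by 10 gives a statistic of 80 and the same V. The test is not biased by the sample size, but the p-value alone does not say whether a difference is large enough to matter.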

I apologize for the length of my post, but I admit that I am not at ease with statistics in general, and I am really afraid of misinterpreting my data and reporting "fake" results. I would be very grateful even for a small hint about one of my questions. And if I did not give enough information, please do not hesitate to ask me for more details.

Thank you.
