Is it necessary in a case-control study that demographics (gender, age) be equally distributed among the case and control groups? In my study there is a significant difference in age and gender between the control and case groups.
No, it is not necessary, given that both groups are random samples.
It is even irrelevant if the covariables are appropriately considered in the model. Here, an unbalanced design may give suboptimal precision, especially for the coefficients of the rare combinations, but this may still be good enough for the purpose.
I want to ask one more question. A journal reviewer suggested that I change my study design to a nested case-control study. But I have a significant difference in age and gender between the control and case groups. Is that OK, or do I have to take some measures?
You can run a matched case-control study to take care of demographics. The selection criteria must be revised so that they allow matching.
It is not advisable to have significant differences in some demographics, especially in age or gender, given the possible differences in outcomes. E.g. older subjects react differently than younger ones, which leads to an incorrect risk assessment in the study. The same holds for men and women. When homogeneity is broken it is difficult to show causality! The result can be badly biased.
In an observational study you never ever show causality. This cannot be the problem or concern here.
But you also point to "incorrect risk assessment" when the sample demographics are different and the risk in subgroups may be different. I think that in just such a situation matching is a golden way to get useless risk estimates.* You will get a weighted average estimate (the weight factors being the proportions of the subgroup sizes). Consider this case: women show a relevant risk increase, men show a relevant risk reduction. When both effects are similar in size and the proportions of men and women are similar, matching will give "no (mean) effect". If you have more men than women, matching will give you a risk reduction. You may then be inclined to treat women similarly, which would be a very bad thing.
If the demographics of the sample resemble the demographics of a "population" (whatever this is), then the estimates from the matched analysis will estimate the "population" risk correctly. If this is desired, then matching is fine. However, I think that a proper consideration of covariables in an appropriate model is typically closer to the scientific question.
-
*I am not an expert in such analysis and I am grateful for corrections on my point of view
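To make the men/women scenario above concrete, here is a minimal Python sketch with made-up counts. It uses the Mantel-Haenszel summary odds ratio as a stand-in for the single number a gender-stratified or matched analysis would report; that choice of estimator, the function names and all counts are assumptions for illustration only, not something taken from the study under discussion.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of one 2x2 table.

    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Single summary odds ratio across strata (Mantel-Haenszel estimator)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical counts: (exposed cases, unexposed cases,
#                       exposed controls, unexposed controls)
women = (40, 20, 20, 40)  # exposure raises the odds in women
men = (20, 40, 40, 20)    # exposure lowers the odds in men

or_women = odds_ratio(*women)
or_men = odds_ratio(*men)

# Equal numbers of men and women: the opposite effects cancel.
or_balanced = mantel_haenszel_or([women, men])

# Same effects, but three times as many men as women in the sample.
men_x3 = tuple(3 * n for n in men)
or_mostly_men = mantel_haenszel_or([women, men_x3])

print(or_women, or_men, or_balanced, or_mostly_men)
```

With these counts the summary estimate sits at "no effect" for the balanced sample and moves toward the men's estimate once men dominate, although neither subgroup's own odds ratio has changed. The exact weights depend on the estimator; the sketch only shows the direction of the argument.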
I agree with you up to the point of risk. The risk in a case-control study, which is an analytical study, is estimated by the OR, which gives a direct estimate of the relative effect of exposure on the outcome. Based on the size of this measure of association we can build a causal hypothesis and, from there, create other studies that confirm our initial hypothesis. Of course the best way to account for causality is the cohort study, where the temporal sequence is used. We should consider the case-control study as the starting point when you want to assess effects. The whole theory is much more complicated and I recommend Modern Epidemiology, Rothman et al., 3rd edition.
Homogeneity must be present to some extent, otherwise the expected measure of effect is biased. I am taking it from a medical perspective, where you create the retrospective study in question. Usually you pick up all the patient charts (between some years) from a hospital concerning the disease in question, and classify them as exposed or unexposed based on the retrospective analysis of the charts (you should have a hypothesis about a certain exposure which justifies the whole study). Next, you will choose the control group from the reference population, at random, and group it into diseased vs non-diseased subjects; both the diseased and the non-diseased will be surveyed/questioned about their exposure. Here comes the recall bias.
In choosing the controls it is important not to fall into Berkson's bias.
You evaluate the OR and you will assess its significance.
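As a rough illustration of that step, the following Python sketch computes the OR from a 2x2 table together with a 95% confidence interval based on the usual log-scale (Woolf) approximation. The counts and the function name are hypothetical; the interval is a large-sample approximation and is shown here only as one common way to judge significance.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate (Woolf, log-scale) confidence interval.

    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    z: normal quantile, 1.96 for a two-sided 95% interval.
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts, for illustration only.
or_est, ci_low, ci_high = odds_ratio_ci(30, 70, 15, 85)

# The OR differs significantly from 1 (at the 5% level) when the
# interval does not contain 1.
significant = ci_low > 1.0 or ci_high < 1.0
print(round(or_est, 2), round(ci_low, 2), round(ci_high, 2), significant)
```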
When you want to account for several prognostic factors, then you open Pandora's box! Logistic regression is mostly used; the weighted variant is known as one that controls for confounders, for example. In practice a propensity score is needed, and then the rest...
If you have any questions, please contact me and we can take it to the next level.
Is your case-control study nested in a cohort? Maybe the reviewer just wants you to rename your study design?
Do you think that the difference in age and gender between the control and case groups has an impact on your results? Whether it is OK depends on your question and how you adjust for it. I don't think that we can help you without further information.
I also am having difficulty with the reviewer's suggestion related to a "nested" design because it is not obvious (from the discussion so far) what would be nested within what.
If that is indeed problematic, then I would frame the revision (and your description of it) around the need to deal with the systematic differences between the groups, and ignore the specific suggestion related to nested designs.
Besides conditioning the answer on the kind of condition under study (what the hypothesis says and about what), I would adhere to some basic general concepts in clinical analytical research. The first, in this case, is that cases are expected to be as similar as possible to controls in all aspects, except for being affected (cases) or not affected (controls) by the condition (disease) under study. Second, the case-control design is highly prone to biases. Let us think of a study of a water-borne infectious disease: demographics may imply a risk of bias if a covariable is poorly (asymmetrically) represented in either the case or the control group (e.g. regarding their tap water supply). The third basic concept is that design biases cannot be controlled with statistical manipulations, but just those depending on chance.
"The third basic concept is that design biases cannot be controlled with statistical manipulations, but just those depending on chance."
What is "bias" here? A difference in the covariables between cases and controls (like "tap water supply")? If so, then I think that the word "bias" is wrong in this context. This is simply a difference in the group constitution. The estimates may still be statistically unbiased. However (and I think that this is what you actually meant), these estimates do not reflect population parameters well when the two groups differ (in the covariables) from the statistical population under study. But (important!) as long as the samples are drawn at random, (even large) actual differences between the groups are random, too, and such differences will "average out" in the long run, meaning that the estimates remain unbiased.
"Bias" is a statistical property in the frequentist philosophy. It has a meaning only "in the long run".
Secondly, I think that a proper statistical analysis can well control for differences between the groups (cases and controls). Estimates of effects can be statistically adjusted for covariables that may differ between cases and controls.
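A small sketch of what such an adjustment can look like, assuming a logistic regression with exposure and age group as covariables. Everything here is hypothetical: the counts are invented so that cases are much older than controls, and the model is fitted with a hand-written Newton-Raphson routine only to keep the example self-contained (in practice one would use a statistics package).

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_logistic(rows, n_iter=25):
    """Fit logit P(case) = b0 + b1*x1 + ... by Newton-Raphson on grouped data.

    rows: list of (covariates, n_cases, n_controls).
    Returns the coefficients, intercept first.
    """
    k = len(rows[0][0]) + 1
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for covs, cases, controls in rows:
            x = (1.0,) + tuple(covs)
            n = cases + controls
            eta = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            for i in range(k):
                grad[i] += x[i] * (cases - n * p)
                for j in range(k):
                    info[i][j] += x[i] * x[j] * n * p * (1.0 - p)
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta


# Hypothetical grouped data: ((exposed, old), cases, controls).
# 75% of the cases are old, but only about 29% of the controls.
by_age = [((0, 0), 20, 80), ((1, 0), 10, 20), ((0, 1), 30, 20), ((1, 1), 60, 20)]
# The same subjects with age ignored: ((exposed,), cases, controls).
ignoring_age = [((0,), 50, 100), ((1,), 70, 40)]

crude_or = math.exp(fit_logistic(ignoring_age)[1])
adjusted_or = math.exp(fit_logistic(by_age)[1])
print(round(crude_or, 3), round(adjusted_or, 3))
```

In this constructed data set the exposure odds ratio is the same within both age groups, and the model with age as a covariable recovers it, while the analysis that ignores age gives a clearly larger value. The age imbalance between cases and controls is handled by the model rather than by the design.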
Maybe my example was misunderstood, which is easy given the lack of explanatory detail in my unfortunate example. Yet the question was asked without any additional detail that would allow this exchange.
I believe we might agree with Joachim, provided we accept that this is a sort of Byzantine discussion, since we do not know the whole design. That is why I attempted an illustrative example, which turned out to be misleading.
Let us move to a more drastic example of what I mean. If cases are New York adults and controls are Peruvian children (as in this case, with no hypothesis in sight), isn't it necessary to know what the hypothesis is and what condition is to be studied, which might explain why and how (and where) the case-control design will be carried out, before answering the question? What kind of statistical procedure would make the results and conclusions of such a hypothetical design valid?
"Bias" is a well-known concept (there are long lists of its sources elsewhere) that can degrade the study results and conclusions, no matter what kind of "statistical modeling" or whatsoever is used.
Yes, Patricio. In your example you have a covariable (place of living, including all associated variables) that is perfectly correlated with the group. You are correct that in such a case there is no way to control or adjust for this. You will never be able to attribute the fact that a person is a "case" unambiguously to the treatment/exposure (or to the "place of living").
The "problem" is gradual. The higher the correlation of a covariable with the groups, the less specific information you can use. Ideally the covariable is completely uncorrelated. In your case, the variable was perfectly correlated, leaving exactly zero useful information.
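This gradual loss of information can be sketched numerically. The Python example below is an illustration under stated assumptions: a single binary covariable, expected (not sampled) cell counts, a hypothetical exposure prevalence and true OR, and an inverse-variance combination of the stratum-specific log odds ratios. It reports the approximate standard error of the covariable-adjusted log OR as the covariable becomes more strongly tied to case/control status.

```python
import math


def pooled_se_log_or(case_share, n_cases=100, n_controls=100,
                     p_exposed_controls=0.3, true_or=2.0):
    """Approximate SE of the stratum-adjusted log OR from expected counts.

    case_share: fraction of cases in stratum 1 of the covariable; the
    controls mirror it, so a fraction (1 - case_share) of them is in
    stratum 1. A value of 0.5 means the covariable is unrelated to the
    group, 1.0 means it is perfectly correlated with it.
    All default values are hypothetical.
    """
    odds = true_or * p_exposed_controls / (1 - p_exposed_controls)
    p_exposed_cases = odds / (1 + odds)
    information = 0.0
    strata = (
        (n_cases * case_share, n_controls * (1 - case_share)),
        (n_cases * (1 - case_share), n_controls * case_share),
    )
    for cases, controls in strata:
        cells = (
            cases * p_exposed_cases, cases * (1 - p_exposed_cases),
            controls * p_exposed_controls, controls * (1 - p_exposed_controls),
        )
        # A stratum without cases or without controls tells us nothing
        # about the exposure effect, so it contributes no information.
        if min(cells) > 0:
            information += 1.0 / sum(1.0 / cell for cell in cells)
    return math.inf if information == 0 else 1.0 / math.sqrt(information)


shares = (0.5, 0.7, 0.9, 0.99, 1.0)
standard_errors = [pooled_se_log_or(s) for s in shares]
for share, se in zip(shares, standard_errors):
    print(share, se)
```

Under these assumptions the standard error grows steadily as the covariable separates the groups more sharply, and becomes infinite at perfect separation, which corresponds to the New York adults versus Peruvian children example.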