If the dependent variable is categorical (more than two categories) and the independent variables are categorical (two and three categories), is there a technique to find a causal relationship between the independent and dependent variables?
This is another way to help answer the question: "What came first; the chicken or the egg?"
From a data perspective, you could analyze the categories within the systemic context of the variables and look for semantic attraction, perhaps as emergent themes, or deeper association. In other words, model the variables and categories as a system. For starters, I would build a systems model for each variable-category context.
For such analysis, I would initially not enforce a causality rule, meaning I would refrain from applying an A implies B rule from the outset. Forcing the rule without sufficient evidential data would introduce risky bias in any functional relationship.
I'd suggest you generalize the variables as pure data, each within a standard context, and try and discover new meaning, as validity. To help inform the variables and context being considered, extend the systemic information if you have to in order to add more data richness.
More reliable data might increase the probability of discovering new meaning. It may also confuse the meaning, so one has to apply one's mind as to where it would prove most effective.
Once new meaning has been discovered, you would want to increase the solution value of the data to test it for reliability (heuristically or anecdotally). Once emergent data tests positively for validity and reliability, you would consider the extent of the existential relationship as an indicator of causality.
It is a simple, yet complex soft-systems process. It can yield most-useful results, as emergent indicators. From a systems-scientific perspective, the results could be accurately tested and managed and may even outweigh those of correlation for systems integrity. The key value of this suggested approach is its consistent traceability, as relative objectivity.
For categorical variables (nominal variables with several categories each) one can use several methods to check for ASSOCIATIONS, almost all of which use the Chi-Square test for the purpose. In most cases, such associations represent "cause-to-effect" relationships. Note that the researcher is expected to know which variable is the "effect" and which variable(s) are the potential causes.
Find below four different options. For each one I give a reference to my own work, in which you can see the respective application; it might serve as an example.
Option 1: apply the CHAID (Chi-square Automatic Interaction Detector) algorithm, which is incorporated in SPSS. As the name suggests, this is an automatic detector of interactions, based on the Chi-Square test of independence (Hand et al., 2001; Rokach and Maimon, 2008). This statistical tool produces a tree-type output, in which the most significant explanatory variable appears at the top of the tree, immediately under the output variable. Likewise, the remaining branches follow a descending hierarchy, displaying the other variables from the most significant association to the least, i.e., in decreasing order of Chi-square association. The Chi-square value of each pair and the corresponding p-value are also shown in the tree diagram. Each tree branch expresses the combined events of different variables leading to the output variable results.
(ref) Marques P.H., Jesus, V., Olea S.A., Vairinhos, V., Jacinto, C. (2014). The effect of alcohol and drug testing at the workplace on individual’s occupational accident risk. Safety Science, 68, pp.108-120. DOI: 10.1016/j.ssci.2014.03.007
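To make the idea behind the top of a CHAID tree more concrete, here is a minimal sketch in plain Python. It only covers the first step (ranking candidate predictors by the p-value of their chi-square test against the outcome); a real CHAID run, as in SPSS, also merges categories, adjusts p-values and splits recursively, none of which is attempted here. The variable names (shift, gender, severity) and the twelve records are invented purely for illustration.

```python
from collections import Counter
from math import exp, lgamma, log

def chi_square(xs, ys):
    """Pearson chi-square statistic and degrees of freedom for two categorical sequences."""
    n = len(xs)
    row, col, obs = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for r, n_r in row.items():
        for c, n_c in col.items():
            expected = n_r * n_c / n
            stat += (obs[(r, c)] - expected) ** 2 / expected
    return stat, (len(row) - 1) * (len(col) - 1)

def chi2_sf(stat, dof):
    """Upper-tail p-value of the chi-square distribution.

    Uses the power series for the regularized lower incomplete gamma
    function; adequate for the small statistics of toy tables like this one.
    """
    if stat <= 0 or dof <= 0:
        return 1.0
    a, x = dof / 2.0, stat / 2.0
    term = total = 1.0 / a
    for k in range(1, 10000):
        term *= x / (a + k)
        total += term
        if term < total * 1e-16:
            break
    lower = total * exp(-x + a * log(x) - lgamma(a))
    return max(0.0, 1.0 - lower)

def rank_predictors(predictors, outcome):
    """Return (name, chi2, dof, p) tuples, most significant predictor first."""
    rows = []
    for name, values in predictors.items():
        stat, dof = chi_square(values, outcome)
        rows.append((name, stat, dof, chi2_sf(stat, dof)))
    return sorted(rows, key=lambda row: row[3])

# Invented toy data: 12 records, outcome with three categories.
severity = ["minor"] * 4 + ["serious"] * 4 + ["fatal"] * 4
predictors = {
    "gender": ["m", "m", "f", "f"] * 3,
    "shift": ["day"] * 4 + ["day", "day", "night", "night"] + ["night"] * 4,
}

ranking = rank_predictors(predictors, severity)
for name, stat, dof, p in ranking:
    print(f"{name}: chi2={stat:.2f}, dof={dof}, p={p:.4f}")
```

The predictor listed first is the one that would sit directly under the outcome variable in the tree. As others in this thread point out, that ranking describes association only.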
Option 2: apply MCA (multiple correspondence analysis). This function is also integrated in SPSS, or you can use an add-in for Excel (e.g., Tanagra). MCA identifies the associations between multiple categorical variables and provides a graphical display of the multidimensionality of the space, representing all the categories of the variables in a sub-space with the minimum number of dimensions possible (two). With MCA you can map the structure of the interrelationships between variables and find specific “configurations” or “patterns”.
(ref) Silva A.S.; Carvalho, H.; Oliveira, M.J.O.; Fialho, T.; Guedes Soares, C. ; Jacinto, C (2015). Achieving better safety at lower cost: good practice for learning with work accidents. The 8th International Conference on Working on Safety (WOS 2015), Porto, Portugal, 23-25 Sep 2015, Session CA4 (#102).
Option 3: use a “customized” data mining technique. This can be programmed by hand by someone with good software programming skills (also using the Chi-Square test).
(ref) Silva, J.F. & Jacinto, C. (2012). Finding occupational accident patterns in the extractive industry using a systematic data mining approach. Reliability Engineering & System Safety, 108, pp.108-122, DOI: 10.1016/j.ress.2012.07.001
Option 4: use a modification of Chi-Square in Excel. This is a very simple method to carry out in basic Excel, but you can measure only TWO VARIABLES at a time (however, each variable can have many categories).
(ref) Jacinto, C. and Guedes Soares, C. (2008). The added value of the new ESAW/ Eurostat variables in accident analysis in the mining and quarrying industry. Journal of Safety Research, 39(6), pp.631-644. Elsevier. DOI:10.1016/j.jsr.2008.10.009
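For readers who prefer a script to a spreadsheet, the two-variable case can be sketched as follows. This is not the specific modification used in the paper above, just the ordinary Pearson chi-square on one two-way table, with the per-cell Pearson residuals (which show which category pairs drive the association) and Cramér's V as a measure of strength. The category labels and counts are hypothetical.

```python
from collections import Counter
from math import sqrt

def pearson_residuals(xs, ys):
    """Per-cell Pearson residuals (observed - expected) / sqrt(expected)."""
    n = len(xs)
    row, col, obs = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    residuals = {}
    for r, n_r in row.items():
        for c, n_c in col.items():
            expected = n_r * n_c / n
            residuals[(r, c)] = (obs[(r, c)] - expected) / sqrt(expected)
    return residuals

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V: 0 means no association, 1 means perfect association."""
    return sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

# Hypothetical contingency table: activity (rows) by injury type (columns).
table = {
    ("drilling", "fall"): 4, ("drilling", "struck"): 1, ("drilling", "caught"): 1,
    ("hauling", "fall"): 1, ("hauling", "struck"): 4, ("hauling", "caught"): 1,
    ("maintenance", "fall"): 1, ("maintenance", "struck"): 1, ("maintenance", "caught"): 4,
}

# Expand the table into one record per observation.
activity, injury = [], []
for (a, i), count in table.items():
    activity += [a] * count
    injury += [i] * count

residuals = pearson_residuals(activity, injury)
chi2 = sum(value ** 2 for value in residuals.values())  # the statistic is the sum of squared residuals
v = cramers_v(chi2, len(activity), len(set(activity)), len(set(injury)))

print(f"chi2={chi2:.2f}, Cramer's V={v:.2f}")
for cell, value in sorted(residuals.items(), key=lambda kv: -abs(kv[1])):
    print(cell, round(value, 2))
```

Cells with large positive residuals occur more often than independence would predict; large negative ones occur less often.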
There are lots of techniques for finding correlations, as previous responders have discussed, but unless you're using some sort of experimental design, there are NO techniques that can tell you if the relationship is causal. Fundamentally, the only way to establish a causal relationship is to rule out other plausible explanations for the correlation. This requires having a theory that would explain the correlation, and showing that other possible theories are either inconsistent with the evidence or are implausible on other grounds. Qualitative data may be useful in doing this; see the attached paper.
There are different views of causality in the philosophical literature, and there are serious doubts that the nature of causality at the quantum level, or the concept of causal "laws" as used in physics, is relevant at the social/cultural scale. The most practical discussion of the use of causality at the latter scale is Nancy Cartwright and Jeremy Hardie, Evidence-Based Policy: A Practical Guide to Doing it Better (Oxford University Press, 2012). See also the attached paper.
I want to re-emphasize that there are NO techniques that enable you to definitively determine whether a correlation between variables is causal. The only way to do this is to rule out (beyond a reasonable doubt) plausible explanations for the correlation other than causality. In particular, statistical significance tests (chi-square, p values, etc.) can HELP you to rule out chance sampling error as one possible explanation for the correlation, but do NOT definitively establish causation, because they don't address other possible explanations (e.g., sampling bias, other causal variables).
Causation is much less about statistical method and more about research design. Even with a good theory, causation cannot be inferred without a methodology that controls for third variables that could also produce the desired effect.
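A small numerical illustration of the third-variable problem, as a sketch with invented counts: a binary exposure X and a binary outcome Y look strongly associated in the pooled table, yet within each level of a third variable Z the odds ratio is exactly 1. The 2x2 setup is a simplification of the multi-category case in the question, and the numbers are made up for the purpose.

```python
def odds_ratio(table):
    """Odds ratio for a 2x2 table ((a, b), (c, d)) = (a*d) / (b*c)."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

# Invented counts. Rows: X=1, X=0. Columns: Y=1, Y=0.
# Z raises both the chance of X=1 and the chance of Y=1.
strata = {
    "Z=0": ((1, 9), (9, 81)),    # 10% have Y=1 whether or not X=1
    "Z=1": ((72, 18), (8, 2)),   # 80% have Y=1 whether or not X=1
}

# Pool the strata cell by cell, i.e. ignore Z.
pooled = tuple(
    tuple(sum(strata[z][i][j] for z in strata) for j in range(2))
    for i in range(2)
)

within = {z: odds_ratio(t) for z, t in strata.items()}
pooled_or = odds_ratio(pooled)

print("within-stratum odds ratios:", within)
print("pooled odds ratio:", round(pooled_or, 2))
```

A chi-square test on the pooled table would be highly significant, but the association is produced entirely by Z. No test applied to X and Y alone can reveal that; only a design or analysis that accounts for Z can.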
Causation analysis can be divided into (1) without confounding and (2) with confounding. Confounding can be (1) measured or (2) unmeasured. Great advances in methods and algorithms for causal inference have been made in the past decade. Please read my recent book: Big Data in Omics and Imaging: Integrated Analysis and Causal Inference (2018). CRC Press.
Momiao, your distinctions make sense for quantitative research that assumes a regularity theory of causation, but are pretty much irrelevant for qualitative research, or for research based on a realist/process approach to causation. I've posted several papers above that address the latter approaches; for a more detailed explanation, see the attached chapter on causation in my book A Realist Approach for Qualitative Research (SAGE, 2011).
Hello Professor Maxwell, I have read some of your scholarly papers. I am now working on a dissertation project exploring the political forces that dominate politics in post-Suharto Indonesia. I support the theory of oligarchy and party cartel theory, but I think those theories do not properly define the existing political power in the post-authoritarian era. That is why I am trying to conduct a qualitative study, using a phenomenological approach, to find a new concept that can combine oligarchy and cartel into a single concept of "oligarchic cartelization". Is it possible to apply a Grounded Theory model in this research study? Or should I forget the realist perspective and try to focus on phenomenology per se? Your response means a lot to me. Thank you!
Boni, I'm guessing from your topic that your dissertation is in political science. If so, the definition of "qualitative research" in this field is quite different from that in most other fields (see the attached paper), and I'm not sure what your study would involve, or how you see a phenomenological approach as relevant to your goals. A phenomenological study, in the usual sense, is an investigation into the lived experience of participants; and I don't see how this will address the issue of combining oligarchy and party cartel theory. A grounded theory approach, on the other hand, seems entirely appropriate in principle, but this depends very much on how the data you collect can answer the questions you have. What are your research questions? What sorts of data do you plan to collect?
My book "Big Data in Omics and Imaging: Integrated Analysis and Causal Inference" presents several statistics for testing causation between discrete variables. Also, read our paper "Hu P, Jiao R, Jin L, Xiong MM. (2018). Application of Causal Inference to Genomic Analysis: Advances in Methodology. Front Genet. 9:238."
A causal relationship is a matter of design and theory; it is not something a method alone can establish. But if your theory or design supports it, I can recommend using structural equation models for categorical variables. A simple strategy for the dependent variable is to transform it into dummy variables (each category becomes a variable coded 0 or 1).
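The dummy coding mentioned above can be sketched in a few lines of plain Python. The function name to_dummies, the drop_first option and the example categories are illustrative assumptions, not part of any particular SEM package.

```python
def to_dummies(values, drop_first=False):
    """Turn one categorical variable into 0/1 indicator variables.

    Returns a dict mapping each category to a list of 0/1 flags.
    With drop_first=True the first category (alphabetically) is omitted
    and serves as the reference level.
    """
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

# Hypothetical three-category dependent variable.
outcome = ["low", "high", "mid", "low"]

dummies = to_dummies(outcome)
reference_coded = to_dummies(outcome, drop_first=True)

for name, column in dummies.items():
    print(name, column)
```

Dropping one category as the reference level is the usual choice when the dummies enter a model together, because the full set always sums to 1 for every record.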
Rodrigo, I disagree. Structural equation models can establish a correlation among variables that suggests a causal relationship, but the most important thing to know about statistics is that correlation isn't causation.