Recently I was challenged with a problem involving causal diagrams. Before posting here I looked at Rothman's book and Sklo's book, but they have not been very illuminating so far.
A partner of mine came to me with a dataset of more than 1,500 observations and more than 1,500 variables. While trying to organize our ideas on what to explore and how, we ended up drawing causal diagrams such as:
    A -------> B ------> Outcome
    |          ^            ^
    |          |            |
    D -------> C ------------
I'm not able to draw the actual diagram we arrived at, but it has at least three stages, and some variables may enter more than one stage simultaneously. It also seems that at least one variable lies outside the diagram and is directly related to the outcome.
Looking at this diagram, it does not seem reasonable to me that a multinomial or a logistic regression could represent such a structure. So my guess is that a single equation $y = x_1\beta_1 + \dots + x_n\beta_n$ cannot reasonably represent the diagram above.
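To make that intuition concrete, here is a sketch under two assumptions that are not in the diagram itself: every relation is linear, and the A-D link (drawn without an arrowhead) is read as A -> D. Under those assumptions the diagram corresponds to one equation per variable with incoming arrows (a path-analysis or structural-equation model), not to a single regression of the outcome on everything:

$$
\begin{aligned}
D &= \alpha_{DA} A + \varepsilon_D \\
C &= \alpha_{CD} D + \varepsilon_C \\
B &= \alpha_{BA} A + \alpha_{BC} C + \varepsilon_B \\
\text{Outcome} &= \alpha_{YB} B + \alpha_{YC} C + \varepsilon_Y
\end{aligned}
$$

Each line is an ordinary regression of a node on its direct causes. Substituting the equations into one another shows that A and D still influence the outcome, but only through B and C, which a single equation with all variables on the right-hand side does not encode.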
At this point I thought a neural network might work well, since a theoretical argument often seen in neural network texts is that such networks can approximate essentially any function (the universal approximation property).
So the question is: how can one statistically test a diagram such as the one above?
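One common approach, sketched here only under strong assumptions, is to test the conditional independencies the diagram implies: every missing arrow means two variables should be independent given some set of other variables (d-separation). The sketch assumes linear relations with Gaussian noise, reads the A-D link as A -> D, names the outcome `Y`, and simulates data with made-up coefficients instead of using the real dataset. The helper names `partial_corr`, `fisher_z_pvalue` and `ci_test` are hypothetical; each implication is checked with a partial correlation and Fisher's z test.

```python
import math

import numpy as np


def partial_corr(x, y, *given):
    """Correlation of x and y after linearly regressing both on the conditioning variables."""
    Z = np.column_stack([np.ones(len(x)), *given])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: partial correlation = 0, with k conditioning variables."""
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - k - 3)
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # standard normal CDF at |z|
    return 2 * (1 - phi)


def ci_test(x, y, *given):
    """Return (partial correlation, p-value) for 'x independent of y given `given`'."""
    r = partial_corr(x, y, *given)
    return r, fisher_z_pvalue(r, len(x), len(given))


# Hypothetical linear-Gaussian version of the diagram (A -> D assumed, coefficients made up).
rng = np.random.default_rng(0)
n = 1500
A = rng.normal(size=n)
D = 0.8 * A + rng.normal(size=n)
C = 0.7 * D + rng.normal(size=n)
B = 0.5 * A + 0.6 * C + rng.normal(size=n)
Y = 0.9 * B + 0.4 * C + rng.normal(size=n)  # Y stands for Outcome

# Independencies implied by the diagram: these should NOT be rejected if it is correct.
r_ac_d, p_ac_d = ci_test(A, C, D)        # A independent of C given D
r_ya_bc, p_ya_bc = ci_test(Y, A, B, C)   # Outcome independent of A given B, C
r_yd_bc, p_yd_bc = ci_test(Y, D, B, C)   # Outcome independent of D given B, C

# Conditioning on the collider B opens the path A -> B <- C, so this one SHOULD be rejected.
r_ac_db, p_ac_db = ci_test(A, C, D, B)

for label, r, p in [
    ("A _||_ C | D", r_ac_d, p_ac_d),
    ("Y _||_ A | B, C", r_ya_bc, p_ya_bc),
    ("Y _||_ D | B, C", r_yd_bc, p_yd_bc),
    ("A _||_ C | D, B (collider)", r_ac_db, p_ac_db),
]:
    print(f"{label}: r = {r:+.3f}, p = {p:.3g}")
```

A rejected implication points to a missing arrow or a wrong direction in the diagram. Not rejecting does not prove the diagram, because different diagrams can imply the same set of independencies. If the relations are non-linear, the partial-correlation test would need to be replaced by a non-parametric conditional independence test.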
I also look forward to any comments and suggestions for further reading on this topic.