Let's say I sequenced forty cancer samples: twenty responders and twenty non-responders to some treatment. I performed alignment and annotation of SNVs and indels in each sample. Now I want to know whether any genes are differentially affected between the two groups in a way that might explain the different response I observed in those tumors. The problem is that simple chi-square tests on individual variants won't do, because a single gene can be hit by multiple potentially deleterious SNVs and indels, and I may have to bin multiple alterations together to have enough power. I could therefore tag each gene in each sample as affected or unaffected by a potentially deleterious variant, and run my analyses on that. But how do I know whether a "potentially" deleterious variant really is deleterious? It's impossible to biologically validate all of them. Alternatively, I could restrict my analysis to variants that recur in at least two different samples, but then again, some genes can be altered at many different positions, each seen only once across the samples, and I would overlook them.
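To make the gene-level collapsing concrete, here is a minimal sketch of what I have in mind, written with the Python standard library only. The names `calls`, `groups`, and `gene_burden_pvalues`, the toy samples, and the choice of a two-sided Fisher exact test on each 2x2 table are my own assumptions for illustration, not an established pipeline.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of every table with the same margins.
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / total
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum the tables that are no more likely than the observed one.
    p_value = sum(q for q in probs.values() if q <= p_obs * (1 + 1e-9))
    return min(1.0, p_value)


def gene_burden_pvalues(calls, groups):
    """Collapse variants to gene level and test each gene between groups.

    calls:  hypothetical dict, sample -> set of genes carrying at least one
            putatively deleterious variant in that sample.
    groups: hypothetical dict, sample -> "R" (responder) or "NR".
    """
    genes = set().union(*calls.values())
    n_r = sum(1 for g in groups.values() if g == "R")
    n_nr = len(groups) - n_r
    pvals = {}
    for gene in sorted(genes):
        hit_r = sum(1 for s, g in groups.items()
                    if g == "R" and gene in calls.get(s, set()))
        hit_nr = sum(1 for s, g in groups.items()
                     if g == "NR" and gene in calls.get(s, set()))
        pvals[gene] = fisher_exact_two_sided(
            hit_r, n_r - hit_r, hit_nr, n_nr - hit_nr)
    return pvals


# Toy data (made up): 4 responders and 4 non-responders.
groups = {"S1": "R", "S2": "R", "S3": "R", "S4": "R",
          "S5": "NR", "S6": "NR", "S7": "NR", "S8": "NR"}
calls = {"S1": {"GENE_A", "GENE_B"}, "S2": {"GENE_A", "GENE_B"},
         "S3": {"GENE_A"}, "S4": {"GENE_A"},
         "S5": {"GENE_B"}, "S6": {"GENE_B"},
         "S7": set(), "S8": set()}

pvals = gene_burden_pvalues(calls, groups)
for gene, p in pvals.items():
    print(gene, round(p, 4))
```

In this toy set, `GENE_A` is hit in all four responders and no non-responders, while `GENE_B` is hit equally in both groups, so only `GENE_A` gets a small p-value. With real data I would still have to correct for multiple testing across all genes, and the result is only as good as the "potentially deleterious" calls that go into `calls`.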
Finally, I could consider pathways rather than genes, and see whether some pathways are hit by deleterious alterations in responder tumors and not in non-responders, or vice versa. But pathways are not as well defined as one may want to think, and there is a risk of including deleterious variants that have nothing to do with the underlying biology of my samples. It's more philosophy than statistics here. What would you do? What do you recommend?