In the bioinformatics research community, we often build models based on the statistical significance of raw data, but their outcomes are sometimes (perhaps most of the time) in conflict with biological significance. For example, in cancer diagnosis based on high-throughput gene expression data, no matter how good the statistical result is, it may still make no sense to biologists.
Reasons?
Thoughts?
Future directions?
In the last few decades a large number of techniques have become available for the analysis of biological systems, ranging from quantitative PCR and mass spectrometry to microarrays. Other techniques have improved and followed suit, giving an ever more detailed view of how biological systems work.
To analyse this overload of information, some methodological choices have been made that are not necessarily well grounded. These choices have a profound effect on what can be considered relevant, both from the biological and from the statistical point of view. Here are some of those choices:
A/ Biological problems are often treated as steady-state phenomena; however, many (if not most) biological processes are historical in nature. Cells and organisms go through transient states that engage them in certain pathways, or place them in specific equilibrium states that, once reached, are self-maintained. The historical process (the steps taken to reach such a state) is no longer required once the state is reached, and reconstructing that process from the end state is neither trivial nor necessarily possible.
Biological processes are also dynamic, while many methodologies are static snapshots of the biological state at a given time point. This might give a skewed image of the real biological situation. Increasing the number of assessments might improve matters (admittedly, it is not always feasible or ethical to do so), but only inasmuch as the process can be captured within the time-frame of the repeated sampling. That is not trivial for extremely rapid processes, nor might it be practical for very slow phenomena.
B/ Gene expression levels are equated to gene activity. Different levels of gene expression (understood as the levels of mRNA transcripts present in the cell) do not automatically translate into more end product (i.e., protein), nor into more of the derived metabolites (if the protein is an enzyme). It is true that producing greater amounts of a given transcript does drive the process towards more protein, but that does not imply a greater amount of *functional* protein. A number of post-translational modifications are required to make a protein work properly. One of these many conditions is the requirement for partner products, since many of the cell's products function in vivo as multi-unit complexes. The over-expression of a single gene will not yield a complete complex, and thus does not in itself determine biological change.
C/ Tight equilibria are not always considered important. For practical reasons, when analysing large datasets with high-throughput methods, the analysis often concentrates on "large" changes, with n-fold differences. And it obviously is a safe bet that large differences will be easier to link to the conditions compared. But the assumptions that (a) small differences are unimportant and (b) large differences are always important are not foregone conclusions. Even a small imbalance in a critical process could lead to a catastrophic cascade of events, such as the accumulation of metabolites that in sufficient amounts would be toxic and lead to cell death or severe dysfunction, or, symmetrically, to deprivation and loss of function.
D/ Ignoring pre-emptive systems. Some biological systems anticipate an expected environmental change in order to produce a rapid response. The build-up of such a state is not necessarily apparent prior to the triggering condition (the equivalent of a slow coiling-up, or the accumulation of a certain product, ion or metabolite), nor during the response itself (e.g., rapid responses via post-translational mechanisms, or release from some other blocking mechanism).
E/ Signal vs background noise (or, the devil is in the details). More often than not, high-throughput methods try to derive information from very heterogeneous sources. Though this may be an inevitable condition, it raises important challenges in sorting the signal from the background noise.
In conclusion, Nature does not play nicely. Biologically relevant data might not show up as significant in indiscriminate datasets. What seems obvious is not necessarily what has been retained in the historical process of natural selection. Biostatistical and bioinformatics approaches to biological systems will always need to go through experimental validation in order to determine their biological relevance. Likewise, properly conducted experiments are necessary in order to leverage the full potential of bioinformatics and biostatistics.
PS. On the specific topic of cancer:
Cancer is a generic name for a category of pathologies that have in common the unregulated growth of cells, with partial loss of cell differentiation. But there is not one cancer; there are many cancers, each with its own biological characteristics. So, at a certain point, cancer treatment might need to be approached on an individual basis (the search for personalized medicine). However, there are enough broad-scale processes shared among cancers (e.g., apoptosis, cell cycle regulation) to provide robust targets for anticancer therapy. And, as with other diseases like diabetes, the population/environmental components leading to the onset of these diseases need to be established and addressed.
Debate 1:
David Heath's An Introduction to Experimental Design and Statistics for Biology (UCL Press, 1995) discusses, albeit briefly, biological significance as opposed to statistical significance:
"…a statistically significant difference is not necessarily biologically significant. By this we mean that it may not be interesting from a biological point of view or useful from a practical point of view. The reason is that in actual fact the null hypothesis [that the samples being compared are identical] is probably never likely to be true!
Debate 2:
As an area of bioinformatics study, the major goal of this subject is to provide biologists with biologically meaningful information about genes and related entities. Through this information, biologists are able to discover unknowns and confirm previous knowledge.
Well, first there is the issue of what is meant by "biologically significant". Biological significance _should_ be defined based on science; that is, to say some result is biologically significant, one should have evidence that the result is reproducible, rather than just a notion that the result doesn't fit with the prevailing biological theory on the subject. Such evidence is usually expected to be statistically significant, because statistical theory is our best gauge of whether a result is reproducible. So to this extent, I object to characterizing this question as "statistical _versus_ biological" significance, because biological significance is often judged on the basis of statistical significance, but _in some other test_, not in the diagnostic you are currently interested in.
The type of error you mentioned above is the false positive: a result that is statistically significant but not relevant to the biology of interest. This is especially a problem in biology because biological systems are very complex, poorly understood, and hard to control experimentally. Thus, it is very hard to choose a null hypothesis that truly reflects all the ways in which a diagnostic (or other statistic) might have an extreme value _other than_ the few ways the researcher is truly interested in observing. An individual may have any number of health issues that are unrelated to the one we think is most major (such as cancer), and some subset of those health issues may cause our diagnostic not to act as one might expect at random (i.e. under a simple null hypothesis that doesn't take into account the fact that human health doesn't boil down to "cancer" or "no cancer").
It is very challenging to design a diagnostic or other experiment to focus on one aspect of biology in the presence of many other, uncontrolled effects. In general, statistical methods are not designed for this situation; they are designed to give reliable answers when the experiment is properly controlled, and thus extreme values of the statistic are a reliable indicator of the presence of the effect of interest. We should be realistic about what high-information content, high-throughput methods will give us. A high-throughput gene expression experiment provides information on the majority of the biology occurring in the test subject(s). We should expect it to be difficult to separate one effect of interest out of the full set of biological phenomena occurring in such complex biological systems. We may need to design better experiments, such as comparing a pool of affected individuals to a pool of unaffected, which should average out effects unrelated to the effect of interest. Of course, in the case of cancer, each individual may be unique, and that approach may not work. We may need to understand cancers better, and rely on diagnostics more closely targeted to the known characteristics of cancer.
Hi max, thanks for your first comment. I have redefined the question.
Yes, I agree with your thoughts. In my opinion: first, biological systems are too complex to be modeled by simple statistical methods, especially in the case of cancer; second, high-throughput methods themselves produce large amounts of noisy data; third, our understanding of cancer is limited, even among biologists.
But thousands of papers are still published every year using different algorithms to aid diagnosis; there must be a reason :(
“We may need to design better experiments, such as comparing a pool of affected individuals to a pool of unaffected, which should average out effects unrelated to the effect of interest. Of course, in the case of cancer, each individual may be unique, and that approach may not work. We may need to understand cancers better, and rely on diagnostics more closely targeted to the known characteristics of cancer.”
This might be the experiment you mentioned above:
http://llmpp.nih.gov/lymphoma/
As different types of cancer can behave very differently, and each individual patient may be unique, does that mean that whether this method works is problem-dependent? And even if the method works for some types of cancer, is whether the outcome has biological relevance also problem-dependent?
What it means is that different diseases have different characteristics, and may require different methods. Look at sickle-cell anemia or cystic fibrosis; those are examples of diseases that are primarily caused by single genes. Heart disease, on the other hand, is not; looking for "the heart disease gene" will fail irrespective of what method is used. Similarly, it is a mistake to think that, now that we have the ability to collect expression information on every gene in the genome at once, there must be a reliable, general method to identify, from an affordably small set of samples, which gene or subset of genes are relevant to a disease which varies from one affected person to another, among the tens of thousands of assayed genes.
I don't mean to be so negative. I just mean to say that at the moment we are comparing "affected" with "non-affected", under the simple assumption that differences between two individuals, one affected and one non-affected, are likely to be related to the disease of interest, rather than a general stress effect or one of a myriad other differences between the two individuals that doesn't happen to be relevant to the disease. By pooling samples from a number of affected individuals and making a separate pool of samples from unaffected individuals, the differences between individuals--both those related to the disease and those not related to the disease--may average out sufficiently to be able to detect the differences that all affected individuals exhibit from all unaffected individuals.
Whether that particular suggestion would work or not, my point is that we have a lot of choices when designing an experiment. If we fail to take account of the inherent complexity of biological systems when designing our experiment, we run the risk of collecting a lot of data that will obscure the effect we are most interested in by its sheer volume. The volume of data is a known attribute of the experiment; so is, at least to a rough level, the degree of complexity of the biological system. We should take these factors into account, rather than relying on the assumption that "everything else is the same".
I would suggest this paper
Nakagawa, S. & Cuthill, I. C. Effect size, confidence interval and statistical significance: a practical guide for biologists. Biol Rev Camb Philos Soc 82, 591–605 (2007).
which gives some hints on how to address the whole question. The take-home message is "don't think in terms of p-values and fixed thresholds".
I'm very happy that someone has raised this thorny question. From my point of view (I'm a biologist), in many cases statistical meaning doesn't match biological meaning, for several reasons, first among them the complexity of biological systems. Put very simply, cells cannot count and have no notion of mathematics or statistics, yet scientists rely heavily on mathematics and statistics to describe the functional and structural relevance of biological phenomena. It looks like a paradox… For instance, suppose we have a gene upregulated in three different cell lines after a drug treatment: cell line 1 has a fold change of 10, cell line 2 of 50, and cell line 3 of 100. Accordingly, a biologist would say that the expression of this gene is induced by the drug. Very, very simple and trivial. However, if you insert these data into a dataset of a thousand genes for three cell lines analyzed before and after treatment, and apply the statistical method of SAM (Significance Analysis of Microarrays) with any correction for multiple comparisons, this gene will not appear among the differentially expressed genes. A biologist would say that this makes no sense, while a mathematician would reply that it is correct, because of the low numerical reproducibility of the upregulation (that is, a very high standard deviation). In your opinion, which scientist is right about this issue? Is this gene upregulated or not?
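Marco's arithmetic is easy to check with a minimal sketch (Python; a one-sample t-test on log2 fold changes stands in for SAM itself, and the 1,000-gene Bonferroni correction is an illustrative assumption):

```python
import numpy as np
from scipy import stats

# Log2 fold changes for Marco's gene in the three cell lines (10x, 50x, 100x).
log2_fc = np.log2([10.0, 50.0, 100.0])

# One-sample t-test against 0 (no change), the simplest stand-in for SAM.
t, p = stats.ttest_1samp(log2_fc, popmean=0.0)
print(f"t = {t:.2f}, raw p = {p:.3f}")  # roughly t ~ 5.3, p ~ 0.03

# With n = 3 and a large standard deviation, the raw p-value squeaks under
# the conventional 0.05 threshold -- but not a correction over 1,000 genes.
n_genes = 1000
print(f"Bonferroni-adjusted p = {min(1.0, p * n_genes):.2f}")  # 1.0: not significant
```

So both scientists are describing the same numbers: the trend is real to the eye, but three highly variable replicates simply cannot carry a genome-wide correction.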
Marco: The meaning of "not significant" is THAT YOU COULD GET SUCH A TREND BY CHANCE. This means that the fact that "cells don't count" (which is false, of course they can count; E. coli can even take derivatives) matters very little: if the chance that picking three noisy cell lines gives you a trend is 0.5, the "trend" you see is NOTHING!!!!
I think the real question is not how to save insignificant results (which even all my mathematically naive friends want to do), but how to interpret significant differences. The fact that there is a significant negative correlation: does it mean anything?
I have to say I doubt that there are many cases in which the experiment was well designed, the right statistical test was performed (i.e. the right null hypothesis was used), and the results are significant, yet the results are "meaningless". The fact that we lack the theoretical framework to interpret them does not make them non-significant.
Cancer is complex. A statistical approach is just one way, and often the first way, of untangling the complex system. A lot of work remains to be done to establish 'biological relevance', but statistical approaches are helpful.
That's a great question. I am a developmental biologist; the main reason, to my mind, is that you apply statistics to changes in gene expression, and changes in gene expression are not always reflected in the phenotype of the cancer cell, which is what you call "biological significance". Cancer cells are the most robust of biological systems. If you switch off one of the survival pathways with a drug, it will downregulate certain marker genes (and this will be highly statistically significant), but the cell will switch on another pathway you did not think about and survive; the phenotype will not change. You kick it out of the door, and it comes back through the window and through the chimney. The only hope is to block the door, the windows, and the chimney, and hope that it does not penetrate through the walls.
Hi Zhenyu,
I'll play devil's advocate, bear with me. It seems to me that if, for a given set of measurements, I can convincingly demonstrate both statistical significance and biological irrelevance, then this is compelling evidence that I have been analyzing data that do not suffice to answer the biological question. In this light, demonstrating statistical significance and biological irrelevance is useful because it teaches you that you were on the wrong track.
My two cents,
Nicholas
I completely subscribe to what Dr. Morales has said. Improper biological assumptions may lead to incorrect models and incorrectly applied statistical methods. This is not to say it is always the researcher's fault; sometimes we just don't know enough to build a good model. Of course, sometimes we are just plain lazy.....
From this discussion it has become apparent that the amalgamation of data at multiple levels, such as proteomics and microRNA data rather than just mRNA abundance data, will be essential to identify the "candidate genes/networks" and the regulatory mechanisms associated with a given phenotype. Deriving biological relevance from just one dataset would be much like each of the "blind men" thinking they have the right answer!
What I am thinking is that statistical significance is just as important as biological relevance. Working in dry labs and finding significant associations between drugs and diseases or drugs and genes plants the first seed for biologists to investigate. The molecular nature of a disease or a drug mechanism is very complex, and it is necessary to take a global view of the whole genome and determine the overexpressed and downregulated genes. That is what helps biologists go to their labs and analyze the significant results, instead of working on every single gene, which is impossible. Such approaches are really interesting in drug discovery, in what is called drug repositioning. It is really amazing to find a significant connection between two drugs and then test each drug against the disease the other is used to treat. That will save billions of dollars and many years compared with traditional methods, which either rely on chance to find a drug that can treat a specific disease or are based on manufacturing a drug from scratch, which needs $1 billion and at least 15 years, with only a 10% chance that the drug will pass the testing phases.
Unfortunately, it is quite common to find papers in ecology journals in which the authors confound statistical significance with biological relevance or with the strength of evidence against the null hypothesis. These mistakes are not trivial semantic problems, because they may ultimately lead to wrong scientific conclusions and hence prevent long-term knowledge accumulation in ecology. Using correlation analysis as an example, I present the four possible interactions that can take place between biological relevance (based on the value of the correlation coefficient as an effect size metric) and statistical significance (based on p-values). Importantly, I recall that the strength of evidence that supports the parameter estimate or the null hypothesis, given our data, can only be assessed by means of Bayes' rule.
Ref: Alejandro Martínez-Abraín
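Those four combinations are easy to see with simulated correlations (a Python sketch; the sample sizes and true effect sizes below are invented for illustration and are not from the paper):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def corr_case(label, r_true, n):
    # Two variables with a chosen true correlation and a chosen sample size.
    x = rng.normal(size=n)
    y = r_true * x + np.sqrt(1 - r_true**2) * rng.normal(size=n)
    r, p = stats.pearsonr(x, y)
    print(f"{label}: n={n:5d}  r={r:+.2f}  p={p:.3g}")

corr_case("relevant & significant     ", r_true=0.6,  n=200)    # big effect, enough data
corr_case("relevant, not significant  ", r_true=0.6,  n=8)      # big effect, underpowered
corr_case("irrelevant but significant ", r_true=0.05, n=20000)  # tiny effect, huge n
corr_case("irrelevant, not significant", r_true=0.05, n=20)
```

The third case is the troublesome quadrant: with enough samples, a biologically negligible correlation will typically reach statistical significance.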
The Scientific Committee (SC) developed an opinion addressing the issue of statistical significance and biological relevance. The objective of the document is to help EFSA Scientific Panels and Committee in the assessment of biologically relevant effects.
The SC considered the distinction between the concepts of biological relevance and statistical significance and produced descriptions of the terms. It is suggested that EFSA Experts and Staff members should use the terminology of biological relevance and statistical significance as interpreted by the SC in their considerations.
The SC recommends that the nature and size of biological changes or differences seen in studies that would be considered relevant should be defined before studies are initiated. The size of such changes should be used to design studies with sufficient statistical power to be able to detect effects of such size if they truly occurred.
Statistical significance is considered as just one part of an appropriate statistical analysis of a well designed experiment or study. Identifying statistical significance should not be the primary objective of a statistical analysis. The relationship of statistical significance to the concept of hypothesis testing was considered and the limitations on the use of hypothesis testing in the risk assessment process when interpreting data were noted.
The SC therefore recommended that less emphasis should be placed upon the reporting of statistical significance and more on statistical point estimation and associated interval estimations (e.g. Confidence Interval) as more information can be presented using the latter.
In addition, the SC recommends that a complete description of the methods used, the programming code and the raw data are made available to the assessors so that alternative analyses could be conducted to test the robustness of any conclusions drawn.
Ref: EFSA Journal 2011;9(9):2372 [17 pp.]. doi:10.2903/j.efsa.2011.2372
Statistics is a great tool - but only as good as the underlying model. Post-ENCODE genomics blew away the obsolete triad of the old genes/junk DNA/central dogma paradigm. With few new software-enabling algorithmic paradigms around (FractoGene is one, now with multiple POCs from independent experimental studies by high-ranking scientists from top-notch institutions, published in Science and Nature), statistics is not as effective as it could be, as long as it remains a supporting tool for "brute force approaches" (like the "hypothesis-free search for genes" at a time when we do not even have a universally accepted single definition of what a "gene" is).
Reference (in full text) for the recent independent experimental POC: Fudenberg G, Getz G, Meyerson M, Mirny LA (2011) High order chromatin architecture shapes the landscape of chromosomal alterations in cancer. http://www.nature.com/nbt/journal/v29/n12/full/nbt.2049.html
Dear Zhenyu Wang
Please read my research article as an example that demonstrates the relation between computational results and what happens in reality. It is open access and you can download it freely.
Scientific Research and Essays Vol. 6(14), pp. 3049-3057, 18 July, 2011
ISSN 1992-2248 ©2011 Academic Journals http://www.academicjournals.org/sre/abstracts/abstracts/abstracts2011/18July/Hadad%20et%20al.htm
That's the essence of having/working in teams -- with diverse backgrounds e.g. medics, biostatisticians, bioinformaticians etc -- for optimal interpretations. Otherwise, if you rely only on statistical significance, then things can go really sour (in the application of the interventions being investigated).
Hi guys, thanks a lot for all the inspiring comments.
Some more thoughts for today:
The conflict between "statistical significance" and "biological relevance" may stem from the conflict between two different epistemological roots: data-driven knowledge discovery and hypothesis-driven knowledge discovery.
Data can be viewed as the lowest level of abstraction of the world; data need to be structured to become information, and information needs to be set into context, validated, and further analysed for causal relations to become knowledge or understanding. Science is a systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions about the universe.
Modern science is supposed to be mostly hypothesis-driven, including modern molecular biology. But biological systems are too complex, and there exists no "closed theory" of molecular interaction. Most research hypotheses have been derived from previous experimental findings, and it is hard to propose good hypotheses without knowing the whole picture of the problem. Therefore, people have started to (maybe we have to) ask for help from "data-driven" approaches. We feed data directly into models or computers: not information, not knowledge. Data-driven approaches seem to have an advantage in coping with problems for which there is not enough prior knowledge. Recent technological advances, like high-throughput technology, produce extremely large amounts of data. The situation tends to be data-rich but hypothesis-poor, so data-driven approaches seem able to give us a larger picture of biological systems than hypothesis-driven approaches can. But data-driven approaches are also limited by data quality, the development of algorithms, and the capacity of computers. Most importantly, and this is the question I asked above: can we understand, and can we believe, the results produced by algorithms? How much can they help us build knowledge in the face of the uncertainty, the confusion, and our ignorance of biological systems?
So far, I think both data-driven and hypothesis-driven approaches have a long way to go, but their relation should be complementary, not competitive.
This is just like the distinction many people draw between bioinformatics and computational biology, portraying the former as a tool kit and the latter as science, even though the two terms are often used interchangeably.
Their distinction can now be easily drawn if we consider the problems from these epistemological roots: bioinformatics more properly refers to the creation and advancement of algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data. Computational biology, on the other hand, refers to hypothesis-driven investigation of a specific biological problem using computers, carried out with experimental or simulated data, with the primary goal of discovery and the advancement of biological knowledge. Put more simply, bioinformatics is concerned with the information, while computational biology is concerned with the hypotheses. Bioinformatics is also often described as an applied subfield of the more general discipline of biomedical informatics.
It is also a good rule not to put too much confidence in experimental results until they have been confirmed by Theory. - Sir Arthur Eddington
Sorry Dr. Wang, but I take exception to your tenet that "maybe we have to... ask for help from 'data-driven' (hypothesis-free) approaches". IMHO there is no such thing as "hypothesis-free" research; the "hypothesis-free search for genes" was advocated on the basis of a misinterpreted statement at a meeting, upon the collapse of the expectation of zillions of genes (astoundingly few were found), until the collapsed old "genes/junk DNA/central dogma" paradigm was replaced. Indeed, the same person (science adviser to the US President) co-published a Science cover issue heralding a message to the effect of "Mr. President, the Genome is Fractal!" Dr. Eric Topol also touches on the same subject in his bestseller "The Creative Destruction of Medicine" and concludes that even the most "hypothesis-free"-looking approaches, as a matter of course, imply lots and lots of (sometimes hidden) assumptions. As quoted above, in late 2011 independent experimental studies provided proof of concept that fractal defects (in the studied case, copy number alterations) are implicated in cancer. How can anyone interpret, e.g., "copy number variations" (huge repeats) without thinking of a very plausible theoretical (indeed, software-enabling algorithmic) framework of fractals?
Data-driven research is not necessarily hypothesis-free. But, coming back to the main question, significance is far too often misused, for several reasons. In the last couple of years, several authors have pointed out that the pervasive abuse of the term has had vast negative consequences; the concept itself is at stake: hypotheses are tested without being formulated, the prerequisites of the statistical methods are not checked with care, etc.
A curiosity: only a few people know the origin of the magical 5% threshold (p < 0.05).
> Yet most of proteomics and genomics is based on probability...
But that is very healthy, provided that it is correctly applied. It is quite easy to perform a test without checking the validity of the assumptions. And publish. Peer review is insufficient to screen it out.
In my opinion, guaranteed applicability of the method is what is needed first of all.
But, more importantly, no interpretation survives bad modeling, and no gain in knowledge is obtained by asking questions that are ill formulated or even simply wrong.
Statistical significance suggests future directions for research; it doesn't by itself give proof of concept, establish biological relevance, or corroborate a hypothesis until confirmed by experiment. It provides a perspective on a plethora of interdependent factors.
Taking an example: I cannot posit a direct relationship between plant shoot growth and soil zinc availability without having experimentally understood all the intermediary processes involved, and whether it is worth formulating a correlation at all. I may apply Zn differentially and measure plant shoot growth (which could yield a significant correlation), but would the Zn application be the only factor that determines shoot growth? Absolutely not. What it does show is that Zn might be one of the important elements essential for plant growth. That is the direction in which modern research is fast-forwarding.
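A minimal simulation of this confounding point (Python; the growth coefficients and the second factor, nitrogen, are invented purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 60

zn = rng.uniform(0, 1, n)        # zinc application (arbitrary units)
nitrogen = rng.uniform(0, 1, n)  # a second, uncontrolled growth factor
growth = 2.0 * zn + 3.0 * nitrogen + rng.normal(0, 0.5, n)

# Zinc alone correlates significantly with growth...
r, p = stats.pearsonr(zn, growth)
print(f"zn vs growth: r = {r:.2f}, p = {p:.1e}")

# ...but a model including both factors explains far more of the variance.
X1 = np.column_stack([np.ones(n), zn])
X2 = np.column_stack([np.ones(n), zn, nitrogen])
for name, X in [("zn only", X1), ("zn + nitrogen", X2)]:
    beta, res, *_ = np.linalg.lstsq(X, growth, rcond=None)
    r2 = 1 - res[0] / np.sum((growth - growth.mean()) ** 2)
    print(f"{name}: R^2 = {r2:.2f}")
```

The zinc correlation is statistically significant in both cases; the biology (what actually drives growth) only emerges once the other factor is measured and modeled.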
@pedro
Unfortunately, biological interpretation of the "meaning" of expression profiles is even worse. Looking at expression patterns and inferring a model, which is how biologists are trained to draw conclusions, without any statistical filtering is bound to yield science fiction.
@Andras
You should be aware of a common practice in bio-medical research, in which words are "hijacked" and receive new meanings. Let me demonstrate: NIH is enthusiastically moving from "evidence based medicine" to "personalized medicine". Does that mean you don't need evidence in personalized medicine? No, that comes to represent leaving a specific school of thought that called itself "evidence based medicine" but meant in fact something very specific. Another example: how is "molecular biology" different from "biochemistry"? Does biochemistry deal with atoms?
I think that the fact that hypothesis-free research is not truly hypothesis-free does not mean it is no different from hypothesis-driven research. Perhaps the right names would be "hypothesis-focused" vs. "exploratory" or, as was suggested, "data-driven". As mentioned throughout this discussion, one of the main reasons for meaningless statistical results is poor hypothesis formulation, in supposedly hypothesis-free research...
In medicine, I note that data-driven research helps us jog our thoughts. I see it only as a way to formulate hypotheses that should then be tested with hypothesis-focused research. An example? We report that a pattern of clinical measurements separates diabetic patients into two groups, one that complains about leg pains (possibly a sign of arterial disease) and one that does not. I personally will not believe these results are meaningful (I am trying to avoid the word "significant" in its other sense) until I come up with a carefully designed experiment that tests the hypothesis.
Any kind of event inside the cell looks like a kind of mechanical structure: each part will always work in its own time and place. For example, we can compare the cell's events to a car: you must have fuel, main parts, secondary parts, and accessories, based on: cause -> fact -> consequences. In your example you only have a fact: there was a cancer. What are its causes? And its consequences?
The causes go into the statistical hypothesis.
The consequences are the output, or what is expected from the statistical hypotheses.
These are the only points of view you need in statistical problems.
Beyond this, you must extract only what is relevant to the biology; the questions are more important than the answers, although we must find the main answer.
Think about it.
^^ All of the discussion above has been, by and large, about how to reconcile biological relevance with statistical data. It would be difficult for modern "hypothesis-focused" research, as Eitan puts it, to function independently of statistical significance, applying appropriate filters permitted within the scope of the study. This will obviously narrow the scope of a research project to fit its purpose.
However, "exploratory" research is based on a much larger concept that might not yet be intricately understood. Hence, applying such stringent filters is not an option, since we do not know what defines the filters for that research, which is why a direction is necessary. In such cases, generalized filters are the way.
We can still apply specific filters under "assumptions" and "conditions/cases", but these need to be proved by actual groundwork. If we try fitting in specific indicators of what we think is a possible criterion, we may not be very successful, since we would be working from baseless or groundless claims.
I have come across many experimental biologists who object to statistics being a criterion. However, I keep telling them: how can you ignore a difference just because it is small? Why is the size of an effect such a great criterion for identifying the "importance" of a phenomenon? Shouldn't you account for the signal-to-noise ratio? And isn't that what statistical tests such as the t-test do?
The assumption that "more is better" is taught in Science Education courses as one of the most common naive mistakes children do.
@Eitan, I totally agree with your remarks. Let's take the "common practice in bio-medical research, in which words are 'hijacked' and receive new meanings. Let me demonstrate: NIH is enthusiastically moving from 'evidence based medicine' to 'personalized medicine'". For my FractoGene paradigm I had to spend a lot of effort to destroy the "hijacked" (frankly, silly) notions of the "Junk DNA" and "Central Dogma" tenets, to end half a century of delay, since these interlocking falsehoods blocked The Principle of Recursive Genome Function. Originally, both Ohno and Crick (respectively) introduced their "theories" as catchy sound-bites; later misinterpretations, serving selfish practical purposes, confused generations. I would add to your remark that the assumption that "more is better" being "one of the most common naive mistakes children make" most unfortunately applies not only to children: it is very typical of some governments' "big science project" philosophy and practice. You refer to it with the example that in government circles "evidence based medicine" yields to "personalized medicine". Verbal trickery has not much to do with science; it is typical of politics, in which sound bites make it possible to hijack the largest amounts of taxpayer money for "big science" projects. Once the money is secured under false pretexts, it should not be surprising that the spending disregards such long-established truths as those formulated in Thomas Kuhn's classic "The Structure of Scientific Revolutions".
@Eitan, I agree on these points: The size of an effect - mostly estimated through a parameter in some model (the average is just the simplest "model") - does not tell us anything about any kind of relevance. Btw, relevance is a property of our concepts, not of the data. I disagree on this point: a statistical test like the t-test does NOT relate signal to noise. The t-value is the ratio of an effect estimate to the standard error of that estimate. It is a "signal-to-what-we-can-know-about-the-model-parameter-value-given-the-data (assuming independent and congeneric errors)" ratio. A possible SNR measure is, e.g., the effect divided by the standard deviation (known as Cohen's d).
@Jochen, I agree regarding SNR. As one collects more data, neither the signal level nor the noise level changes, so the SNR stays the same; however, our ability to _distinguish_ the (average) signal from the noise does increase. The p-value of Student's t-test will approach zero more and more surely and closely as one collects more data (if a constant signal is in fact present, OR even if there is no "signal" but the noise does not fit a Gaussian model!); in contrast, Cohen's d will _not_ become increasingly more extreme.
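A quick numerical check of this point (a Python sketch assuming a one-sample design with a fixed true effect of 0.2 noise standard deviations):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect = 0.2   # constant "signal", in units of the noise SD

for n in (10, 100, 1000, 10000):
    x = true_effect + rng.normal(size=n)       # signal + Gaussian noise
    t, p = stats.ttest_1samp(x, popmean=0.0)
    d = x.mean() / x.std(ddof=1)               # Cohen's d: effect / SD
    # In this design t = d * sqrt(n), so t (and significance) grows with n
    # while d hovers around the true SNR of 0.2.
    print(f"n={n:5d}  d={d:+.2f}  t={t:+7.2f}  p={p:.2g}")
```

As n grows, p collapses toward zero while d stabilizes near 0.2, which is exactly the distinction between detectability and effect size being discussed here.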
I feel this discussion falls into the realm of philosophy. Most common statistics rely on hypothesis tests. Typically, you set forth a null hypothesis (all of my data sets are samples of one unique population) and then estimate the probability of observing data at least as extreme as yours if that hypothesis were true. The answer, no matter how solid (that probability may be as low as one in a million), is still the answer to a very narrowly defined question; do not expect statistics to answer anything other than what you asked. It is the scientist who is responsible for the meaningful interpretation of such a result. Such interpretation usually derives from the hypothesis the scientist has built from an insightful consideration of the biology behind the data sets: "these sets are expected to differ because...". The problem with high-throughput data is the absence of a defined biological hypothesis behind the experimental setup; most of the time your starting hypothesis is something like "some differences will stand out, and their cause can then be inferred from the interrelations between the components showing changes". While this line of thinking may be true at its very basis, the introspection required to extract the meaningful interrelations may be out of reach.
I agree (with Rogerio) that statistical significance is always about a very narrowly defined question: "In this experiment, how improbable would results this extreme be if the test subjects and control subjects were indistinguishable on this/these measurement(s)?"
I think the biological significance question can also be stated succinctly: "Is what I observed in this experiment related to the biological effect I am studying, or to something else (an aspect of the experiment itself, or some other biological effect that is not of interest to me)?" It should be clear that these are entirely separate questions. Whether there are biological effects that are not of interest to the researcher which can result in significant test results is due to the _design_ of the experiment, not to the methods used to judge statistical significance.
I also agree with Rogerio about the fundamental differences between "hypothesis-driven" and "data-driven" experiments. Hypothesis-driven experiments are so-named because the researcher has a specific _alternative_ hypothesis in mind--one expected to be _true_ when observing the behavior of interest--and tend to be low-throughput. However, the scientific method does not test the alternative hypothesis directly; the scientific method is to find evidence for rejecting a (null) hypothesis, one that we expect to be _false_ when observing the behavior of interest; what cannot be rejected remains as a possible explanation. A successful experimental design is therefore one that causes the null hypothesis to be as close to "the alternative hypothesis is false" as possible.
A quick look at how many drugs that work well in animal models, yet fail to work in human subjects, points out that even in hypothesis-driven experiments, what we see is often due to biological effects that were different from what we expected--i.e. that the question of biological significance is very difficult! This is therefore not merely a philosophical question, but a very practical one.
Data-driven experiments are generally high-throughput; there is generally no specific alternative hypothesis, and the null hypothesis is expected to be _true_ for the majority of the data. Without a specific alternative hypothesis it is much harder to choose a null hypothesis close to "the alternative hypothesis is false". :-) Furthermore, the more results we obtain, the more thoroughly we sample even the improbable "tails" of the null distribution. Thus, high-throughput experiments will expose errors in modeling the null distribution more stringently than low-throughput experiments (these errors will be exposed as false positives without any underlying biological cause).
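The tail-sampling effect is easy to demonstrate (a Python sketch assuming 20,000 "genes" of pure noise, i.e. the null hypothesis true everywhere):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_genes, n_samples = 20000, 6

# Pure-noise expression data: there is no biology here at all.
data = rng.normal(size=(n_genes, n_samples))
t, p = stats.ttest_1samp(data, popmean=0.0, axis=1)

# At p < 0.05 we still "discover" about 5% of the genes: ~1000 false positives,
# every one of them statistically significant and biologically meaningless.
print("hits at p < 0.05:", np.sum(p < 0.05))
```

And any mismatch between the assumed null model and the true noise distribution only inflates that count further.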
So there you have it: high-throughput experiments (whether hypothesis-driven or not!) are more susceptible to small deficiencies in the statistical treatment of the data, and data-driven science without a specific alternative hypothesis to guide experimental design is less likely to provide correspondence between statistical significance and biological significance, because it is harder to define a well-controlled experimental design.
I hope I have sufficiently made my main point: that biological significance is strongly associated with the design of the experiment, NOT with statistical significance.
@Jochen Wilhelm: I meant "signal to noise" in the broadest sense. If you consider sampling error a type of experimental noise (i.e. something that adds a false signal), then the t-test helps. In other words, what I meant is that you need some way to build confidence in your measurements, and the t-test helps. Obviously it is over-used and misused in biology (don't get me started about medicine), and many great, and more suitable, tests are ignored.
This does raise the possibility that one of the reasons many biologists detest statistics is that they don't understand it. I am an informally trained bioinformatician (there was no formal training back when I started); my statistics courses were very basic. *I* find it hard to follow most of the tests and to develop trust in them. I can't imagine how confused many biologists are when faced with statistical results, especially advanced ones.
@Eitan: I think I see your point. However, I still think there is a misconception. As you write, the (null hypothesis) test (NHT) helps "to build confidence in your measurements". This is just *not* the case. It helps to build confidence in the estimated model parameters (given the underlying assumptions). It is a (quite indirect) measure of the amount of information provided by the data w.r.t. some hypothesis. But it does not tell us anything about the measurements themselves, unless one considers the sample size. Considering the sample size, though, takes us back to the relative effect (i.e., the effect divided by its standard deviation).
In my opinion, most biologists do not understand NHTs because they assume that such tests would (a) answer a scientific question and (b) decide something. Both assumptions are essentially wrong. Fisher used the p-value as a summary statistic that has to be interpreted like any other statistic, in the context of all the rest of our knowledge. No simple "yes/no rule": the decision is an action of the researcher, based primarily on expert knowledge. A more reasonable/sensible model with a higher p-value may well be preferable to a model with a lower p-value. Neyman and Pearson, by contrast, sought to give a recipe for making automated decisions by a simple rule - well suited to industrialized research: don't think, let the test decide. If additional assumptions are met (especially that the frequency distribution of the residuals matches the probability distribution of the error model), the long-run properties of this strategy are well defined and very useful. Now a third problem arises for biologists from the mixing of these two basically incompatible methods. Unfortunately, this mixture is what is taught in most simple statistics textbooks for biologists.
In my opinion there is no conflict, and it is all about usefulness: statistics is just a tool in this context. If you obtain some experimental results that are statistically non-significant, but your follow-up experiment using those results works, then so what? But given multiple choices of experimental follow-ups, it sounds sensible to pursue the ones better supported by statistics. Whatever gets you through the night!
Statistical significance is the first step to filter large amount of data such as those obtained by microarray analysis. However, biological significance is difficult to study for many reasons outlined below although that is the sole purpose of performing these analyses.
Depending on how one defines "controls", usually hundreds of genes are "up-" and "down-regulated" in any set of data. When comparing gene expression data from human patients, "controls" are often not clearly defined. Comparison is also made to "ref sequences". Currently, there is no consensus as to what constitutes appropriate controls for a given patient population. These variables contribute to the conclusion generally made that a particular disease is "associated with" or "influenced by" the genes that came out of a particular analysis.
Another important variable is the platform used for microarray analysis, and one caveat is the common use of blood samples drawn from people without the particular disease under study as controls. In experimental situations, such as tumor cells treated with a drug, it is possible to narrow down the list of genes that are "highly" regulated for further analysis. Similarly, it is possible to get a fair idea of the genes associated with a particular disease in preclinical models by comparing groups of animals that have and have not undergone the manipulation, infection and/or drug treatment.
There is also no consensus as to how to analyze animal samples. Although it is general practice to pool RNA from a group of animals to minimize expression analysis bias, some analyze individual animals. It should be kept in mind that even though inbred animals are genetically uniform, they do not respond to external cues identically.
Finally, microarray data need to be validated by qRT-PCR to learn the meaning of the results. This can be done using animal tissues or tissue culture cell lines, including tumor cell lines. It may be difficult to validate results in human samples due to the heterogeneity of the human population.
The complication is that not all genes that appear differentially regulated in microarray data can be reproduced by qRT-PCR. It would be ideal to follow protein expression, but that may not be possible for all genes analyzed. Again, picking which gene or protein is important for the pathological situation under study is difficult and depends on the investigator's bias. It may be that a particular gene that came out of the analysis does not make sense. It is evident that any given pathological condition may involve not just a few genes but a set of genes, some of which may have epistatic effects.
Considering all of these points is necessary to come up with strategies to better understand the "biological significance" of the genes revealed by microarray data.
An addition to Dr. Morales' list:
Most bioinformatics tools consider the expression of each gene in isolation (a "list of up-regulated genes" is the most common way to present data), without regard for gene-gene (protein-protein) interactions. This is partly due to limitations of algorithms, partly to limitations in computational power, and partly to a lack of understanding of biology among people with a computer science background.
From my personal experience working with engineers to develop a mathematical model, I can say that the two concepts are mutually reinforcing. When statistically significant models are tested in the lab for biological relevance, the results should feed back into the model to improve it. Similarly, statistical models should guide experimental design and the inclusion of proper controls. This is an iterative process, and the iterations will narrow the gap between statistical significance and biological relevance. One crucial aspect of this exercise is proper communication between researchers in the two fields. That is why I am a strong proponent of integrated systems biology when it comes to solving complex problems like cancer!!
People always want to make life easy by using handy statistical methods, without giving much importance to the underlying assumptions. That may be the seed of being misled by statistics. For instance, for simplicity, tests are often performed on genes as if each of them were independent, but in reality that may not be true. Understanding the biology, and proposing or calibrating tests appropriate to the data situation, may help one identify the true outcomes as significant.
One example on expression data: when we analyze these quantitative data, the basic approach is a paired t-test. But the t-test offers no way to incorporate the information that, say, a 4-fold expression change is the cutoff for biological significance. These are some of the issues one should keep in mind; a sketch of one common workaround follows.
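The workaround is to require both criteria at once, volcano-plot style (a Python sketch; the paired design and the 4-fold cutoff follow the comment above, while the simulated data and the spike-in are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_genes, n_pairs = 5000, 5

before = rng.normal(8, 1, size=(n_genes, n_pairs))            # log2 intensities
after = before + rng.normal(0, 0.3, size=(n_genes, n_pairs))
after[:50] += 2.5                                             # spike in 50 true >4-fold changes

t, p = stats.ttest_rel(after, before, axis=1)                 # paired t-test per gene
log2_fc = (after - before).mean(axis=1)

# Require BOTH statistical evidence and a biologically motivated effect size.
hits = (p < 0.05) & (np.abs(log2_fc) >= 2.0)                  # 2 on the log2 scale = 4-fold
print("genes passing both criteria:", np.sum(hits))
```

The p-value filter supplies the reproducibility criterion and the fold-change filter supplies the biological one; neither alone recovers the spiked genes cleanly.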
Statistics is a number one tool in many fields of biology; many journals accept papers only if the data handling includes proper statistics. Many biologists calculate the mean values of their replicates and evaluate or compare these means. That is really useful and informative in certain fields of biology; however, one should consider that the target of the research may be to show the biological variation. A very simple tool can show this: the AVEDEV function of Excel (the average of the absolute deviations of data points from their mean). We have used the AVEDEV function to compare fluorescence properties of seedlings: read more in J. Photochem. Photobiol. B: Biology 90: 88-94.
Bela
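For readers without Excel, AVEDEV (the mean absolute deviation) is a one-liner (a Python sketch):

```python
import numpy as np

def avedev(x):
    """Excel's AVEDEV: the mean of absolute deviations from the mean."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.abs(x - x.mean()))

print(avedev([1.0, 2.0, 4.0, 9.0]))  # deviations from mean 4.0 -> 2.5
```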
Perhaps the most important question is whether the analytical method (the one used to generate the data) was meaningful and, as pointed out above, whether adequate controls or baseline data (e.g., cleaned data sets) were generated to yield useful and biologically meaningful observations. If not, neither bioinformatic nor statistical tools will render meaningful conclusions.
Second, one also needs to ask whether or not the appropriate statistical methods were applied to answer the questions that are posed by the experimentalist. More often than not, this requires the input of a statistician, not a bioinformatician. It is also important to remember that in confirmatory statistics, statistical significance is only meaningful when there are real hypotheses being tested.
We believe just the numbers: depending on the magnitude of the values, we believe a given state of matters. Unlike mathematics, biology and natural science are based on causality, in which repetitions of the same experiment must give similar (or so-called similar, given the presence of uncertainty) results. Why don't we discuss gravity? Because we are really convinced of it!... I agree with Max Moldovan that "only a few biologically significant findings have made it to practice".
Both statistical and biological significance are important in biology for establishing cause and effect. But we cannot explain a biological phenomenon on the strength of the statistics alone; that is where we go wrong, as the biology is forgotten in the process. In fact, we use statistical significance to increase the estimated probability of a biological phenomenon, which brings us closer to reality but is not absolute reality. We should keep that in mind, in which case the conflicts will be understandable and not serious.
The two should work together but are often challenging to unite. Jason C. Hsu at Ohio State University has been intrigued with this for some time. See: http://www.stat.osu.edu/~biostat/newsletters/volume1_1/article_vol1_1.html
In gene expression work, I often rely on the p-value to make a reliable inference about a gene or pathway, and on the fold change for confidence that what I see is "biologically" meaningful. Caution is required here, as there are genes (e.g., transcription factors) that show small or no detectable expression change but have an enormous impact on overall gene regulation.
Also, the MicroArray Quality Control (MAQC-I) consortium has shown that microarray results are reproducible when both a p-value and a fold change are taken into consideration. See:
Shi, L., Reid, L.H., Jones, W.D., Shippy, R., Warrington, J.A., Baker, S.C., Collins, P.J., de Longueville, F., Kawasaki, E.S., Lee, K.Y. et al. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol, 24(9):1151-1161, 2006.
Shippy, R., Fulmer-Smentek, S., Jensen, R.V., Jones, W.D., Wolber, P.K., Johnson, C.D., Pine, P.S., Boysen, C., Guo, X., Chudin, E. et al. Using RNA sample titrations to assess microarray platform performance and normalization techniques. Nat Biotechnol, 24(9):1123-1131, 2006.
Tong, W., Lucas, A.B., Shippy, R., Fan, X., Fang, H., Hong, H., Orr, M.S., Chu, T.M., Guo, X., Collins, P.J. et al. Evaluation of external RNA controls for the assessment of microarray performance. Nat Biotechnol, 24(9):1132-1139, 2006.
I concur with most views expressed, and especially with the comment by Pierre Bushel on the relationship between statistical significance and biological significance. My viewpoint is that statistical significance is important and crucial for filtering large data sets, including microarray and DNA sequence data. However, not all genes that are significantly 'up-regulated' or 'down-regulated' in microarray experiments are biologically meaningful, primarily because there is currently no way of assessing the biological significance of most of the genes we study under the conditions we use at the moment. Some of these genes may turn out not to cause direct effects on the cell types under investigation but to exert an indirect influence that is difficult to decipher. Their roles cannot simply be dismissed because of these limitations.
One caveat of the statistical analysis of large data sets is the occasional failure to apply stringent measures such as an FDR (false discovery rate) correction when analyzing microarray data, which leads to the 'discovery' of false positives. If one bases biological significance on flawed statistical analysis, the end result will be very misleading and will defeat the purpose of conducting such non-hypothesis-driven experiments (a sketch of the standard correction follows below).
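For readers unfamiliar with it, here is a minimal hand-rolled sketch of the Benjamini-Hochberg FDR procedure mentioned above; in practice one might instead call statsmodels.stats.multitest.multipletests with method='fdr_bh'. The p-values below are toy numbers.

```python
# Benjamini-Hochberg: find the largest k with p_(k) <= (k/n) * alpha and
# reject the k smallest p-values; this controls the false discovery rate.
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of p-values passing BH control at level alpha."""
    pvals = np.asarray(pvals)
    n = len(pvals)
    order = np.argsort(pvals)
    ranked = pvals[order]
    thresholds = (np.arange(1, n + 1) / n) * alpha
    below = ranked <= thresholds
    passed = np.zeros(n, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()   # last rank meeting its threshold
        passed[order[: k + 1]] = True
    return passed

# Toy p-values: a few real signals among many nulls (illustrative only).
pvals = [0.0001, 0.003, 0.02, 0.04, 0.05, 0.2, 0.5, 0.8]
print(benjamini_hochberg(pvals))
```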
To Radoslav: can you explain your comments about entropy and negentropy in more detail?
Radoslav, et al.,
Thank you for expanding my small mind.
As a cell (not molecular) biologist I was aware of the basic theories of entropy and enthalpy, but had been ignorant of negentropy. You have enlightened me, for which I am grateful!
Perhaps this new insight may help me to interpret my RNA-seq analyses of patient tissues, in which we find that 50% is 'other' (unannotated) RNA.
...And I thought that there wouldn't be anything left to discover when I grew up!!!
Great conversation! Thanks.
Professor Radoslav,
Thanks for your wisdom.
People have been posting in this blog without knowledge of the Popperian view. I can't wait for the day that Annotation may happen.
Sometimes a small difference in the mean values of two groups, or of the same group before and after an intervention, comes out statistically significant because of a low SD in the data. But from a biological standpoint, especially in the medical field, the small change observed can be meaningless. For example, a certain drug may reduce the blood pressure of the subjects by 1-2 mmHg with a statistically significant difference (the simulation below illustrates this case).
On the contrary, a large difference in the means may come out non-significant because the data show a large SD. Many times this results from the abnormal response of one or two subjects. Therefore, I feel that both the statistical test and the biological meaning need to be weighed with sufficient thought and understanding of the underlying phenomena.
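To make the first point concrete, here is a toy Python simulation (all numbers invented): with a couple of thousand subjects and a small SD of the paired differences, a clinically trivial ~1.5 mmHg drop in blood pressure becomes overwhelmingly "significant".

```python
# Large n + small SD: a medically negligible effect gets a tiny p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 2000
before = rng.normal(140.0, 5.0, n)              # systolic BP before drug
after = before - 1.5 + rng.normal(0.0, 2.0, n)  # tiny true effect: 1.5 mmHg

t_stat, p_value = stats.ttest_rel(before, after)
drop = before.mean() - after.mean()

print(f"mean drop = {drop:.2f} mmHg, p = {p_value:.1e}")
# p is vanishingly small, yet a ~1.5 mmHg drop is clinically meaningless.
```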
The answer to this question depends on a particular situation. Give me the specifics and I will attempt to opine in a timely and objective manner.
Statistical significance is a very useful parameter, but it should not be stretched too far when the biological significance is not obvious. Sometimes exceptions make the rule in biology, and these should not be ignored when weighing biological significance against statistical significance.
Independently of the previous answers, I'll suggest an idea about the objective of the question that the statistical analysis is meant to support. If you have a good observation and then a good hypothesis, statistics can help you in terms of the probability that the event occurs, and this will strengthen or weaken the support for your hypothesis. Seen this way, statistical significance without biological sense seems (in field ecology, at least) to be related to running every model and analysis you can on your data, lacking the reasoning needed in ecology to "ask" the ecosystem the proper question in order to disentangle the truth. It is very common to see results produced just to explore; even though exploration is necessary, hypothesis testing should follow a different approach.
I hope this helps with the biological significance vs. statistical significance issue.
Given where this discussion thread has gone, I suspect this really should be spun off into a separate discussion, but:
A large part of the "conflict" between statistical significance and biological relevance arises not because these concepts conflict, but because the statistical calculation is based on assumptions completely different from the biological reality.
A simple example of this is "sequence similarity" vs "biological function similarity". Sequence statistics say "these things are significantly similar", and biologists (typically) infer "these things are likely to function similarly" from this prediction. However, the statistical analysis is really evaluating the probability of common evolutionary descent, not the probability of similar function. Evolutionarily distant orthologs are less-similar than evolutionarily proximal paralogs. As a result, it _looks_like_ the statistical significance is at odds with the biological relevance, but in reality the issue is not that the statistics disagree, but that the statistics were never applicable to the interpretation in the first place.
Ask the right statistical question, and the results will be very different.
In his wonderful book, Statistics as Principled Argument, Robert Abelson provides an acronym I use whenever I'm giving a stats talk.
MAGIC:
M for magnitude: how big is the effect?
A for articulation: can we explain how the effect occurs?
G for generality: will the effect be applicable to the people reading the article?
I for interesting: is this new or useful information to the reader?
C for credible: are the data and analysis reliable?
If you keep these elements in mind, you can do a statistical study that biologists or clinicians will understand and appreciate.
Philosophical discussions of what can be known are all very nice, but they completely miss the point that the biologist means something specific when he asks "why isn't this statistically significant result biologically relevant?". His perception of the deficit is much more closely related to the fact that the statistics address a question he has probably never even thought about than to any failure on his part to fully understand what he means by "relevant".
It's quite true that the typical biologist doesn't understand how models are built - and it's simultaneously quite fine that this is the case. It's equally true that modelers rarely understand what the biologist needs from the model - and this is a significant problem.
Very few people understand how the cars they drive are built, or how to build one themselves, yet most of them manage to drive cars reasonably well. In this case, the builders have taken the needs of the users into account, and built tools that address those needs. In the case of biological users, most modelers have abdicated responsibility for understanding the needs of the user, and instead build models of what they believe to be relevant, or of what they find to be interesting, and expect the user to take the responsibility of adapting their needs to the model.
This fundamental disconnect needs to be addressed, before one starts worrying about whether questions relate to object or instance, or how to address quantum uncertainty in biological modeling.
I concur with the opinion expressed by William Ray. It is difficult to acquire the ability to completely understand the bases of both 'statistical significance' and 'biological significance'; our training and abilities are highly limited, and one spends a lifetime mastering certain aspects of science. Each side has its own reasons. Although it is hard, an interface must be built so that mathematicians and biologists can interact to come up with solutions. The current trend of drawing conclusions about biological relevance from meta-data analyses and large genetic studies without biological validation raises concerns. On the biological side, there are many disturbing practices that prevent healthy progress. More work needs to be done on both sides.
Regarding William Ray's opinion, my question is: who are the drivers in current molecular biology research?
Take the car-building example: if we consider the biologists to be the drivers, they actually need to drive the car to unknown places, e.g., the Moon or Mars. They don't know how much of their driving experience on Earth can be applied on the Moon. In other words, so far they don't clearly know what their needs are.
The builders (statisticians) have to do the guessing about what the drivers may need, based on the limited information they have from the images or data collected from the unknown world. (Building a model in two or three dimensions is very easy; it is hard for people to understand what is happening in high-dimensional biological data. It has even been said that if people could see in high dimensions, many mathematical models would not be necessary. Many statistical models work just like taking pictures of the problem.)
Perhaps various types of cars need to be built first; then send the biologists and the cars together to "Mars" (what an evil plan! :)). If they can bring more information back from "Mars", we may know which type of design is better and how to improve it. Before they come back with a better understanding of the problems, using their current driving experience to validate the design is inappropriate.
This is getting a bit far afield of the issue and diverging into how one does proper algorithm/model development, but, a wise man once said:
If you want a computationalist to improve the work of telephone operators, you can't have a telephone operator tell the computationalist what they need, because they'll just ask to have what they're already doing done more efficiently or automatically. You can't ask the computationalist to watch the telephone operator and improve things, because they'll almost inevitably optimize portions of the process that aren't actually an impediment. The best way to approach the problem is to train the computationalist to _be_ a telephone operator and then have them actually do the job for a while. The areas where what they know on the computational side can improve what they're doing on the telephony side will be immediately obvious to them. Short of training the operators to be computationalists (which seems less likely to succeed), no other solution produces well-targeted, revolutionary improvements to the process.
I believe this applies equally well to the statistics/biology world. We don't need to send the biologists to Mars with a fleet of cars to test, we need to train the statisticians to be Mars-Biologists, and send them to Mars with a box of tools, and then watch what kind of cars they send back...
As I expressed earlier, it is difficult to cross-train people at the peak of their careers to the extent that biologists can become good statisticians and vice versa. Currently, young statisticians do not have a strong biology background and biologists do not have a strong mathematics background. Perhaps training young people in both biology and bioinformatics would be beneficial. Until then, the experts on both sides have to find common ground.
Hi, guys,
See this recent interview from Professor Noam Chomsky at MIT
"Noam Chomsky on Where Artificial Intelligence Went Wrong"
http://www.theatlantic.com/technology/archive/2012/11/noam-chomsky-on-where-artificial-intelligence-went-wrong/261637/
Very good question. Statistical significance is very much influenced by the sample size; what matters more is the effect size. One should rely first on the effect size to interpret the results, and only then on statistical significance.
It is much easier to understand statistical significance than biological significance. To understand biological significance you need to do a lot of research; you need to view the subject as part of a network, and since we do not understand the network, it becomes difficult to judge the significance of the subject. But that should not stop us from trying: one small step for man, one giant leap for mankind. It is also very important to report all negative results; a result that is negative for one researcher may be positive for another. I would say that even if your result is only 1% significant, that does not mean it is biologically insignificant; it means it is significant to some researchers and insignificant to others. We should try to report all our results.
I am intrigued by these discussions, as they address one of the fundamental issues in environmental and medical biomarker "discovery" research. Dimitri's comments go to the heart of the problem: that of sample size vs. the number of measured variables. One can get statistically significant associations between some variables (caffeine consumption, tennis shoe size, stock market index) and some outcome (lung cancer, asthma attacks, birth weight, etc.) that may mean nothing biologically, often because they are random or voodoo correlations. If you have enough "independent" variables, at p=0.05, 1 in 20 will be a false positive (see the sketch after this comment).
In my opinion, statistical significance must always be tempered with biological relevance. For example, if a complex discovery experiment yields the unexpected result that caffeine consumption is correlated with lung cancer incidence, then you have to look at the biological links between caffeine and cancer (e.g., DNA adducts), and also at the statistical links between caffeine and known causes of cancer (e.g., smoking). Hopefully there will be an explanation that makes both statistical and biological sense and can then be explored further.
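A quick illustration of the 1-in-20 point above: a Python simulation (synthetic noise only, sample sizes arbitrary) that tests 20 random variables against a random outcome at p = 0.05 and counts the spurious "hits".

```python
# Every variable here is pure noise, so any "significant" correlation
# with the outcome is by construction a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_subjects, n_variables = 100, 20

outcome = rng.normal(size=n_subjects)
false_positives = 0
for _ in range(n_variables):
    variable = rng.normal(size=n_subjects)   # unrelated to the outcome
    r, p = stats.pearsonr(variable, outcome)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_variables} random variables "
      f"came out 'significant' at p < 0.05")  # expect about 1 on average
```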
Ah, I think there are two different issues being conflated into the same topic in this discussion...
I believe the original poster was asking about statistically significant results (incontrovertibly statistically significant, not frequentist p-value silliness) that are nonetheless biologically irrelevant.
There is a difference between false positives in a test of something that's biologically relevant, and true positives in a test of something that's biologically irrelevant. The former is, while frustrating, at least easy to understand conceptually, and fixes for it are, again at least conceptually, easy to explain. The latter is considerably more problematic, and in a general sense, fixes will require a lot of insight that does not seem to currently exist.
I think the problems are two-fold. One is that with huge amounts of data, even small absolute changes become statistically significant while being of little biological relevance: say 51% of cancer x has increased expression of a gene compared to 49% of cancer y; with enough samples this becomes statistically significant, but it is practically irrelevant, as it could not be used to differentiate or predict anything (see the sketch below). The other is that the data going into the model may be garbage. For instance, gene expression data come from the whole pool of cancer cells in a sample, but perhaps only a small subset of the cells drives the growth or resistance of the cancer, and the noise from the rest of the pool drowns out those biologically relevant cells.
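As a back-of-the-envelope check on the 51% vs 49% example, a classic two-proportion z-test (with an arbitrary assumed sample size) shows how such a difference becomes wildly "significant" with enough samples while remaining useless for prediction.

```python
# With 100,000 patients per group, a 2-percentage-point difference in
# proportions yields an enormous z statistic, yet the gene's status barely
# improves on a 50/50 guess about which cancer a patient has.
import math

def two_proportion_z(p1, n1, p2, n2):
    """Classic two-proportion z-test; returns the z statistic."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

n = 100_000                       # patients per cancer type (assumed)
z = two_proportion_z(0.51, n, 0.49, n)
print(f"z = {z:.1f}")             # z ~ 8.9: the p-value is astronomically small
```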
One has to look at the biological significance irrespective of whether the change is small or big, or statistically significant. When one deals with a large number of samples, statistics definitely helps.
Radoslav - in my little corner of the world, I deal with empirical observations/data. Often, we observe treatment effects (before/after exposure to chemicals) or case-control differences at the molecular level in blood/breath/urine samples, in the hopes of deducing relevant pre-clinical markers of an eventual adverse outcome pathway. The big questions we face are whether changes in the human exposome are adverse, adaptive or irrelevant.
Beginning to understand the actual biology behind observed perturbations is key to assessing the toxicity of subtle exposures to man-made or other chemical stressors in the environment.
Though the application of statistics to biological relevance is controversial, one should not ignore that it is statistics that gives us confirmation of what we have hypothesized. The problem arises when results are over-interpreted to serve commercial interests.
There is potential naivety in stressing the importance of one while ignoring the other. I would be very sceptical of embracing all statistically significant results, as many assumptions are made when applying the various statistical methods and models. However, when due consideration is given to all of these, there is normally a good correlation between the statistical validity of the results and the biological contextualisation of the underlying question, provided the correct statistical methods are applied. The question as posed lacks proper structure: one may get statistically significant results by not using the best statistical method, and these will consequently lack biological significance. On the other hand, how can you tell that the patterns you are observing in your biological data are not just random occurrences if you do not quantify them in terms of probabilistic statistical inference? In a nutshell, there is a need to understand the proper use of statistical models and methods and how to link them to biological validity.
Any result without biological relevance has less meaning and is relegated to supplementary material or a short paragraph for possible future support. Likewise, any result without statistical support cannot be generalized, whether it has biological relevance or not. This is not an "OR" situation but an "AND" situation: results worth reporting are both statistically significant and biologically relevant; those that do not meet both criteria should be presented as speculation.
It is sometimes very difficult to achieve statistical significance even though the biological relevance may be there. If one looks at many of the results reported by medical professionals, they may not have enough samples to achieve statistical significance, but their observations on one patient or a few, with biologically relevant results, are thoroughly discussed and found to be very useful in diagnosis and treatment.