Conflicting results in the literature or in one's own work are a frustration; which software do you find most helpful and why? Have you found a program which takes into account cellular or anatomic heterogeneity?
I'm glad you mentioned the visuals. Some of the programs produce "hairy furball" or "starry sky" visuals, but on close inspection they do not reflect the actual interactions. For example, the GABA-A receptor is a heteropentamer, so the alpha, beta and gamma subunits physically interact directly, yet the visuals place them in far-flung corners of the diagram or plot. Do you know of any program that overcomes these limitations?
I understand the frustration when it comes to the analysis of high-throughput data. I would suggest using the DAVID bioinformatics tool, which is a bit more species-specific.
Thank you Muhammad, I will look into it. Do you have experience with any other programs (good, bad or indifferent), and do you find that special circumstances favor one program over another? Also, there seems to be a problem keeping the pathways up to date with the flood of new literature. Some of the curated pathways cite quite old references. There is of course a tradeoff: manual expert curation is accurate but perhaps incomplete, whereas other methods are more timely but perhaps less accurate.
The accuracy of the pathways, as well as their up-to-date status, is a big issue. Even when databases are updated according to the literature (like IPA), you should always keep in mind that the "canonical pathway" valid for most cells may not be right for your setting. The same is true of other database-driven analyses, such as gene ontology (in DAVID) and functions (in IPA). I usually take them as an indication and then study each pathway or gene ontology family in more detail to decide whether the annotation also applies in my setting. As others have said above, IPA allows you to set the cell types, but I have noticed that sometimes most of the genes are then filtered out.
Regarding which software to use, IPA has great graphics and makes it easier to navigate gene functions/interactions. However, I prefer DAVID for most analyses because I prefer its statistics. There are some features of IPA I didn't like in the past (namely the definition of the background and multiple counts for replicate entries); I don't know whether the latest versions have fixed them. Graphically, IPA is much better, while DAVID (which is free) is probably less user-friendly.
Thank you, Luciano. I have used IPA a bit, and I like the annotations. I appreciate your comparisons.
However, a problem with all the pathway analyses is that they are somewhat static and simplistic. To use examples from my area of interest (granted, the brain is more diverse and complex than other organs):
A given brain region - say the motor cortex - may use one set of GABA receptor subunits at one stage in development and switch to another set later in development, so the diagrams and pathways look quite different depending on the age. I have not found any pathway software that takes that very crucial phenomenon into account. Also, there is a huge amount of cellular heterogeneity even among, say, GABAergic neurons: although they all synthesize GABA, they use different trans-synaptic adhesion molecules to bind to their targets, depending on where they are located in the brain.
If you know of any pathway analysis in your discipline that takes into account the developmental and regional specialization of cells, please let me know! BTW, I did my postdoc in B lymphocyte differentiation years ago, and the field has come a long way since then...
Wow Martine, that looks pretty specific! I'm sorry, but I don't know of any software that goes so deep. I can only suggest that you patiently build your own set of pathways in IPA. When I deal with similar situations (i.e., pathways that don't exist in the database) I usually select the genes I'm interested in and check their fold change/statistical significance, but there can always be some disagreement about how I selected those genes. For this reason I usually prefer to refer to a review or to one or more publications.
Good luck and let us know if you find such a powerful pathway analysis tool!
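For what it's worth, the manual check described above (pick the genes of interest, then look at their fold change and statistical significance) can be sketched in a few lines of Python. This is only an illustration: the gene symbols are real GABA-related genes, but the numbers, the function name and the thresholds (absolute log2 fold change of at least 1, adjusted p-value of at most 0.05) are made-up assumptions, not values from this thread or from IPA.

```python
# Hypothetical differential-expression results: gene -> (log2 fold change, adjusted p-value).
# All numbers are invented purely for illustration.
RESULTS = {
    "GAD1": (1.8, 0.001),
    "GAD2": (0.3, 0.40),
    "GABRA1": (-1.2, 0.02),
    "GABRB2": (-2.0, 0.20),
}


def select_genes(results, genes_of_interest, min_abs_log2fc=1.0, max_adj_p=0.05):
    """Return the genes of interest that pass both (assumed) thresholds, sorted by name."""
    selected = []
    for gene in genes_of_interest:
        if gene not in results:
            continue  # gene was not measured; skip it rather than guess
        log2fc, adj_p = results[gene]
        if abs(log2fc) >= min_abs_log2fc and adj_p <= max_adj_p:
            selected.append(gene)
    return sorted(selected)


# A hand-built "pathway" gene list, as one might assemble from a review.
my_pathway = ["GAD1", "GAD2", "GABRA1", "GABRB2", "GABRG2"]
passing = select_genes(RESULTS, my_pathway)
print(passing)
```

Whichever thresholds you choose, the choice of the gene list itself remains the debatable part, which is why backing it with a review or publication helps.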
I agree with Luciano. If you need to be specific in your analysis, IPA allows you to do it efficiently. I would suggest looking at which upstream regulators are activated or inhibited in the pathway you are interested in. Then gradually build the relationships between the molecules via the build function in IPA; it will find either the shortest direct or the shortest indirect relationship between the two molecules you want to connect. However, do not forget to check whether it fits your biological context. Another way is to pick the molecules one by one and view them as a network with other molecules after overlaying the gene expression from your data set. In this way you will be able to include or exclude genes based on their relationship with the pathway. Best of luck, and keep playing with IPA.
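As an aside, the idea of connecting two molecules by the shortest direct or indirect relationship can be illustrated with a plain breadth-first search over an interaction graph. To be clear, this is not IPA's actual algorithm (which I don't know); it is just a generic sketch, and the molecule names A to E and the edge list are hypothetical placeholders.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search for a shortest chain of interactions between two molecules.

    `edges` is a list of (molecule, molecule) pairs treated as undirected interactions.
    Returns the path as a list of molecules, or None if the two are not connected.
    """
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if start not in graph or goal not in graph:
        return None

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph[node]):  # sorted for reproducible output
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Hypothetical interaction list: A reaches C either via B (2 steps) or via D and E (3 steps).
EDGES = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "E"), ("E", "C")]
path = shortest_path(EDGES, "A", "C")
print(path)
```

The shortest chain is only a candidate; whether the intermediate molecule is actually expressed in your cell type or developmental stage still has to be checked by hand.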
I am very biased as I work on it, but I like Reactome.
You can take a look at Reactome (http://reactome.org), a free, open-source, curated and peer-reviewed pathway database. You will find intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modeling, systems biology and education.
Thank you - I very much like Reactome because it is clear and superbly well-documented.
The challenge that all the pathway databases face is the avalanche/tsunami of information coming from all directions, with no sign of abating.
Reactome does an excellent job of describing canonical pathways (such as GABA synthesis), but biologically there is a surprising degree of heterogeneity among (for example) neurons of the GABAergic class. In some, the vesicular GAD2 pathway is very prominent; in others, the cytosolic GAD1 pathway is more prominent. Ditto for neurons expressing GABA receptors: there are many permutations of receptor subunits, each with its own trafficking, exocytosis and endocytosis pathways.
The other crucial factor is that the expression of all these receptors changes depending on the developmental age of the mammal. With hemoglobin, at least, there is only one big switch (fetal to adult hemoglobin), but with neurotransmitters there are multiple switches in multiple regions, in keeping with what we know of the complexity of the mammalian brain. Another aspect is that neurotransmitters do not work in isolation; there is cross-talk. For example, the metabotropic receptors (such as the GABA metabotropic receptor complex) are frequently expressed in different neuronal types, where the effects on the output are completely different, and in some cases opposite.
The visual and symbolic representation of this information will be very complex.
Each has unique features and none can be relied on completely (though they're getting better all the time). I don't think there is too much shame in querying multiple databases. I endorse Reactome, as it has shown me things that others did not, but I have not figured out a way to customize the background gene list (see below). DAVID is convenient for querying many databases at once and offers protein structure/function and more general biological/cellular processes. DAVID also lets you customize your background gene list (e.g. microarray-specific) to give more accurate p-values. Ingenuity Pathway Analysis has pretty detailed annotation for upstream regulators, but it is proprietary ($$) and offers no real customized backgrounds.
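To see why the background gene list matters for the p-values, here is a minimal sketch using a one-sided hypergeometric test, a common choice for over-representation analysis. I am assuming this test for illustration only; it is not necessarily the exact statistic that any of the tools above use, and all the counts are invented.

```python
from math import comb


def enrichment_p(hits, list_size, pathway_size, background_size):
    """Upper-tail hypergeometric p-value: chance of seeing at least `hits`
    pathway genes in a list of `list_size` genes drawn from the background."""
    max_hits = min(list_size, pathway_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Invented example: 8 of my 100 differentially expressed genes fall in one pathway.
# Whole-genome background: 20000 genes, 40 of them annotated to the pathway.
p_genome = enrichment_p(hits=8, list_size=100, pathway_size=40, background_size=20000)
# Array-specific background: only 5000 genes were measurable, 30 of them in the pathway.
p_array = enrichment_p(hits=8, list_size=100, pathway_size=30, background_size=5000)

print(f"genome background: p = {p_genome:.3g}")
print(f"array background:  p = {p_array:.3g}")
```

With these made-up numbers the whole-genome background gives a much smaller p-value than the array-specific one, because it counts genes that could never have appeared in the list. That is the sense in which a customized background is more honest.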
A recent, sobering assessment of the difficulties in 'omics integration is offered by Tieri and Nardini in Mol. Biosystems 2013, 9, 2401.
Results of pathway queries (e.g. the number of pathways retrieved) may vary depending on the type of identifier used, i.e. gene name, UniProt ID or Entrez Gene ID. There is a variable degree of coherence within a database (this was surprising) and, not surprisingly, between databases. Even the ranking of a pathway within a database varied depending on which of the above identifiers was used. Take a close look at their Tables 1 and 2. The authors make excellent recommendations for the scientific community (this will require $$), for database curators, and for the end user; for the latter, caveat emptor.
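A toy illustration of how the identifier alone can change what a query retrieves: below, the same three genes are submitted once under names common in the literature and once after mapping to official symbols. This is a sketch of the general problem, not of how any particular database resolves identifiers; the pathway set and the alias table are deliberately tiny and hypothetical (GAD67 and GAD65 are widely used alternative names for GAD1 and GAD2).

```python
# Toy pathway membership, stored under official gene symbols.
PATHWAY_SYMBOLS = {"GAD1", "GAD2", "GABRA1"}

# Hypothetical alias table: names seen in the literature -> official symbol.
ALIAS_TO_SYMBOL = {"GAD67": "GAD1", "GAD65": "GAD2"}


def count_hits(query, pathway, alias_map=None):
    """Count query genes found in the pathway, optionally normalising aliases first."""
    alias_map = alias_map or {}
    normalised = {alias_map.get(name, name) for name in query}
    return len(normalised & pathway)


query = ["GAD67", "GAD65", "GABRA1"]
raw_hits = count_hits(query, PATHWAY_SYMBOLS)                      # names used as submitted
mapped_hits = count_hits(query, PATHWAY_SYMBOLS, ALIAS_TO_SYMBOL)  # aliases mapped first

print(raw_hits, mapped_hits)
```

The same biology scores one hit or three depending purely on naming, so it is worth converting everything to a single identifier type before comparing results across tools.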
We are still quite some distance from clinical-grade pathway analysis in general, and the situation is magnified in the neurosciences. For example, there is only partial agreement amongst the various databases of synaptic genes/proteins as to which proteins belong and where they belong, e.g. pre- or post-synapse. This is crucial not just for pathway analysis but also for intelligent drug design.
A good description of the inconsistencies amongst various pathway analyses in neuropsychiatric disease appears in Sullivan and Posthuma, Current Opinion in Behavioral Sciences 2015, 2:58–68. In this article, they justifiably recommend the development of carefully curated gene lists, with traceable author statements. In addition, I would suggest that conflicts and inconsistencies need to be reconciled, as important differences may arise depending on the animal species (even strains within a species), developmental age, gender etc. There are also marked differences amongst brain regions and circuits, and these are rarely highlighted.
Even amongst studies and gene lists of the synapse, as the authors correctly point out, there are errors of omission and commission. This is a key issue, to which resources need to be urgently allocated, in order to avoid perpetuating errors which may skew not just the basic understanding of biochemical processes, but their clinical application in diagnostics and therapeutics. Time for a consensus conference and international collaboration! If not promptly addressed, the situation will only worsen as the deluge of data keeps pouring in, and resources which are already thinly stretched will reach a breaking point.