Which method to calculate the gene-gene correlation matrix?

Correlation AnalyzeR: functional predictions from gene co-expression correlations

Henry E. Miller & Alexander J. R. Bishop

BMC Bioinformatics volume 22, Article number: 206 (2021)

Abstract Co-expression correlations provide the ability to predict gene functionality within specific biological contexts, such as different tissue and disease conditions. However, current gene co-expression databases generally do not consider biological context. In addition, these tools often implement a limited range of unsophisticated analysis approaches, diminishing their utility for exploring gene functionality and gene relationships. Furthermore, they typically do not provide the summary visualizations necessary to communicate these results, posing a significant barrier to their utilization by biologists without computational skills.

Background Almost two decades after the completion of the Human Genome Project, the functionality of many genes remains largely enigmatic [1]. Many such “enigmatic genes” have immense biological significance, exemplified by the associations of thousands with cancer outcome [2]. Even well-characterized genes often play unexpected roles in different biological contexts (e.g., EZH2 is both a tumor-suppressor and an oncogene in different cancers [3]). Gene co-expression correlations provide a robust methodology for predicting gene function, as genes which share a biological process are often co-regulated [4,5,6]. Similar insights can be gained from protein interaction data (for example, STRING [7] and InterologFinder [8]), phenome data, or a combination of both [9]. Regardless, generating expression data remains a cost-effective approach, and co-expression analysis remains a prominent tool for exploratory systemic evaluation, largely because it can consider gene co-expression across the genome. However, the applications which have been developed for such inference are hampered by key limitations. Tools like COXPRESdb [10] and GeneFriends [11] calculate gene set over-representation on an arbitrary number of co-expressed genes. Alternatively, GeneMANIA [12] and GIANT [13] construct co-expression networks and calculate gene set over-representation on an arbitrary number of nodes. Neither approach is sensitive to the genome-wide distribution of co-expression correlations or, with the exception of GIANT, to differences between tissue/disease conditions. Furthermore, these functional predictions are limited in scope and do not generate relevant, user-friendly visualizations, limiting their utility for biologists without bioinformatics skills. Recently, Lachmann et al. introduced ARCHS4, a database with thousands of standardized RNA-Seq datasets [14].
We re-processed these data, calculating co-expression correlations with respect to tissue and disease (cancer/normal) condition and provided the results in a publicly accessible database. We now present Correlation AnalyzeR, a user-friendly interface to this co-expression database with a suite of tools for de-novo prediction of gene function, gene–gene relationships, and biologically relevant gene subgroups to facilitate discovery of novel relationships within genes of interest.
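
As a rough illustration of the question in the title, the sketch below computes a gene-gene correlation matrix from a genes-by-samples expression table using either Pearson or Spearman correlation. The excerpt above does not say which coefficient the authors used, so this is not their pipeline; the function names (`average_ranks`, `gene_correlation_matrix`) and the toy expression values are hypothetical and chosen only for illustration.

```python
import numpy as np


def average_ranks(values):
    """Rank a 1-D array, giving tied values the average of their ranks."""
    order = np.argsort(values, kind="stable")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    # Start index of each run of equal values in the sorted array.
    starts = np.flatnonzero(np.r_[True, sorted_vals[1:] != sorted_vals[:-1]])
    ends = np.r_[starts[1:], len(values)]
    for s, e in zip(starts, ends):
        # 1-based average rank for the tied run occupying positions s..e-1.
        ranks[order[s:e]] = (s + e - 1) / 2.0 + 1.0
    return ranks


def gene_correlation_matrix(expr, method="pearson"):
    """Return the genes x genes correlation matrix.

    expr: 2-D array with one row per gene and one column per sample
          (assumed to be already normalized; that step is not shown here).
    method: "pearson" (linear) or "spearman" (rank-based, monotonic).
    """
    expr = np.asarray(expr, dtype=float)
    if method == "spearman":
        # Spearman is Pearson correlation computed on per-gene ranks.
        expr = np.apply_along_axis(average_ranks, 1, expr)
    elif method != "pearson":
        raise ValueError("method must be 'pearson' or 'spearman'")
    # np.corrcoef treats each row as one variable (here: one gene).
    return np.corrcoef(expr)


# Toy data (made up): 3 genes measured in 5 samples.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],    # gene_a
    [1.0, 4.0, 9.0, 16.0, 25.0],  # gene_b: monotonic but nonlinear in gene_a
    [5.0, 4.0, 3.0, 2.0, 1.0],    # gene_c: perfectly anti-correlated with gene_a
])

pearson = gene_correlation_matrix(expr, method="pearson")
spearman = gene_correlation_matrix(expr, method="spearman")
print(np.round(pearson, 3))
print(np.round(spearman, 3))
```

In this toy case the two coefficients disagree for gene_a and gene_b: Spearman reports a perfect monotonic relationship, while Pearson is below 1 because the relationship is not linear. Which one is more appropriate depends on how the expression values were normalized and how sensitive the analysis should be to outlier samples.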

How to learn more about SPSS and its Application?

Can I use reverse DNA sequences for alignment, translation to amino acids, and GenBank submission?

Baseline drift in HPLC? What causes this?

Text-Communication from the M1 Hand Area using BCI—and then there is Elon Musk?

Handling Missing Data and Building a Predictive Model with Incomplete Information?

Has anyone applied Python in the field of textile engineering for data analysis, automation, or smart textiles?

How can I use the cif data obtained from rietveld refinement extracted via gsas2, for microstructural analysis using ETEX software?

How to define an anisotropic material with asymmetric elastic compliance/stiffness matrix in ANSYS APDL?

Is it true that $\det(V(A))$ may be only $\pm 1$, depending on $n$, for the last symmetric tridiagonal matrix $A$?

How are iso-frequency contours plotted?