How to infer unknown gene functions from trajectories of gene expression time series plots?

10 October 2017

How can I use similarities between the trajectories of gene expression time series plots to infer the same similarities between gene functions as if I had used TFBS distribution information or gene expression distance matrices derived from it? How are similarities between mRNA gene expression patterns properly determined? What is the best way to predict or infer still unknown gene functions based on similarities or correlation measures between time series plot trajectories?

TFBS stands for Transcription Factor Binding Sites, which are promoter-specific for each gene. A promoter controls gene expression. Gene expression is measured as the mRNA concentration at each time point. The mRNA changes over time can be defined or inferred by two methods, which convey the same information using two different dimensions:

1.) The time series plot trajectories use the Y value at each time point to show how much mRNA was present at that time.

2.) The TFBSs determine the level of transcription, because transcription is regulated by the Transcription Factors (TFs), which bind to the TFBSs.

So actually the TFBSs are the cause and the trajectory of the time series plot is the consequence. Both refer to the same event, but they describe and quantify it by completely different means / dimensions.

TFBS distributions must be defined by the nucleotide sequence (A, T, C, G) to which a particular TF (mostly a protein) binds, and by spatial information giving the distances between binding sites, measured in the number of nucleotides separating them. But some TFBSs overlap one another. The exact TFBS distribution is probably still a subject of debate for many promoters.

Some extremely smart genomic scientists must have agreed on a method to rank the similarities of gene expression patterns (i.e. transcriptional similarities) by converting the TFBS distributions of promoters into distance matrices, following a complex algorithm I cannot understand. However, it allows one to put the genes in an order such that their ranking reflects the similarities in the impact that the different promoter-specific TFBS distributions have on transcription. I refer to this order as the true similarity between genes.

Now, since the same information is encoded in the trajectories of the time series plots of transcription levels, I was trying to find a way to rank the trajectories in such a way that the relative similarity order between genes is the same as if I had used TFBS distribution information.

If I had succeeded, I could have used the trajectories of time series plots, which are much easier to use, to predict the same kind of similarities between genes as if I had used TFBS distribution information or distance matrices derived from TFBS information.

There was an R package which claimed to be able to achieve this, but despite trying for a long time I could not get it to work. Are you good at R and Python? These are the only two programming languages I know.

The task my adviser gave me to figure out as part of my dissertation was to predict the function of genes which we don't know yet, using similarities between the trajectories of time series plots. It is based on the assumption that the more similar the gene expression patterns of two genes are, i.e. the higher the correlation between their expression patterns, the more similar the functions of those genes are.
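A minimal sketch of this assumption in Python, using only the standard library: an unannotated gene is given the function that is most common among the annotated genes whose expression profiles correlate best with it. The gene names, profiles, function labels and the choice of k = 3 are hypothetical toy values, and majority voting over Pearson neighbours is only one possible way to turn the assumption into a prediction.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # flat profile: correlation is undefined, treat as "no similarity"
    return cov / (sx * sy)


def predict_function(query, annotated, k=3):
    """Majority vote over the k annotated genes most correlated with `query`.

    `annotated` maps gene name -> (expression profile, function label).
    """
    ranked = sorted(
        annotated.items(),
        key=lambda item: pearson(query, item[1][0]),
        reverse=True,
    )
    votes = Counter(label for _, (_, label) in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: five annotated genes measured at five time points.
annotated = {
    "geneA": ([1, 2, 3, 4, 5], "ribosome biogenesis"),
    "geneB": ([2, 4, 6, 8, 10], "ribosome biogenesis"),
    "geneC": ([1.0, 2.5, 2.9, 4.2, 5.1], "ribosome biogenesis"),
    "geneD": ([5, 4, 3, 2, 1], "stress response"),
    "geneE": ([9, 7, 6, 3, 1], "stress response"),
}
unknown = [0.5, 1.1, 1.4, 2.2, 2.4]  # profile of a gene with unknown function

prediction = predict_function(unknown, annotated, k=3)
print(prediction)
```

With real data the profiles would come from the expression matrix and the labels from an annotation source, and the prediction is only as good as the assumption that co-expressed genes share a function.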

My problem is that there are so many different mathematical methods to determine, compare and rank the similarities and to calculate the correlations between the trajectories of time series plots, each of which yields a different relative similarity order between genes. For a long time I was trying to figure out which way of calculating similarity would be best, until I realized that there is no right or wrong, or better or worse, way to calculate the similarities / correlations between time series plot trajectories. I realized that the relative order of similarity rankings between genes must be the same as if these genes were ranked based on their true relative similarities, which I am supposed to infer from the trajectories of time series plots. Since TFBS distributions are the most direct measure of expression patterns, the method to choose is whichever sorting method puts my time series plots into the same relative order that I would have gotten if I had based my similarity sorting on TFBS information.
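One way to make this selection criterion concrete is to compare each candidate distance matrix computed from the trajectories with the TFBS-based reference distance matrix, and to keep the measure whose pairwise distances have the highest Spearman rank correlation with the reference. The sketch below assumes both matrices are available for the same genes in the same order; all numbers and the names `tfbs_dist`, `pearson_distance` and `other_measure` are hypothetical.

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def upper_triangle(matrix):
    """Pairwise distances above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


# Hypothetical reference: TFBS-based distances between four genes.
tfbs_dist = [
    [0, 1, 4, 6],
    [1, 0, 3, 5],
    [4, 3, 0, 2],
    [6, 5, 2, 0],
]

# Hypothetical candidates: distances computed from the time series.
candidates = {
    "pearson_distance": [
        [0, 0.1, 0.7, 0.9],
        [0.1, 0, 0.5, 0.8],
        [0.7, 0.5, 0, 0.3],
        [0.9, 0.8, 0.3, 0],
    ],
    "other_measure": [
        [0, 0.9, 0.2, 0.4],
        [0.9, 0, 0.6, 0.1],
        [0.2, 0.6, 0, 0.8],
        [0.4, 0.1, 0.8, 0],
    ],
}

reference = upper_triangle(tfbs_dist)
agreement = {
    name: spearman(reference, upper_triangle(matrix))
    for name, matrix in candidates.items()
}
best = max(agreement, key=agreement.get)
print(best, agreement)
```

The entries of a distance matrix are not independent of each other, so a significance statement about the agreement would need a permutation approach such as a Mantel test rather than the usual p-value of the correlation.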

This is how far I was able to follow the train of thought of my adviser. What I could not figure out is how to actually determine the similarities between temporal gene expression patterns, because again there are many ways. One can use Pearson correlation, dynamic time warping, phase shift, periodicity parameters and many other parameters by which certain types of trajectory properties can be described, compared and ranked. Now the problem is that the expression patterns of some genes are considered periodic whereas others are not. So if I used the period length, suddenly I'd have N/A values for all the genes that lack a periodic expression pattern. That is where I got stuck and why I could not include any of this in my dissertation. Nobody could explain to me how to measure, compare and rank time series plot trajectories.
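To illustrate how two of these measures differ, here is a small Python sketch of a Pearson-based distance and a basic dynamic time warping (DTW) distance, both applied to z-scored profiles. Unlike the period length, both are defined for every pair of genes, periodic or not, so no N/A values arise. The three profiles are hypothetical, and the DTW shown is the plain textbook recursion without a window constraint.

```python
from math import pi, sin, sqrt


def zscore(x):
    """Scale a profile to mean 0 and standard deviation 1."""
    n = len(x)
    mean = sum(x) / n
    std = sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / std for v in x] if std > 0 else [0.0] * n


def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for identical shape, 2 for mirrored shape."""
    zx, zy = zscore(x), zscore(y)
    r = sum(a * b for a, b in zip(zx, zy)) / len(zx)
    return 1.0 - r


def dtw_distance(x, y):
    """Dynamic time warping distance between z-scored profiles.

    cost[i][j] is the cheapest alignment of the first i points of x
    with the first j points of y.
    """
    zx, zy = zscore(x), zscore(y)
    n, m = len(zx), len(zy)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(zx[i - 1] - zy[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # repeat a point of y
                cost[i][j - 1],      # repeat a point of x
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


# Hypothetical profiles over 24 time points.
base = [sin(2 * pi * t / 12) for t in range(24)]           # periodic gene
shifted = [sin(2 * pi * (t - 2) / 12) for t in range(24)]  # same shape, delayed
trend = [float(t) for t in range(24)]                      # non-periodic gene

for name, other in [("shifted", shifted), ("trend", trend)]:
    print(name, pearson_distance(base, other), dtw_distance(base, other))
```

Pearson compares the profiles time point by time point, so a delay lowers the correlation, whereas DTW is allowed to stretch the time axis and therefore tolerates the delay. Which behaviour is wanted depends on whether a delayed copy of a pattern should count as similar.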

Do you have any idea? Does this task even make sense? This is how far I understand it, but I cannot figure out how to solve it in such a way that people would be happy with my solution. Please help if you can.

Please, somebody help fast with a detailed explanation, because if I can figure this out within the coming week I can still include it in my dissertation. In that case I have a realistic chance to graduate this fall semester. I must either graduate or die, because my GA funding did not get extended into this academic year, causing me to starve because I have no income and don't know where to go.
