Can I exclude RNA-Seq samples based on low intragroup correlation?

10 May 2023

I have RNA-Seq samples from an experiment in which I infected epithelial cells with bacteria. The experiment was repeated identically 3 times and then once more with slightly altered parameters (higher bacterial inoculum and longer incubation time). Each experiment also had certain treatment groups. My goal is to determine DEGs between those groups.

When comparing the gene expression of all samples within their groups (intragroup correlation), some of them are clearly different from the rest of the group. Sometimes they also form two distinct clusters (see examples attached, blue color in the scale bar indicates low difference/high correlation). However, these "outlier" samples do NOT necessarily stem from the experiment with altered parameters.

How should I proceed with my data analysis?

Should I simply exclude all samples from that 4th experiment because it did not follow the exact same protocol?

Should I determine a minimum correlation below which I exclude samples regardless of which experiment they stem from? If so, is there a specific methodology for this?

Should I not exclude any samples?

My concern is that I will miss otherwise significantly regulated genes because the samples within each group are so heterogeneous.

Ma'Mon Abu Hammad

Yes, you can exclude RNA-Seq samples based on low intragroup correlation, but it is important to consider the underlying causes of the low correlation and the potential impact on downstream analysis.

Intragroup correlation is a measure of the similarity between samples within a group, and it can be used to assess the quality of RNA-Seq data. Low intragroup correlation can be caused by technical issues such as batch effects, library preparation artifacts, or sequencing errors, as well as biological factors such as genetic variation, sample heterogeneity, or differences in tissue composition.
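To make the measure concrete, here is a minimal sketch of one way to compute it: for each sample, the mean Pearson correlation of its log2(count + 1) profile with the other samples in the same group. The function names, the toy counts, and the choice of Pearson correlation on log-transformed counts are all assumptions for illustration, not something taken from the question. The sketch deliberately does not set a cutoff; it only ranks samples so that the low ones can be investigated.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return cov / math.sqrt(var_x * var_y)


def mean_intragroup_correlation(counts, groups):
    """Mean correlation of each sample with the other samples of its group.

    counts: dict mapping sample name -> list of raw counts (same gene order)
    groups: dict mapping sample name -> group label
    Samples that are alone in their group are skipped.
    """
    logged = {s: [math.log2(c + 1) for c in v] for s, v in counts.items()}
    result = {}
    for sample, profile in logged.items():
        peers = [p for p in logged if p != sample and groups[p] == groups[sample]]
        if peers:
            total = sum(pearson(profile, logged[p]) for p in peers)
            result[sample] = total / len(peers)
    return result


# Hypothetical toy data: four samples in one group, five genes; s4 deviates.
toy_counts = {
    "s1": [10, 200, 50, 1000, 5],
    "s2": [12, 180, 55, 900, 6],
    "s3": [9, 220, 45, 1100, 4],
    "s4": [500, 20, 300, 30, 100],
}
toy_groups = {"s1": "infected", "s2": "infected", "s3": "infected", "s4": "infected"}

# Print samples from lowest to highest mean intragroup correlation.
for name, r in sorted(mean_intragroup_correlation(toy_counts, toy_groups).items(),
                      key=lambda item: item[1]):
    print(name, round(r, 3))
```

Note that a single deviating sample also pulls down the mean correlation of every other sample in its group, so the ranking is more informative than the absolute values.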

If the low intragroup correlation is caused by technical issues, excluding the affected samples can improve the quality and reliability of downstream analysis. However, if the low correlation is caused by biological factors, excluding samples may introduce bias and affect the interpretation of the results. For example, if the low correlation is caused by genetic variation, excluding samples may result in a biased representation of the population or disease.

Therefore, before excluding RNA-Seq samples based on low intragroup correlation, it is important to investigate the underlying causes and consider the potential impact on downstream analysis. This can involve visualizing the data using multidimensional scaling (MDS) or principal component analysis (PCA) to check whether samples separate by experiment (a batch effect) or by some other source of variation, as well as running the differential gene expression analysis with and without the suspect samples to see how much the results change.
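As a rough illustration of the PCA check, the sketch below computes the first principal component of a small samples-by-genes matrix using only the Python standard library (power iteration on the sample-by-sample Gram matrix) and lists each sample's PC1 score next to its experiment label. The function name, the toy log-expression values, and the experiment labels are hypothetical. In a real analysis you would use an established PCA routine on normalized, transformed counts and look at more than one component.

```python
import math


def pc1_scores(samples, iterations=200):
    """Scores of each sample on the first principal component.

    samples: list of equal-length lists of log-expression values, one per sample.
    The sign of the returned scores is arbitrary.
    """
    n = len(samples)
    n_genes = len(samples[0])
    # Center every gene across samples.
    means = [sum(s[j] for s in samples) / n for j in range(n_genes)]
    centered = [[s[j] - means[j] for j in range(n_genes)] for s in samples]
    # Sample-by-sample Gram matrix; its top eigenvector gives the PC1 scores.
    gram = [[sum(a * b for a, b in zip(centered[i], centered[k]))
             for k in range(n)] for i in range(n)]
    # Power iteration. The start vector must not be constant, because the
    # constant vector lies in the null space of a centered Gram matrix.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return [0.0] * n  # no variation between samples
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(gram[i][k] * v[k] for k in range(n))
                     for i in range(n))
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [x * scale for x in v]


# Hypothetical toy data: two samples each from two experiments, four genes.
toy_log_expr = [
    [5.0, 8.0, 3.0, 6.0],
    [5.2, 7.9, 3.1, 6.1],
    [7.0, 6.0, 3.0, 6.2],
    [7.1, 6.2, 2.9, 5.9],
]
toy_batches = ["exp1", "exp1", "exp4", "exp4"]

for batch, score in zip(toy_batches, pc1_scores(toy_log_expr)):
    print(batch, round(score, 3))
```

If the samples split along PC1 by experiment rather than by treatment group, that points to a batch effect, which is usually better handled by including the experiment as a covariate in the model than by dropping samples.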

Overall, while low intragroup correlation can be an indicator of poor quality RNA-Seq data, it is important to carefully evaluate the causes and potential impact on downstream analysis before excluding samples.
