What do the numbers mean in these RNA-Seq gene/transcript TPM files?

Jason W Hoskins

I'm assuming you're looking at a file that ends with "...v7_RNASeQCv1.1.8_gene_tpm.gct", which holds the gene- or transcript-level TPM values. Note that the file contains 2 lines before the actual header row with the sample IDs, which is probably why R wasn't reading it into a data.frame properly. If you're using the read.table() function to read the file into R, you can use the "skip" argument to skip the first 2 lines.
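To make the layout concrete, here is a minimal Python sketch of the same "skip two lines, then read the header" idea (in R, something like read.table(path, skip = 2, header = TRUE, sep = "\t") should do the equivalent). The helper name `read_gct` and the toy content are assumptions for illustration only; the toy values are made up and are not real GTEx data.

```python
import csv
import io


def read_gct(handle):
    """Parse a GCT-style stream: 2 preamble lines, then a tab-delimited table.

    Returns (sample_ids, tpm) where tpm[gene_id][sample_id] is a float.
    """
    handle.readline()  # line 1: format version tag, not needed here
    handle.readline()  # line 2: declared row/column counts, not needed here
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)      # line 3: Name, Description, then sample IDs
    sample_ids = header[2:]
    tpm = {}
    for row in reader:
        if not row:            # tolerate a trailing blank line
            continue
        gene_id = row[0]       # row[1] is the Description column
        tpm[gene_id] = dict(zip(sample_ids, map(float, row[2:])))
    return sample_ids, tpm


# Toy stand-in for a .gct file (hypothetical IDs and values).
toy_gct = (
    "#1.2\n"
    "2\t2\n"
    "Name\tDescription\tSAMPLE_A\tSAMPLE_B\n"
    "GENE_1\tsymbol1\t0.0\t12.5\n"
    "GENE_2\tsymbol2\t3.25\t1.0\n"
)
sample_ids, tpm = read_gct(io.StringIO(toy_gct))
print(sample_ids)
print(tpm["GENE_1"]["SAMPLE_B"])
```

For a real file, the same function should work with `open(path, newline="")` in place of the `io.StringIO` wrapper.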

As for your main question, the numbers filling most of the file are the TPM values, which are the normalized expression measurements for the various genes/transcripts in the given samples. TPM stands for "Transcripts Per Million", and it gives a measure of expression that normalizes first for gene/transcript length (reads per kilobase) and then for overall read depth (i.e., total reads from the sequencing), in that specific order. TPM has emerged as a preferred normalized metric because the TPM values within each sample sum to one million, so a given value has a consistent meaning across both samples and genes. Here is a short explainer video on TPM vs other common normalized metrics: https://youtu.be/TTUrtCY2k-w
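The two-step order can be sketched in a few lines of Python. This illustrates the standard TPM definition for a single sample, not necessarily the exact procedure used to produce these files; the function name `compute_tpm` and the toy counts and lengths are made up for the example.

```python
def compute_tpm(read_counts, lengths_bp):
    """Compute TPM for one sample from raw read counts and feature lengths.

    Step 1: divide each count by the feature length in kilobases.
    Step 2: rescale the resulting rates so they sum to one million.
    """
    rates = [count / (length / 1000.0)
             for count, length in zip(read_counts, lengths_bp)]
    total = sum(rates)
    return [rate * 1_000_000 / total for rate in rates]


# Toy sample with three genes (hypothetical counts and lengths).
counts = [100, 200, 300]
lengths = [1000, 4000, 3000]
tpm_values = compute_tpm(counts, lengths)
print(tpm_values)
print(sum(tpm_values))
```

Note that the second gene has more reads than the first but a lower TPM, because it is four times longer. The values in any one sample always sum to one million, which is what makes them comparable between samples.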

I'm not sure what letters you mean when you say "The amount of letters far exceed the amount of samples." But let me briefly note that there are more samples than subjects in the dataset because individual subjects typically have samples from several different organs/tissue types.

Austin Herbert

These files are gene expression matrices. Columns are samples, rows are genes, and the values are gene expression in TPM.

Values such as "GTEX-1117F-0226-SM-5GZZ7" are sample identifiers. There are also metadata files, such as "GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt", in which the rows are those sample identifiers and the columns hold metadata about each sample, such as tissue type, nucleic acid and library prep QC/QA info, and sequencing QC metrics like read length, depth, and coverage.
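As a rough sketch of how the two files fit together, the Python below builds a sample-to-tissue lookup from a metadata table and groups samples by subject. The column names `SAMPID` and `SMTSD`, the rule that the subject ID is the first two hyphen-separated fields of the sample ID, and every row of the toy table are assumptions for illustration; check them against the actual annotation file before relying on them.

```python
import csv
import io
from collections import defaultdict


def load_tissue_lookup(handle, id_col="SAMPID", tissue_col="SMTSD"):
    """Map each sample ID to its tissue label (column names are assumed)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_col]: row[tissue_col] for row in reader}


def subject_of(sample_id):
    """Assumed convention: the subject ID is the first two hyphen-separated fields."""
    return "-".join(sample_id.split("-")[:2])


# Toy metadata table; tissue labels and the TOY1 IDs are placeholders.
toy_attributes = (
    "SAMPID\tSMTSD\n"
    "GTEX-1117F-0226-SM-5GZZ7\ttissue A\n"
    "GTEX-TOY1-0001-SM-AAAAA\ttissue A\n"
    "GTEX-TOY1-0002-SM-BBBBB\ttissue B\n"
)
tissue_by_sample = load_tissue_lookup(io.StringIO(toy_attributes))

# Group samples by subject: one subject can contribute several tissues.
samples_by_subject = defaultdict(list)
for sample_id in tissue_by_sample:
    samples_by_subject[subject_of(sample_id)].append(sample_id)

print(tissue_by_sample["GTEX-1117F-0226-SM-5GZZ7"])
print(dict(samples_by_subject))
```

With a lookup like this, the sample IDs in the header of the expression matrix can be matched to tissue type before comparing TPM values.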

Depro Das

GTEX-1117F-0226-SM-5GZZ7 is the sample ID, and ENSG00000223972.4 is an Ensembl gene ID (the ".4" suffix is its version number), not a HUGO gene symbol; in these files the gene symbol is normally given in the adjacent Description column. The numbers you are referring to are gene expression values. TPM (Transcripts Per Million) is a normalization method used to scale these values so that gene expression is comparable between samples.