PAM50 subtyping based on RNA-Seq encountered nested gene?

I would not worry too much: just kick MIA-RAB4B out of your annotation (the GTF/GFF used by HTSeq or other counting software; simply remove its entries with a text editor). Then you will get pretty normal counts for MIA.
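If the annotation is too large for comfortable manual editing, the same removal can be scripted. This is a minimal sketch, assuming a tab-separated GTF whose ninth column carries attributes such as `gene_id "MIA-RAB4B"` or `gene_name "MIA-RAB4B"`; the function name `drop_gene` and the toy records (coordinates, the `src` source field) are made up for illustration.

```python
import re

def drop_gene(gtf_lines, gene="MIA-RAB4B"):
    """Yield GTF lines, skipping every record annotated to `gene`."""
    # The closing quote forces an exact match, so MIA and RAB4B are kept.
    pattern = re.compile(r'\b(?:gene_id|gene_name) "' + re.escape(gene) + '"')
    for line in gtf_lines:
        if line.startswith("#"):
            # Header and comment lines pass through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 9 (index 8) holds the attributes in GTF.
        if len(fields) >= 9 and pattern.search(fields[8]):
            continue
        yield line

# Toy records only; these are not real coordinates.
example = [
    'chr19\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "MIA"; gene_name "MIA";\n',
    'chr19\tsrc\texon\t100\t900\t.\t+\t.\tgene_id "MIA-RAB4B"; gene_name "MIA-RAB4B";\n',
    'chr19\tsrc\texon\t700\t900\t.\t+\t.\tgene_id "RAB4B"; gene_name "RAB4B";\n',
]
kept = list(drop_gene(example))
print(len(kept))
```

For a real file, pass the open input file as `gtf_lines` and write the yielded lines to a new GTF, keeping the original untouched.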

MIA-RAB4B seems in fact to be a sort of fusion product of MIA and RAB4B, so MIA is not really "nested". This part of the genome is just crowded with interesting transcriptional phenomena, and the poor curators do not have a polite way of annotating them :)

Just in case, you may have a look in IGV (e.g. in a sashimi plot) to check that there are not too many reads on the junction between exons 4 and 5, i.e. on the fusion connection of MIA-RAB4B, compared to the junctions in "normal" MIA. Many reads there would suggest high representation of MIA-RAB4B compared to MIA, and then the MIA counts would not be that reliable in the signature.
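To put a rough number on that comparison, the junction read counts shown in the sashimi plot can be turned into a simple fraction. This is only a sketch: the function name `readthrough_fraction` and the counts used below are hypothetical, and the result is a crude indicator, not an abundance estimate.

```python
def readthrough_fraction(mia_junction_reads, fusion_junction_reads):
    """Share of junction reads that support the MIA-RAB4B connection.

    mia_junction_reads: reads spanning a junction internal to "normal" MIA.
    fusion_junction_reads: reads spanning the MIA-to-RAB4B junction.
    Returns None when no junction reads were observed at all.
    """
    total = mia_junction_reads + fusion_junction_reads
    if total == 0:
        return None
    return fusion_junction_reads / total

# Hypothetical counts read off a sashimi plot.
fraction = readthrough_fraction(mia_junction_reads=480, fusion_junction_reads=20)
print(fraction)
```

A small fraction supports treating the reads as MIA; a large one would be a reason to distrust the MIA counts in the signature.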

Karol Szafranski

What exactly do you mean by "subtyping" of the PAM50 gene? Since you describe RNA-seq read mapping and counts, you obviously quantify expression. Is this your primary goal? Or do you plan to search for sequence variation (mutations)?

Michal Okoniewski

Praveen Kumar Raj Kumar

First of all thanks a lot Karol and Michal for responding.

Karol,

In breast cancer research, tumours are categorized into four "subtypes" based on the expression of 50 genes. Those 50 well-defined genes are called PAM50. One of them is MIA (melanoma inhibitory activity). My first goal is to quantify all 50 genes and classify tumour RNA-Seq samples into breast cancer subtypes. Downstream, we might be interested in finding variants and so on. Hence I wanted to hear different perspectives before I do something about the MIA conundrum.

Michal,

Getting rid of MIA-RAB4B was my first impression too. By viewing the MIA alignments in IGV, I learned that most reads are splice-aligned across MIA exons. I don't see many reads spliced across MIA and RAB4B (please find attached). Moreover, MIA-RAB4B is annotated as an NMD candidate.

Since MIA-RAB4B is annotated in the reference genome, I think it should not be called a fusion gene. The term "fusion gene" is used for genes fused in cancer cells. Here is an excerpt I found on the Broad website. Moreover, fusion proteins are known to be active.

Gene fusion is exactly what it sounds like: a part of one gene combines with a segment of another. Found in the genomes of cancer cells, this abnormal genetic rearrangement wherein two genes become one fused gene contributes to the growth and spread of tumor cells. This is because the abnormal proteins made by fusion genes appear to be much more active than the normal versions.

Ref: https://www.broadinstitute.org/blog/word-day-fusion-gene

Michal Okoniewski

Yup, the expression of MIA-RAB4B and RAB4B is rather negligible compared to MIA, at least in this case. And yes, this is not a classic "fusion product" resulting e.g. from a cancer-shuffled karyotype, just a sort of "possible extension" of genes on a normal genome. Good luck in classifying the subtypes! :)

Karol Szafranski

Praveen, thank you for the clear explanation!

Fully agree with Michal. Get rid of MIA-RAB4B, since it would chronically mask MIA when counting with HTSeq.

For specific quantification of MIA, RAB4B and MIA-RAB4B, you might need to switch the gene caller or base your mapping procedure on a transcript reference (assuming you can define sequence specific to MIA-RAB4B). This would be worth the effort if the fusion gene has particular relevance for subtyping (which appears reasonable).
