Thanks Kevin, can you please send me some web links about R? I have little knowledge of bioinformatics. Can beginners like me use R, or do we need some advanced knowledge? Is there any online training or instructions for using it? Thanks once again.
Hi, R is good, but requires knowledge of Perl. Someone told me about another program, Cluster 3.0 (http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm), used together with Java TreeView. For either program, do we need copy number and TPM values of the differentially regulated genes for all replicates? I have an Excel sheet with p-values and the combined value of each gene across all three replicates of the WT and the mutant. Where will I get the TPM and copy number from, if they are required? Please let me know, thanks!
Follow the steps given in the Trinity workflow. It generates FPKM values and then uses the edgeR package for the differential expression step.
Little programming skill is required to use this package; the scripts are already written and only need to be run. Otherwise you can use the heatmap.2 function in R, as suggested by Kevin, though it requires normalized read count values, i.e. TPM or FPKM values.
If you already have a list of differentially expressed genes, then you must also have their expression values, either normalized (FPKM/RPKM) or as raw counts. You can use those expression values to generate a heatmap with a dendrogram, or perhaps run a PCA.
It's quite strange that people here haven't heard about the R package pheatmap, which stands for "pretty heatmap". It is a brilliant tool designed for biologists who may not like to work on the command line too much; it allows you to create "pretty" heatmaps (almost of ggplot quality) without too much programming hassle. Since I first found it, it has been my favorite for drawing heatmaps, and it's much better than heatmap.2. Use pheatmap in RStudio, and it won't require as much programming ability.
A few days ago another interesting tool called ClustVis was published. It's a web server where you can simply upload your data and get the work done, but I'm not sure whether analysis/visualization of 12k genes would be supported, since it's a web server!
N.B.: Because you are generating heatmaps, be careful with the default value of scaling as it depends on normalization (and how it was done). Use the scaling (row or column wise) or no scaling option accordingly.
"I have RNA-seq data of rice with some 12000 differentially expressed genes."
Not really!
If this is really the case, then I would doubt that the normalization makes sense (or that it worked). And even if you could convince me that this is all OK (and it is a really difficult problem to normalize expression profiles when there are large differences!), even then I would still wonder what the biological interpretation will be when the majority of the genes are not only expressed (which is either strange or indicates that you are analyzing a large pool of different cells*) but also differentially expressed. These are completely different profiles. So it seems you are either comparing apples with peaches or you are digging in the noise (i.e. [most of] the observed differential expression is actually noise).
Regarding your practical problem: I strongly suggest cooperating with a local statistician/bioinformatician (experienced in such analyses). There are many pitfalls and difficulties, and it is damn easy to completely misinterpret clusterings and heatmaps. The benefit is not only that you are less likely to come a cropper but also that you will really learn a lot.
---
* In the latter case a sensible interpretation of expression patterns is not possible, and there remain serious doubts that the cell composition is constant. If it is not constant, then you will not be able to dissect which part of the expression changes is due to changed physiological processes and which is due to changes in the cell composition.
Thanks for your analysis; even I wondered why so many genes are differentially expressed. I believe it could be due to different cell types/tissues owing to a severe developmental disorder. Please see the attached image of my rice plants, with the wild type on the left and 3 mutants (used as 3 biological replicates) on the right. They are severely affected, both developmentally and physiologically. These are miRNA mutants, and their target genes are, as expected, up in the RNA-seq data.
The data analysis was done by a professional bioinformatician from the sequencing facility, and they took three criteria for differentially expressed transcripts:
The analysis and interpretation of such data is no easy task. I can pinpoint problems or pitfalls, but I do not have the competency, expertise, time and resources to find out how to analyze your data. Sorry. This is hard work and takes considerable time from several scientists who are "experts" in your field (and cooperation with a bioinformatician is also strongly recommended).
I may be late in answering, but I still agree with what Jochen Wilhelm suggested. You must filter the list of differentially expressed genes based on a suitable FDR and log2 fold change cutoff to eliminate any spurious DEGs. This will shorten your list; after that you may try making a heatmap using an R package, TM4 MeV, or the Cluster program.
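The filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names, values, and the cutoffs (FDR < 0.05, |log2 fold change| >= 1) are hypothetical and should be chosen to suit your own experiment.

```python
# Hypothetical DE table: (gene_id, log2 fold change, FDR-adjusted p-value).
# All values below are made up for illustration.
de_results = [
    ("gene_A", 3.2, 0.0004),
    ("gene_B", -0.4, 0.0100),
    ("gene_C", -2.1, 0.0300),
    ("gene_D", 1.8, 0.2000),
    ("gene_E", 1.1, 0.0010),
]


def filter_degs(rows, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Keep genes with FDR below fdr_cutoff and |log2FC| at or above lfc_cutoff."""
    return [
        gene
        for gene, log2fc, fdr in rows
        if fdr < fdr_cutoff and abs(log2fc) >= lfc_cutoff
    ]


kept = filter_degs(de_results)
print(kept)
```

Tightening either cutoff shortens the list further, which is usually what you want before drawing a heatmap of thousands of genes.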
Assuming you only have two conditions with 3 replicates each, one way to do it is to retrieve the FPKMs for those differentially regulated genes (from cuffnorm, as cuffdiff only gives the mean of replicates), compute Z-scores for each gene across the samples, and plot them using heatmap.2.
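To make the Z-score step concrete, here is a minimal sketch in plain Python. The FPKM values are hypothetical, and the log2 transform with a pseudocount of 1 is my own assumption (a common choice, not something the cuffnorm output requires). The Z-score is computed per gene, i.e. row-wise, which is comparable to what row scaling in a heatmap function does.

```python
import math
from statistics import mean, stdev

# Hypothetical FPKM values: three wild-type replicates, then three mutant replicates.
fpkm = {
    "gene_A": [10.0, 12.0, 11.0, 40.0, 42.0, 38.0],
    "gene_B": [200.0, 210.0, 190.0, 50.0, 55.0, 45.0],
}


def row_zscores(values, pseudocount=1.0):
    """Log2-transform one gene's values, then centre and scale them across samples."""
    logged = [math.log2(v + pseudocount) for v in values]
    mu = mean(logged)
    sd = stdev(logged)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        # A gene with identical values in all samples carries no pattern.
        return [0.0] * len(logged)
    return [(x - mu) / sd for x in logged]


zscores = {gene: row_zscores(vals) for gene, vals in fpkm.items()}
for gene, z in zscores.items():
    print(gene, [round(x, 2) for x in z])
```

If you feed such pre-computed Z-scores to a heatmap function, switch its own scaling off, otherwise the data are scaled twice.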
Since you already have the differentially expressed gene data, I assume that the data is a matrix with samples in columns, genes in rows, and expression values in the cells.
You can import the matrix to BioVinci to do the clustering. This visualization platform supports clustered heatmap and dendrograms.
Just select the Hierarchical clustering function, then drag your matrix into the placeholder and click run. The heatmap will then be displayed. You can adjust the color scheme, turn the clustering trees on or off, and change the axes by simply clicking on the elements of the plot.
However, their Web Edition restricts the file size. I think it would be better to run the software on your computer: https://vinci.bioturing.com/download-app
You can draw a heatmap of some important genes. I think there is no need to consider all genes. You can try this using R code (https://www.biostars.org/p/205417/)
I found TM4 MeV to be the best software for generating heatmaps. It is freely available Java-based software, initially developed for microarray data analysis. You can also use Genesis, TreeView, or Cluster 3.0 for this purpose when working offline.
You can use our browser-based software BIOMEX, designed to facilitate the Biological Interpretation Of Multi-omics EXperiments by bench scientists. BIOMEX integrates state-of-the-art statistical tools and field-tested algorithms into a flexible but well-defined workflow that accommodates metabolomics, transcriptomics, proteomics, mass cytometry and single cell data from different platforms and organisms. BIOMEX guides the user through omics-tailored analyses, such as data pretreatment and normalization, dimensionality reduction, differential and enrichment analysis, pathway mapping, clustering, marker analysis, trajectory inference, meta-analysis and others.
OneStopRNAseq: A Web Application for Comprehensive and Efficient Analyses of RNA-Seq Data (https://mccb.umassmed.edu/OneStopRNAseq/)
Users only need to select the desired analyses and genome build, and provide a Gene Expression Omnibus (GEO) accession number or Dropbox links to sequence files, alignment files, gene-expression-count tables, or rank files with the corresponding metadata. OneStopRNAseq pipeline facilitates the comprehensive and efficient analysis of private and public RNA-seq data.