When performing transcriptome analysis, how are genes related to a trait of interest selected? Is it just on the basis of GO and KEGG, or something else? Please suggest.
I work on plants, and GeneCards is basically meant for humans. STRING gives information about interactions. Say I need to study salinity stress: how can I select genes related to salinity from my transcriptomic data?
You will need to start with some basic functional annotation of your list of genes, that is, of ALL the genes, not only the ones you're interested in. Has the genome of your plant of interest been sequenced? If so, it is very common for people publishing a genome to add functional annotation to the genes annotated in it. For example, Pfam or another conserved-protein-domain search may already have been run on them. InterPro (run locally with InterProScan) is a really good tool for this: it assigns conserved protein domains and can also add GO terms, which you can use to search for enriched terms in your dataset.
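To make the enrichment step concrete, here is a minimal Python sketch (standard library only) that reads GO terms out of InterProScan's tab-separated output and runs a one-sided hypergeometric over-representation test of your DE genes against all annotated genes. It assumes the TSV layout written with the `--goterms` option (protein ID in column 1, '|'-separated GO IDs in column 14, '-' when there are none); the function names and toy gene IDs are hypothetical. InterProScan reports per protein, so you may need to map isoforms back to gene IDs first.

```python
import math
from collections import defaultdict


def parse_interproscan_go(lines):
    """Map each protein ID to its set of GO terms from InterProScan TSV lines.

    Assumes column 1 = protein accession and column 14 = '|'-separated GO IDs,
    with '-' meaning none. A trailing source tag such as '(InterPro)' is stripped.
    """
    gene_go = defaultdict(set)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 14 or cols[13] in ("", "-"):
            continue  # no GO annotation on this match
        for term in cols[13].split("|"):
            gene_go[cols[0]].add(term.split("(")[0])
    return gene_go


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn)."""
    upper = min(n, K)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / math.comb(N, n)


def go_enrichment(gene_go, background, de_genes):
    """One-sided over-representation p-value for each GO term seen in the DE set."""
    background = set(background)
    de = set(de_genes) & background
    N, n = len(background), len(de)
    terms = {t for g in de for t in gene_go.get(g, ())}
    results = {}
    for term in terms:
        K = sum(term in gene_go.get(g, ()) for g in background)
        k = sum(term in gene_go.get(g, ()) for g in de)
        results[term] = hypergeom_sf(k, N, K, n)
    return dict(sorted(results.items(), key=lambda kv: kv[1]))  # smallest p first


def toy_row(gene, go):
    # Hypothetical row: only columns 1 and 14 matter here, the rest are '-'.
    return "\t".join([gene] + ["-"] * 12 + [go])


rows = [
    toy_row("g1", "GO:0009651(InterPro)"),  # GO:0009651 = response to salt stress
    toy_row("g2", "GO:0009651"),
    toy_row("g3", "GO:0009651"),
    toy_row("g4", "GO:0005515"),
    toy_row("g5", "GO:0005515(PANTHER)"),
    toy_row("g6", "-"),
]
gene_go = parse_interproscan_go(rows)
background = [f"g{i}" for i in range(1, 11)]  # all 10 expressed genes
pvals = go_enrichment(gene_go, background, ["g1", "g2", "g3", "g4"])
for term, p in pvals.items():
    print(term, round(p, 4))
```

On the toy data the salt-stress term GO:0009651 comes out at p ≈ 0.033 and GO:0005515 at p ≈ 0.67. With thousands of terms you would also need a multiple-testing correction (e.g. Benjamini-Hochberg), which dedicated packages such as clusterProfiler handle for you.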
I see you're looking for a particular trait, salinity. I would start by looking at previous work where salinity-related proteins were identified. If you're lucky, they will share a particular conserved protein domain or some other characteristic. You can then use this common characteristic (or characteristics) to look for other genes in your plant's transcriptome that code for proteins with the same feature, most likely a conserved protein domain.
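The domain-based search can be sketched the same way: collect the Pfam accessions of salinity-related proteins from the literature, then keep the genes in your DE list whose InterProScan matches include any of them. This is a minimal sketch assuming InterProScan's TSV columns (analysis name in column 4, signature accession in column 5); PF00999 (the Pfam Na+/H+ exchanger family) is only an example entry, and the gene IDs and function name are made up.

```python
def genes_with_domains(lines, wanted, analysis="Pfam"):
    """Return {protein_id: matched accessions} for proteins carrying any wanted domain.

    Assumes InterProScan TSV columns: 1 = protein ID, 4 = analysis name
    (e.g. 'Pfam'), 5 = signature accession (e.g. 'PF00999').
    """
    hits = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5 or cols[3] != analysis:
            continue
        if cols[4] in wanted:
            hits.setdefault(cols[0], set()).add(cols[4])
    return hits


# Domains collected from published salinity studies (example only: PF00999 is
# the Pfam Na+/H+ exchanger family; build your own list from the literature).
salt_domains = {"PF00999"}

rows = [  # hypothetical InterProScan rows, trimmed to the first 5 columns
    "g1\t-\t-\tPfam\tPF00999",
    "g2\t-\t-\tPfam\tPF00001",
    "g7\t-\t-\tPfam\tPF00999",
]
de_genes = {"g1", "g2", "g3"}

hits = genes_with_domains(rows, salt_domains)
candidates = sorted(g for g in hits if g in de_genes)
print(candidates)
```

Here g7 carries the domain but is not differentially expressed, so only g1 survives. The same filter could use InterPro accessions (column 12) instead of raw Pfam IDs with a small change to the column index.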
Most transcriptomic studies generate a long list of differentially expressed genes. A number of approaches have been used to identify “biomarkers” or “candidate genes” and to filter out genes that are differentially expressed but unrelated to the phenotype under study. One of these is the “genetical genomics” approach, which combines the genetic variation associated with a given phenotype with genome-wide profiling of mRNA expression; it has been used to identify candidate genes for a given phenotype (see Drake et al., 2006, Mamm. Genome, 17:466; Ganguly et al., 2007, Physiol. Genomics, 31:410; Tabakoff et al., 2008, Mamm. Genome, 19:352; Farris et al., 2010, Int. Rev. Neurobiol., 91:95). Although all of these articles deal with data from animal models, the same concept can also be used in plant transcriptomics.
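A full genetical genomics (eQTL) analysis is beyond a short example, but its simplest practical use, keeping only DE genes that sit inside a QTL interval mapped for the phenotype, can be sketched as follows. This is a simplified positional filter, not the method of the papers cited above; the coordinates, the QTL interval and the names are hypothetical and would come from your genome annotation (e.g. a GFF file) and a mapping study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def genes_in_qtl(gene_pos, qtls):
    """Return genes whose coordinates overlap any QTL interval on the same chromosome."""
    return sorted(
        gene
        for gene, g in gene_pos.items()
        if any(g.chrom == q.chrom and g.start <= q.end and q.start <= g.end for q in qtls)
    )


# Hypothetical gene coordinates and a hypothetical salt-tolerance QTL interval.
gene_pos = {
    "g1": Interval("chr1", 100, 200),
    "g2": Interval("chr1", 5000, 5200),
    "g3": Interval("chr2", 150, 300),
}
qtls = [Interval("chr1", 150, 1000)]
de_genes = {"g1", "g2"}

positional = genes_in_qtl(gene_pos, qtls)
candidates = [g for g in positional if g in de_genes]
print(candidates)
```

g2 is differentially expressed but lies outside the interval, and g3 is on another chromosome, so g1 is the only positional candidate.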
You can initially make your selections based on your GO, KEGG, or network (e.g., IPA) analyses. You may also want to pick the genes that show the most significant changes. This is largely a manual process, as nothing can replace digging into the literature and databases to see what is already known about the genes on your list. You may also want to follow up on the genes you select by confirming their expression pattern with another method such as real-time PCR. Usually not all of the genes will be confirmed, which shortens your list further.
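The last two steps, ranking by significance and shortening the list with qPCR, can be sketched as below. It assumes a DESeq2-style results table exported to CSV with an ID column named `gene` (rename yours or change the key), and it uses the Livak 2^-ΔΔCt method, which assumes near-100% amplification efficiency for both target and reference genes. The thresholds (padj ≤ 0.05, |log2FC| ≥ 1) are common defaults, not rules, and all values and function names are invented for illustration.

```python
import csv
import io


def top_degs(csv_text, padj_max=0.05, min_abs_lfc=1.0, n=None):
    """Filter and rank DE genes from a DESeq2-style results table in CSV form.

    Assumes columns named 'gene', 'log2FoldChange' and 'padj'; rows whose padj
    is 'NA' (as DESeq2 reports for filtered genes) are skipped.
    """
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        if rec["padj"] in ("", "NA"):
            continue
        padj, lfc = float(rec["padj"]), float(rec["log2FoldChange"])
        if padj <= padj_max and abs(lfc) >= min_abs_lfc:
            rows.append((rec["gene"], lfc, padj))
    rows.sort(key=lambda r: (r[2], -abs(r[1])))  # most significant, then largest change
    return rows if n is None else rows[:n]


def ddct_log2fc(ct_target_trt, ct_ref_trt, ct_target_ctl, ct_ref_ctl):
    """qPCR log2 fold change by the Livak 2^-ddCt method (assumes ~100% efficiency)."""
    ddct = (ct_target_trt - ct_ref_trt) - (ct_target_ctl - ct_ref_ctl)
    return -ddct


def confirmed_by_qpcr(degs, qpcr_log2fc):
    """Keep DE genes whose qPCR fold change points in the same direction as RNA-seq."""
    return [g for g, lfc, _ in degs if g in qpcr_log2fc and (qpcr_log2fc[g] > 0) == (lfc > 0)]


table = """gene,log2FoldChange,padj
g1,3.2,1e-8
g2,-2.5,1e-5
g3,0.4,1e-10
g4,1.8,0.2
g5,2.0,NA
"""
degs = top_degs(table)
# Hypothetical mean Ct values: target and reference, under salt vs control.
qpcr = {
    "g1": ddct_log2fc(22.0, 18.0, 25.0, 18.0),  # about 8-fold up
    "g2": ddct_log2fc(19.0, 18.0, 20.0, 18.0),  # up, but RNA-seq says down
}
print([g for g, _, _ in degs], confirmed_by_qpcr(degs, qpcr))
```

g3 is dropped for its small fold change, g4 for its padj and g5 for its NA. Of the two remaining genes only g1 is confirmed, because qPCR shows g2 going up while RNA-seq says it goes down.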