We have generated transcriptomes of various plant tissues. To validate gene expression, we need to select the correct transcript by screening paralogs. How can we resolve this problem? Can anyone help me with this?
If I understand your question correctly, and based on some of the information you gave in subsequent answers (specifically "generated NGS data", "transcriptomic data", and "FPKM/RPKM"), I am assuming you are talking about RNA-seq data. Transcriptomic datasets such as RNA-seq and gene expression arrays measure RNA abundance. You can determine the relative abundance of transcripts, i.e. whether some genes are more highly expressed than others, but because RNA expression is tissue- and cell-specific and regulated by many cellular processes, you cannot reliably infer anything about copy number from transcriptome data. Genes may be up- or down-regulated by means other than copy number.
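To make the point about abundance concrete, here is a minimal sketch of the usual FPKM calculation. The function name and all counts are hypothetical and chosen only for illustration; they are not taken from the dataset discussed in this thread.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical numbers: one library of 20 million mapped fragments.
library_size = 20_000_000
contig_a = fpkm(500, 2_000, library_size)  # longer contig, more fragments
contig_b = fpkm(250, 1_000, library_size)  # shorter contig, fewer fragments
print(contig_a, contig_b)
```

Both contigs come out with the same FPKM. The value normalises for transcript length and library size only, so it describes abundance in that tissue and carries no information about how many gene copies produced the RNA.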
Are you performing de novo assembly or reference-based alignment? How did you generate your reads, i.e. which system, read length, paired-end or single-end, etc.? Which species are you working on? What ploidy level? The strategy will depend on the answers to these questions.
We generated NGS data through Illumina paired-end sequencing followed by de novo assembly. I am working on the herb Picrorhiza kurroa, which is diploid.
How could I use this information to shortlist correct paralogs for each gene?
I'm afraid I don't have much experience with de novo transcriptome assembly. Initially, I would filter the reads fairly stringently to remove as many errors as possible, for example SolexaQA trimming to PHRED 30 with a minimum retained read length of 70-100 bp, depending on the original read length. I would then run the assembly as conservatively as possible, i.e. biased toward collapsing contigs rather than retaining them. Next, I would run something like BUSCO (http://busco.ezlab.org). BUSCO estimates the completeness of a transcriptome assembly based on the presence of 'universal' eukaryotic single-copy genes. If BUSCO indicates that your assembly is fairly complete and that known single-copy genes are present in only a single copy, then you can proceed with some confidence.
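The filtering step can be sketched in plain Python as below. This is only an illustration of the idea (keep the longest stretch of a read in which every base reaches the quality cutoff, then discard reads that end up too short); it is not SolexaQA's own code. The function names are hypothetical, and Phred+33 quality encoding is an assumption that should be checked against your FASTQ files.

```python
def longest_high_quality_segment(qualities, min_q=30):
    """Return (start, end) of the longest run in which every score is >= min_q."""
    best_start, best_end = 0, 0
    run_start = None
    for i, q in enumerate(qualities):
        if q >= min_q:
            if run_start is None:
                run_start = i
            # Keep the first run of maximal length.
            if i + 1 - run_start > best_end - best_start:
                best_start, best_end = run_start, i + 1
        else:
            run_start = None
    return best_start, best_end


def trim_read(seq, qual_string, min_q=30, min_len=70, offset=33):
    """Trim one read; return (seq, qual), or None if the kept part is too short.

    offset=33 assumes Phred+33 encoding (an assumption, not from the thread).
    """
    qualities = [ord(c) - offset for c in qual_string]
    start, end = longest_high_quality_segment(qualities, min_q)
    if end - start < min_len:
        return None
    return seq[start:end], qual_string[start:end]


# Toy read: '#' encodes Q2 and 'I' encodes Q40 under Phred+33.
toy = trim_read("ACGTACGTAC", "##IIIIII#I", min_q=30, min_len=5)
print(toy)
```

With the default min_len of 70 the same toy read would be discarded, which is the intended behaviour for reads that lose most of their length to trimming. For paired-end data you would also need to decide what to do with a read whose mate has been discarded.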
From there you will have to consider the gene family or families you are most interested in and assess whether the copy number is reasonable compared with the copy number in related species, if this information is available. You could also attempt SNP calling on your SAM/BAM files. If high-confidence SNPs form consistent patterns within your contigs, this could be evidence that multiple paralogs have been collapsed into those contigs.
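One rough way to screen for such contigs is to count well-supported variant sites per kilobase and flag contigs with an unusually high density. The sketch below assumes the per-site reference and alternate read counts have already been extracted from a variant caller's output. The function name, the input layout, and every threshold are hypothetical and would need tuning on real data. Heterozygous alleles produce the same signal, so a flagged contig is a candidate for closer inspection and not proof of collapsed paralogs.

```python
from collections import defaultdict


def flag_candidate_collapsed_contigs(variants, contig_lengths,
                                     min_depth=20, min_alt_fraction=0.2,
                                     min_sites_per_kb=5.0):
    """Return {contig: variant sites per kb} for contigs above the density cutoff.

    variants: iterable of (contig, position, ref_count, alt_count).
    contig_lengths: {contig: length in bp}.
    All thresholds are illustrative assumptions.
    """
    sites = defaultdict(int)
    for contig, _pos, ref_count, alt_count in variants:
        depth = ref_count + alt_count
        if depth < min_depth:
            continue  # too little coverage to trust the site
        alt_fraction = alt_count / depth
        # Keep sites where both alleles are well represented.
        if min_alt_fraction <= alt_fraction <= 1 - min_alt_fraction:
            sites[contig] += 1
    flagged = {}
    for contig, n_sites in sites.items():
        density = n_sites / (contig_lengths[contig] / 1000)
        if density >= min_sites_per_kb:
            flagged[contig] = density
    return flagged


# Toy input: c1 carries six well-supported sites in 1 kb, c2 only one in 2 kb.
contig_lengths = {"c1": 1000, "c2": 2000}
variants = [("c1", p, 30, 25) for p in range(100, 700, 100)]
variants += [("c1", 800, 5, 4),      # skipped: depth below min_depth
             ("c2", 150, 40, 35),
             ("c2", 900, 50, 1)]     # skipped: alternate allele too rare
flagged = flag_candidate_collapsed_contigs(variants, contig_lengths)
print(flagged)
```

Flagged contigs can then be examined by hand, for example by checking whether the variant alleles co-occur on the same reads, which is the kind of set pattern that would point to two distinct paralogous sequences.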
Is Picrorhiza kurroa mostly selfing, or was the line you sequenced self-pollinated for several generations? If not, you may have to find a way to correct for heterozygosity, or you may overestimate copy number. Also, if the sequencing was performed on a pooled population of segregating lines, this analysis will be very difficult.
I have read the suggested article, but it does not clearly identify the ploidy level of Picrorhiza kurroa. Instead, I referred to the article "Reproductive Biology of Picrorhiza kurroa - a Critically Endangered High Value Temperate Medicinal Plant", which clearly indicates that Picrorhiza kurroa is diploid. I have also found the diploid ploidy level reported at the link mentioned below:
I have already gone through these articles, but the "RBG Kew Plant DNA C-values database" lists Picrorhiza kurroa as tetraploid with 2n=34. So these sources suggest that the species is either diploid or tetraploid, or that both ploidy levels occur. At least for me, it's confusing.