I used ChIP-Seq data to map the DNA binding sites of a protein of interest at the genomic scale. Is it mandatory to control the quality of the sequencing even if the data are already deposited as reference data in NCBI?
I want to transform it into Sanger-encoded FASTQ with Galaxy tools.
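(In Galaxy, the FASTQ Groomer tool handles this conversion. As a minimal command-line sketch of the same step, assuming the input uses the older Illumina Phred+64 quality encoding and using placeholder file names, seqtk can shift the qualities to the Sanger/Phred+33 scale:

    # Convert Illumina-1.3+ (Phred+64) FASTQ to Sanger (Phred+33):
    # -Q64 declares the input quality offset, -V shifts it to 33
    seqtk seq -Q64 -V illumina.fastq > sanger.fastq
)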
If you want to get the FASTQ files and do a new alignment, then you'll have to check them with FastQC, as if they were a new sample, to be sure that the raw data do not contain any sequencing primers/adapters and that base quality does not decrease too much at the end of the reads. When I use published data (usually from GEO), I always start from the FASTQ files and treat them like a new experiment, to be sure that the analysis is comparable with my own data.
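A minimal sketch of that first check, assuming FastQC is installed locally and using placeholder file names:

    # Run FastQC on the raw reads; HTML reports are written to qc/
    mkdir -p qc
    fastqc -o qc --threads 4 sample_R1.fastq.gz sample_R2.fastq.gz

The HTML report flags overrepresented sequences (often adapters) and shows the per-base quality profile along the reads.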
Yes, it is very important to check all published data as if they were your own raw data, especially if you intend to use them as data in your papers.
There are many reasons for this, including but not limited to: the sequencing platform, the ChIP antibody used, the ChIP antibody lot number, single-end versus paired-end layout, the quality of the FASTQ reads, and library contamination with spike-ins or empty reads.
We have come across many situations where the raw data associated with published work bear very little resemblance to the published work when we analyse them properly.
Yes, you should check the quality of the data to avoid any kind of bias generated during sequencing and to reduce artefacts; it is an important step. Several tools are available to check the quality; the most widely used is FastQC. Its graphical output helps to compare the reads.
NCBI, GEO, etc. do not control for ChIP-seq quality. What is currently considered "good quality" ChIP-seq also changes every year, and it also depends on what was ChIPed.
FastQC is the minimum. GC bias (computeGCBias) and plotFingerprint from deepTools2 are what you should look at: the GC bias plot shows amplification bias introduced during library prep, and the fingerprint gives you an idea of the ChIP efficiency/specificity :). A plotCorrelation of all samples is also helpful to detect swapped samples or outliers.
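A minimal sketch of those checks, assuming deepTools is installed; the BAM file names and labels are placeholders, and the effective genome size shown is the deepTools value for GRCh38:

    # ChIP enrichment fingerprint: a near-diagonal curve means poor enrichment
    plotFingerprint -b chip.bam input.bam --labels ChIP Input --plotFile fingerprint.png

    # GC bias from library amplification (needs the genome as a 2bit file)
    computeGCBias -b chip.bam --effectiveGenomeSize 2913022398 \
        -g genome.2bit --GCbiasFrequenciesFile gc_freq.txt

    # Genome-wide read-count correlation across samples, to spot swaps/outliers
    multiBamSummary bins --bamfiles chip.bam input.bam rep2_chip.bam -o counts.npz
    plotCorrelation -in counts.npz --corMethod spearman --whatToPlot heatmap \
        -o sample_correlation.png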
You should always check the raw data, because many published results do not really match the data deposited in GEO, whether through carelessness or intentionally. I am not generalizing, but you have to be careful.
Use FASTQC, which offers a whole range of quality controls.
Hello Atika, let me confirm Mohamed's answer, because the reads often still contain adapters and are frequently trimmed by the authors downstream in their analyses, but this is not necessarily indicated in the GEO submission. Quality issues can also be inherent to the experiment, for example in the case of bisulfite sequencing (you need to trim at least the first and last 10 bp of 100 bp reads, for example). Indeed, GEO submission generally requires the raw reads, i.e. the base-called output of the Illumina/SOLiD pipelines, and therefore untrimmed!
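As a rough sketch of that kind of trimming, assuming cutadapt is installed; the adapter sequence shown is the common Illumina TruSeq prefix, not necessarily what a given library used, and the file names are placeholders:

    # Remove a 3' adapter (here the common Illumina TruSeq prefix)
    cutadapt -a AGATCGGAAGAGC -o noadapt.fastq raw.fastq

    # Hard-clip the first and last 10 bp, e.g. for bisulfite reads
    cutadapt -u 10 -u -10 -o trimmed.fastq noadapt.fastq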
I recommend FASTQC, although I do not know whether there is an online version; however, it is written in Java and therefore quite simple to use, and there is a version with a graphical interface: