I sequenced new isolates of S. singaporensis collected from different islands here in the Philippines. A well-known scientist who is also working on this told me not to rely on the sequences at NCBI. I'm just confused.
That question cannot be answered in general. The databases at the NCBI/DDBJ/EMBL will definitely contain errors as the data comes from various sources and most of the databases are only marginally curated. But that holds true for all big databases without manual curation (and even those are not flawless). So if a submitter entered, e.g., the wrong species or sequence, that of course can cause problems. But at least for genomes, the current requirements (BioProject, BioSample, etc.) should help to minimize those. Of course that does not help with assembly errors due to mistakes made by the assembler or the scientists involved.
Depending on your application(s), it will be difficult to avoid these databases altogether. So the standard caveat for all sources of external data applies, regardless of whether you get them from a program or a database: check carefully what you get and use.
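As a rough illustration of what "checking what you get" can mean in practice, here is a minimal Python sketch of a first-pass sanity check on downloaded nucleotide records. It only assumes the records are in plain FASTA format. The function names (`read_fasta`, `summarize`) and the example records are made up for illustration and are not part of any NCBI tool. It reports the sequence length, the fraction of bases that are not A/C/G/T, and any characters outside the IUPAC nucleotide alphabet. It cannot detect a wrong species label, contamination, or assembly errors.

```python
from collections import Counter

# IUPAC nucleotide codes; anything else (gaps, digits, stray letters) is flagged.
IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}


def summarize(seq):
    """Return simple quality indicators for one nucleotide sequence."""
    counts = Counter(seq)
    length = len(seq)
    unambiguous = sum(counts[base] for base in "ACGT")
    return {
        "length": length,
        # Share of positions that are not a plain A, C, G or T.
        "non_acgt_fraction": (length - unambiguous) / length if length else 0.0,
        # Characters that are not valid IUPAC nucleotide codes at all.
        "invalid_chars": sorted(set(seq) - IUPAC_DNA),
    }


if __name__ == "__main__":
    # Hypothetical records, standing in for sequences downloaded from a database.
    example = ">record_1\nACGTACGT\nNNNN\n>record_2\nACGXTT\n"
    for name, seq in read_fasta(example).items():
        print(name, summarize(seq))
```

A record with a high non-ACGT fraction or any invalid characters is worth a closer look before it goes into an alignment or a phylogeny. Beyond that, comparing a database sequence against your own isolates and against several independent submissions is still the more informative check.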
Christian is correct. The databases are in large part uncurated. While some metadata (e.g. keywords) will be screened and corrected, the sequences themselves definitely are not: validation would require far too much time, effort, and expense. The databases therefore rely on the information supplied by the submitter, and on subsequent experimental validation if and when it happens.
GenBank submissions may have problems that stem from the experimental approach used, or from only minimal quality checks before submission. If your worry is the contamination that can be found in sequence repositories such as the Sequence Read Archive (SRA), there are tools available for an initial quality control. It is always best not to trust a dataset outright, but to scrutinize it first with tools described in the literature. NCBI does apply a stringent quality check to sequence submissions, but that still cannot guarantee the reliability of the data. I am currently working on a tool to quantify the contamination found in transcriptome datasets and hope to release it soon, so that other researchers can use it.