The four standard DNA bases are A, G, T, and C. However, some of the sequences I retrieved from NCBI contain the letter 'N', which indicates that the base at that position could not be determined, leaving an unidentified nucleotide. Should I replace each N with one of the four bases (A, G, T, or C), on the assumption that N can be any nucleotide, or should I exclude such sequences, on the assumption that the sequencing was of poor quality? If neither, what can I do with such sequences in my dataset?
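To make the "exclude" option concrete, here is a minimal sketch of how I imagine flagging affected sequences by their share of N. The function names, the example sequences, and the 10% cutoff are all placeholders I made up for illustration; none of them come from NCBI or any recommended practice.

```python
def n_fraction(seq):
    """Return the fraction of positions in seq that are 'N' (case-insensitive)."""
    if not seq:
        return 0.0
    return seq.upper().count("N") / len(seq)


def split_by_n_content(seqs, max_n_fraction=0.05):
    """Split a {name: sequence} dict into (kept, dropped) by fraction of N.

    max_n_fraction is an arbitrary, hypothetical cutoff, not an established one.
    """
    kept, dropped = {}, {}
    for name, seq in seqs.items():
        target = kept if n_fraction(seq) <= max_n_fraction else dropped
        target[name] = seq
    return kept, dropped


# Made-up toy sequences, only to show the behaviour.
example = {
    "seq_clean": "ACGTACGTAC",
    "seq_one_n": "ACGTNCGTAC",
    "seq_many_n": "ANNNNCGTNN",
}
kept, dropped = split_by_n_content(example, max_n_fraction=0.10)
print("kept:", sorted(kept))
print("dropped:", sorted(dropped))
```

This only counts N characters and leaves the sequences themselves unchanged, so it does not settle whether replacing N with a specific base would be acceptable.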
P.S. I couldn't find anything about this in the Entrez Sequences Help catalog.