Which parameters (E-value, score, identity, or similarity) should we use to decide whether a sequence generated in our experiments is new or already present in a database (protein or nucleotide)?
If the idea is only to see whether a sequence is present in the database or not, that can be judged simply by variation at even a single base or amino acid site.
The E-value in a database similarity search is the number of hits with a score at least as good that you would expect to find in a database of that size purely by chance. It is not the probability that the sequence itself is present in the database. The lower the E-value, the less likely the match is due to chance and the more significant the result. In my opinion, the novelty of a sequence cannot be judged by E-value. A novel sequence may or may not differ at multiple positions; a single variation should be enough for a sequence to be listed as new.
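To make the E-value point concrete, here is a minimal Python sketch of the standard Karlin-Altschul relation in its bit-score form, E = m * n * 2^(-S'), where m is the query length, n is the database length, and S' is the bit score. The function name and the lengths used are made up for illustration, and real BLAST uses effective (edge-corrected) lengths, so the numbers will not match a BLAST report exactly.

```python
import math

def evalue_from_bit_score(bit_score, query_len, db_len):
    """Expected number of chance hits scoring at least bit_score.

    Uses E = m * n * 2**(-S'), with raw lengths standing in for the
    effective lengths that BLAST actually uses (an assumption).
    """
    return query_len * db_len * math.pow(2.0, -bit_score)

# Same alignment (same bit score), two hypothetical database sizes.
small_db = evalue_from_bit_score(50.0, 300, 1_000_000)
large_db = evalue_from_bit_score(50.0, 300, 1_000_000_000)

print(f"E in small database: {small_db:.3g}")
print(f"E in large database: {large_db:.3g}")
```

The same alignment gets a different E-value just because the database is larger, which is why the E-value describes the significance of a match and not whether your sequence is new.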
Furthermore, exactly the same sequence may have been submitted to a database from two different organisms. Such entries are identical (as can be seen in the BLAST results or by alignment) but come from different organisms.
If you can send me such a paper, I will go through it as well. I suspect they are discussing E-value and novelty in a different context.
For bacteria, a single SNP can reflect a significant change in the protein sequence that affects function. I would argue that if your sequence is not identical to what is found in GenBank, then it warrants inclusion in the database. If your sequence is identical to a GenBank entry but found in a different organism, this would also warrant inclusion. Most journals require that any novel SNPs identified in your study be deposited in the appropriate database.
You can run a BLAST search and check whether your sequence is similar to any other sequence. The results will show whether your sequence already exists or is novel. You can consider it novel if the hits you find are not from the same organism, or if many nucleotides or amino acids differ (in the BLAST graphical summary each hit is colored by its alignment score, and the pairwise alignments show the individual nucleotide or amino acid differences).
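If you prefer to check this programmatically, here is a rough Python sketch that reads BLAST tabular output (the default 12 columns of `-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and asks whether any hit matches the whole query with no mismatches and no gaps. The function names and the example hit lines are invented for illustration, and the query length has to be supplied separately because the default columns do not include it. A hit that passes may still be a longer database entry that contains your sequence, and you would still need to check the source organism of the hit yourself.

```python
def is_exact_full_length_match(line, query_len):
    """True if one outfmt-6 hit covers the whole query with no differences."""
    fields = line.rstrip("\n").split("\t")
    pident = float(fields[2])      # percent identity
    length = int(fields[3])        # alignment length
    mismatch = int(fields[4])      # number of mismatches
    gapopen = int(fields[5])       # number of gap openings
    qstart, qend = int(fields[6]), int(fields[7])
    covers_query = min(qstart, qend) == 1 and max(qstart, qend) == query_len
    return (covers_query and length == query_len
            and mismatch == 0 and gapopen == 0 and pident == 100.0)

def has_identical_entry(lines, query_len):
    """True if any hit in the table is an exact, full-length match."""
    return any(is_exact_full_length_match(l, query_len) for l in lines if l.strip())

# Hypothetical hits for a 120-residue query.
hit_exact = "q1\tsubjA\t100.000\t120\t0\t0\t1\t120\t5\t124\t1e-60\t222"
hit_snp = "q1\tsubjB\t99.167\t120\t1\t0\t1\t120\t1\t120\t3e-58\t216"

print(has_identical_entry([hit_snp], 120))
print(has_identical_entry([hit_snp, hit_exact], 120))
```

Note that the hit with a single mismatch still has a very low E-value, so the decision here rests on mismatches, gaps, and coverage and not on the E-value.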
A BLAST search after Edman sequencing (N-terminal protein sequencing) will tell you whether your sequence already exists in the database or is novel. It may be a novel sequence from your source. In addition, significant scores in Mascot after peptide mass fingerprinting indicate matches to sequences already in the database.