Does anyone know if there is an established convention or rule for the minimum percent identity between nucleotide sequences needed to infer homology?
I am looking for pseudogenes and have found regions that show ~45% identity to the exons of my gene. They are also in the correct order and in the syntenic region…
But presumably, if we put in enough gaps, we can align any two sequences... Is there a minimum identity value above which one can be confident of looking at truly homologous regions?
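To make that worry concrete, here is a rough sketch in Python (a toy of my own, not any established tool's method): a plain Needleman-Wunsch global alignment that reports identity as matches divided by alignment columns (gap columns included), plus a shuffle control that re-aligns one sequence against shuffled copies of the other under the same scoring. The function names `nw_identity` and `shuffled_identities`, the scoring values, and the random test sequences are all made up for illustration.

```python
import random


def nw_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b (Needleman-Wunsch, linear gap cost) and return
    identity = identical columns / all alignment columns (gaps included)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting identical and total columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        columns += 1
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                matches += a[i - 1] == b[j - 1]
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
    return matches / columns if columns else 0.0


def shuffled_identities(a, b, n=100, seed=0, **scoring):
    """Null distribution: identity of a against n shuffled copies of b
    (shuffling keeps length and base composition, destroys order)."""
    rng = random.Random(seed)
    chars = list(b)
    null = []
    for _ in range(n):
        rng.shuffle(chars)
        null.append(nw_identity(a, "".join(chars), **scoring))
    return null


if __name__ == "__main__":
    rng = random.Random(42)
    # Two unrelated random sequences standing in for real data (assumption).
    x = "".join(rng.choice("ACGT") for _ in range(150))
    y = "".join(rng.choice("ACGT") for _ in range(150))
    for gap in (-2, -1, 0):
        observed = nw_identity(x, y, gap=gap)
        null = shuffled_identities(x, y, n=50, seed=1, gap=gap)
        # Fraction of shuffled controls that match or beat the observed identity.
        p = sum(v >= observed for v in null) / len(null)
        print(f"gap={gap}: observed={observed:.2f}, "
              f"null mean={sum(null) / len(null):.2f}, empirical p={p:.2f}")
```

With free gaps (`gap=0`), unrelated random sequences typically score well above the roughly 25% identity expected for ungapped random DNA with equal base frequencies, so the shuffled controls, rather than a fixed cutoff, seem like the fairer baseline for a given length and gap penalty.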
Any advice/opinion would be greatly appreciated!