I am trying to identify proteins from soil by mass spectrometry, searching against a soil protein database. Are there any good pipelines that can take protein FASTA files, concatenate them into one large file, remove duplicate sequences, and then add a unique identifier to each protein sequence? I am also currently trying to write code for this in Python.
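A minimal stdlib-only sketch of the merge/deduplicate/relabel steps might look like the following. It is one possible approach under stated assumptions, not a reference implementation. Assumptions: duplicates mean *exact* sequence matches after uppercasing and stripping a trailing `*` stop symbol; the ID prefix `SOILPROT` and the helper names `read_fasta`, `merge_and_dedupe`, `write_fasta` and `write_mapping` are hypothetical. The original headers are kept in a separate TSV so search hits can be traced back to the source databases.

```python
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str, List[str]]  # (new_id, sequence, original headers)


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header = None
    chunks: List[str] = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_and_dedupe(paths: Iterable[Path], prefix: str = "SOILPROT") -> List[Record]:
    """Read every file, keep one copy of each distinct sequence, and give
    each survivor a sequential ID such as SOILPROT_000001.
    Assumption: exact-match dedup after uppercasing and removing a trailing '*'."""
    seen: Dict[str, List[str]] = {}  # sequence -> all headers it appeared under
    for path in paths:
        for header, seq in read_fasta(path):
            seq = seq.upper().rstrip("*")
            if seq:  # skip empty records
                seen.setdefault(seq, []).append(header)
    # dicts keep insertion order, so IDs follow first appearance across files
    return [(f"{prefix}_{i:06d}", seq, headers)
            for i, (seq, headers) in enumerate(seen.items(), start=1)]


def write_fasta(records: List[Record], out_path: Path, width: int = 60) -> None:
    """Write the deduplicated records, wrapping sequences at `width` residues."""
    with open(out_path, "w") as out:
        for new_id, seq, headers in records:
            out.write(f">{new_id} n_sources={len(headers)}\n")
            for start in range(0, len(seq), width):
                out.write(seq[start:start + width] + "\n")


def write_mapping(records: List[Record], out_path: Path) -> None:
    """Write a TSV mapping each new ID to its original headers (';'-joined)."""
    with open(out_path, "w") as out:
        for new_id, _, headers in records:
            out.write(f"{new_id}\t{';'.join(headers)}\n")
```

For example, `records = merge_and_dedupe(sorted(Path("dbs").glob("*.fasta")))` followed by `write_fasta(records, Path("soil_db.fasta"))` and `write_mapping(records, Path("soil_db_ids.tsv"))` (paths are placeholders). This keeps every unique sequence in memory; for very large metagenome-derived databases, storing a hash of each sequence as the dict key and streaming output would reduce memory use.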