I want to predict protein-protein interactions using a machine learning classifier. I have a dataset of interacting and non-interacting protein pairs, with the sequences in FASTA format; it contains more than 5,000 interacting and non-interacting pairs. How do I convert each pair of sequences into numeric feature values so that I can train my classifier model?
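To make the question concrete, here is a minimal sketch of one common way such pairs could be encoded. It assumes amino acid composition (the fraction of each of the 20 standard residues) as the per-protein descriptor and simple concatenation of the two descriptors as the pair encoding. The function names and the `(id_a, id_b, label)` pair format are hypothetical, chosen only for illustration; they are not taken from any particular dataset or library.

```python
from collections import Counter

# The 20 standard amino acids; any other symbol (X, B, Z, gaps) is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {record_id: sequence}."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            header = line[1:].split()
            current_id = header[0] if header else ""
            records[current_id] = []
        elif current_id is not None:
            records[current_id].append(line.upper())
    return {rid: "".join(parts) for rid, parts in records.items()}


def composition(seq):
    """Return the 20-dimensional amino acid composition of one sequence."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def pair_features(seq_a, seq_b):
    """Concatenate both compositions into one 40-dimensional vector."""
    return composition(seq_a) + composition(seq_b)


def build_dataset(sequences, pairs):
    """Build X (feature rows) and y (labels) from (id_a, id_b, label) tuples.

    Assumed label convention: 1 = interacting, 0 = non-interacting.
    """
    X, y = [], []
    for id_a, id_b, label in pairs:
        X.append(pair_features(sequences[id_a], sequences[id_b]))
        y.append(label)
    return X, y
```

The resulting `X` and `y` could then be passed to any standard classifier, for example scikit-learn's `RandomForestClassifier`. Note that concatenation is order-dependent: the pair (A, B) gets a different vector than (B, A), so one option is to add both orderings to the training set.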
