I have a 47 GB FASTA file to parse. The records are in the following format:
>tscs_00041 gene0ea_12345_rframe2_orf
mlaathyykfairrlfpllkdticasysisikhhenfmalsnmpkiwedvevdgnnmqwtrfqttpvmpvyfiaagvfnlsfitnwntkllyrkdilpymtfaynvakniawflshirktkitnhi
>tscs_00044 gene0ea_12341_rframe2_orf
mticasysisikhhenfmaikhhenfmalsnmpkiwedv
I simply want to reformat it so that each header keeps only the sequence ID (everything after the first space is dropped):
>tscs_00041
mlaathyykfairrlfpllkdticasysisikhhenfmalsnmpkiwedvevdgnnmqwtrfqttpvmpvyfiaagvfnlsfitnwntkllyrkdilpymtfaynvakniawflshirktkitnhi
>tscs_00044
mticasysisikhhenfmaikhhenfmalsnmpkiwedv
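In other words, every header line keeps only its first whitespace-delimited token, and sequence lines stay as they are. Here is a minimal sketch of that rule applied to a single line. It assumes every header starts with `>`, and the function name `shorten_header` is hypothetical:

```python
def shorten_header(line: str) -> str:
    """Return a FASTA header reduced to '>ID'; non-header lines are returned unchanged."""
    if line.startswith(">"):
        # split() with no separator splits on any run of whitespace;
        # the first token is '>' followed by the sequence ID.
        return line.split(maxsplit=1)[0]
    # Sequence lines are left untouched.
    return line


print(shorten_header(">tscs_00041 gene0ea_12345_rframe2_orf"))  # >tscs_00041
```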
Can anyone share a script that does this?
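At 47 GB the file should be processed as a stream instead of being loaded into memory. The following is a minimal Python sketch that does this. It reads one line at a time and rewrites only the header lines. The script name `strip_headers.py` and the function name `strip_fasta_headers` are hypothetical:

```python
import sys


def strip_fasta_headers(in_path: str, out_path: str) -> None:
    """Copy a FASTA file, keeping only the ID on each header line.

    The input is read line by line, so memory use stays small
    regardless of file size.
    """
    with open(in_path, "r") as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # Keep '>ID' and drop the description after the first whitespace.
                dst.write(line.split(maxsplit=1)[0] + "\n")
            else:
                # Sequence lines are copied verbatim, including their newline.
                dst.write(line)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Usage: python strip_headers.py input.fa output.fa
        strip_fasta_headers(sys.argv[1], sys.argv[2])
    else:
        print("usage: python strip_headers.py input.fa output.fa", file=sys.stderr)
```

The script also works when a sequence is wrapped over several lines, because it copies every line that does not start with `>` unchanged.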