I have a KEGG BRITE hierarchy file in which the data is laid out as follows:
test-file
C 0001 Carbon [C]
D SAR001 methane [CH3]
D SAR002 ethane
D SAR003 propane
D SAR004 butane
D SAR005 pentane
C 0002 Hydrogen [H]
C 0003 Nitrogen [N]
C 0004 Oxygen [O]
D SAR011 ozone
D SAR012 super oxide
C 0005 Sulphur [S]
D SAR013 Hydrogen Sulphide [H2S]
D SAR014 Sulphuric acid
.
.
.
Lines starting with C are main headings, and the lines starting with D below each heading are its components. Note that no component is listed for "C 0002 Hydrogen [H]" or "C 0003 Nitrogen [N]". I want to remove every C line that is not immediately followed by at least one D line.
Desired output:
C 0001 Carbon [C]
D SAR001 methane [CH3]
D SAR002 ethane
D SAR003 propane
D SAR004 butane
D SAR005 pentane
C 0004 Oxygen [O]
D SAR011 ozone
D SAR012 super oxide
C 0005 Sulphur [S]
D SAR013 Hydrogen Sulphide [H2S]
D SAR014 Sulphuric acid
.
.
.
A single database file contains thousands of lines, and I have hundreds of such files. I need a Perl or Linux-based script to solve this.
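To make the rule concrete, here is a minimal awk sketch of one way it could be done; it is an illustration, not a tested solution for real KEGG files. It assumes the letter C or D is the first character on the line and is followed by a space or tab. The function name filter_brite and the .keg extension in the usage comment are hypothetical. Each C line is held back and printed only once a D line follows it; a heading that is followed by another C line, by any other line, or by end of file is dropped.

```shell
# filter_brite: hypothetical helper; reads one file (or stdin) and writes
# the filtered result to stdout.
filter_brite() {
  awk '
    # Heading: hold it back instead of printing it right away.
    /^C[ \t]/ { held = $0; have = 1; next }

    # Component: print the pending heading first (if any), then the line.
    /^D[ \t]/ { if (have) { print held; have = 0 } print; next }

    # Any other line passes through and discards a pending heading.
    { have = 0; print }
  ' "$@"
}

# Possible usage over many files (file names are assumptions):
#   for f in *.keg; do filter_brite "$f" > "$f.filtered"; done
```

Each file is processed separately in the loop so that a heading held at the end of one file cannot leak into the next.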