I have a KEGG database Brite hierarchy file in which the data is present in the following form:

test-file

C 0001 Carbon [C]

D SAR001 methane [CH3]

D SAR002 ethane

D SAR003 propane

D SAR004 butane

D SAR005 pentane

C 0002 Hydrogen [H]

C 0003 Nitrogen [N]

C 0004 Oxygen [O]

D SAR011 ozone

D SAR012 super oxide

C 0005 Sulphur [S]

D SAR013 Hydrogen Sulphide [H2S]

D SAR014 Sulphuric acid

.

.

.

Lines starting with C are main headings, while lines starting with D are their components. Note that no component is listed under "C 0002 Hydrogen [H]" or "C 0003 Nitrogen [N]". I want to remove every line starting with C that is not followed by at least one line starting with D before the next C line.

Desired output:

C 0001 Carbon [C]

D SAR001 methane [CH3]

D SAR002 ethane

D SAR003 propane

D SAR004 butane

D SAR005 pentane

C 0004 Oxygen [O]

D SAR011 ozone

D SAR012 super oxide

C 0005 Sulphur [S]

D SAR013 Hydrogen Sulphide [H2S]

D SAR014 Sulphuric acid

.

.

.

A single database file contains thousands of lines, and I have hundreds of such files. I need a Perl or Linux-based script to solve this issue.
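
For reference, the rule described above (keep a C line only if a D line appears before the next C line) could be sketched with awk roughly as follows. This is only a minimal sketch under some assumptions: the function name `filter_brite` is made up for illustration, the C or D marker is assumed to be the first character of the line followed by a space or tab, and blank lines directly after a C line are assumed to belong to that C line, so they are dropped together with it.

```shell
#!/bin/sh
# filter_brite: hypothetical helper name, not part of KEGG or any standard tool.
# Reads the named files (or stdin) and writes the filtered text to stdout.
filter_brite() {
    awk '
        # C line: hold it back until we know whether a D line follows.
        # A previously held C line without any D line is silently replaced.
        /^C[ \t]/ { held = $0; have = 1; next }

        # Blank line while a C line is held: keep it attached to that C line.
        have && /^[ \t]*$/ { held = held "\n" $0; next }

        # D line: the held C line has a component, so release it first.
        /^D[ \t]/ { if (have) { print held; have = 0 } print; next }

        # Any other line (assumption: pass it through unchanged and
        # discard a held C line that never received a D line).
        { have = 0; print }
    ' "$@"
}

# Possible batch usage (the *.keg extension and output names are assumptions):
#   for f in *.keg; do filter_brite "$f" > "${f%.keg}.filtered.keg"; done
```

A C line at the very end of a file with no D line after it is also dropped, because it is never released. Writing to a new file instead of editing in place makes it easy to compare the result against the original before replacing anything.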
