I have ~17k amino acid sequences in FASTA format in a single file. Using the following Clustal Omega command on a Linux system, I created a distance matrix:

clustalo -i filename.faa --distmat-out=filename.faa.mat --full

It created a matrix of 17k rows x 17k columns. The matrix has the following format:

A B C D E ....

A 0.000 0.136 0.227 0.476 0.864

B 0.136 0.000 0.318 0.571 0.864

C 0.227 0.318 0.000 0.238 0.773

D 0.476 0.571 0.238 0.000 0.857

E 0.864 0.864 0.773 0.857 0.000

.

.

Manipulating this type of data is difficult for me because the matrix is symmetric, so every distance appears twice. I want the distance matrix written in the following format, with each pair listed only once:

A B 0.136

A C 0.227

A D 0.476

A E 0.864

B C 0.318

B D 0.571

B E 0.864

C D 0.238

C E 0.773

D E 0.857

.

.

Working with the data for all 17k sequences would be much easier for me in this format.

Can anybody help me convert the distance matrix to this format?
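
To make the requested conversion concrete, here is a minimal Python sketch of one way it might be done. It rests on assumptions: every data row is a sequence label followed by one distance per sequence, the rows are in the same order as the columns, and labels contain no whitespace. Any line that does not fit this pattern (a leading sequence count, a header row of labels, a blank line) is skipped. The function names `data_rows` and `matrix_to_pairs` are made up for this illustration and are not part of Clustal Omega.

```python
def data_rows(handle):
    """Yield (label, [distance strings]) for each matrix row in an open file."""
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, or a line holding only the sequence count
        try:
            for value in fields[1:]:
                float(value)  # check only; the original text is kept as-is
        except ValueError:
            continue  # non-numeric fields, e.g. a header row of labels
        yield fields[0], fields[1:]


def matrix_to_pairs(matrix_path, pairs_path):
    """Write each pair above the diagonal as one 'label label distance' line."""
    # Pass 1: collect the labels in row order (assumed equal to column order).
    with open(matrix_path) as handle:
        labels = [label for label, _ in data_rows(handle)]

    # Pass 2: stream the rows again and write only the upper triangle,
    # so the full matrix is never held in memory.
    with open(matrix_path) as handle, open(pairs_path, "w") as out:
        for i, (label, values) in enumerate(data_rows(handle)):
            for j in range(i + 1, len(labels)):
                out.write(f"{label} {labels[j]} {values[j]}\n")
```

It could then be called as `matrix_to_pairs("filename.faa.mat", "filename.faa.pairs")`, where the output file name is only a placeholder. The file is read twice so that only the labels stay in memory. For about 17k sequences the output would have on the order of 144 million lines, so it will be a large file.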
