I am trying to build a pairwise table of gene-gene correlations. My input matrix is large (13000 genes and 900 samples), and I do not want to reduce the number of genes. The full gene-gene correlation matrix would therefore be 13000 x 13000, and the pairwise table would have about 169 million rows and 4 columns (Column 1: Gene 1; Column 2: Gene 2; Column 3: correlation; Column 4: p-value).

With a table this size, I need to skip every unnecessary calculation. I already skip the case where Gene 1 = Gene 2, but I could not find a way to skip the mirrored pairs, i.e. the row "Gene 1, Gene 2" and the row "Gene 2, Gene 1". In short, the correlation between G1 and G2 equals the correlation between G2 and G1, so I only need the part of the symmetric matrix below the diagonal. I would be grateful for any help with this. My Python code is below:
import pandas as pd
import numpy as np
import scipy
import math
import openpyxl
from openpyxl import Workbook
from scipy.stats import spearmanr
.
.
din=pd.read_csv('m_test.csv', index_col=0)
.
out=pd.DataFrame()
outdf=pd.DataFrame()
.
for g1 in din.index:
    for g2 in din.index:
        # skip the pairing of a gene with itself
        if g1==g2:
            continue
        temp=din.loc[[g1, g2]]
        spR, spP=spearmanr(temp.loc[g1], temp.loc[g2])
        frame={'g1':[g1], 'g2':[g2], 'spR':[spR], 'spP':[spP]}
        out=pd.DataFrame(frame)
        # append this pair's row to the running result
        outdf=pd.concat([outdf, out])
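
For reference, one possible way to visit each unordered pair only once is `itertools.combinations`, which yields (g1, g2) but never (g2, g1) or (g1, g1). The following is only a minimal sketch under some assumptions: the function name `pairwise_spearman` and the toy data are hypothetical stand-ins (the real input would be `din` read from `m_test.csv`), and the gene names in the index are assumed to be unique.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def pairwise_spearman(din):
    """Spearman correlation for each unordered pair of rows in din.

    Hypothetical helper for illustration; assumes din.index is unique.
    """
    rows = []
    # combinations() yields each unordered pair once and never g1 == g2
    for g1, g2 in combinations(din.index, 2):
        spR, spP = spearmanr(din.loc[g1], din.loc[g2])
        rows.append((g1, g2, spR, spP))
    # build the DataFrame once instead of calling pd.concat in every iteration
    return pd.DataFrame(rows, columns=['g1', 'g2', 'spR', 'spP'])


# toy data standing in for m_test.csv (4 genes x 6 samples), an assumption
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(4, 6)), index=['A', 'B', 'C', 'D'])
outdf = pairwise_spearman(toy)
print(outdf)
```

For n genes this produces n(n-1)/2 rows, so 6 rows for the 4 toy genes and 84,493,500 rows for 13000 genes, which is roughly half of the 169 million. Collecting the rows in a list and creating the DataFrame at the end also avoids copying the growing table on every iteration. The Python-level loop is still likely to be slow at that size, so a vectorized calculation over the whole matrix may be worth testing as well.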