I have three heuristics, H1, H2 and H3, that have been tested on a set of instances. For each triple (n,m,p) we have 100 instances. The performance criteria used to compare them are the average deviation from the best lower bound (Dev_H) and the number of times a given heuristic yields the best solution (Best_H), as shown in the given table.
My question is: how can I use p-values to check whether the differences between H1 and H2 are significant, and to show that H3 outperforms H1 and H2 for n=200 while H1 and H2 outperform H3 for n=10?
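To make the question concrete, here is a minimal sketch of the kind of test I have in mind. It assumes that all heuristics are run on the same 100 instances per (n,m,p), so the per-instance deviations can be paired, and it uses an exact paired sign test, which is only one possible choice (a Wilcoxon signed-rank test, e.g. `scipy.stats.wilcoxon`, would be a common and usually more powerful alternative). The function name `sign_test` and the deviation values below are hypothetical, made up for illustration and not taken from my table.

```python
from math import comb


def sign_test(dev_a, dev_b, alternative="two-sided"):
    """Exact paired sign test on per-instance deviations (lower is better).

    alternative="two-sided": H1 is "A and B differ".
    alternative="less":      H1 is "A tends to have lower deviation than B".
    """
    if len(dev_a) != len(dev_b):
        raise ValueError("both heuristics must be run on the same instances")

    # Count on how many instances each heuristic is strictly better.
    wins_a = sum(a < b for a, b in zip(dev_a, dev_b))
    wins_b = sum(b < a for a, b in zip(dev_a, dev_b))
    n = wins_a + wins_b  # ties carry no information and are dropped
    if n == 0:
        return 1.0

    def upper_tail(k):
        # P(X >= k) for X ~ Binomial(n, 0.5), the null distribution of wins.
        return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

    if alternative == "less":
        return upper_tail(wins_a)
    if alternative == "two-sided":
        return min(1.0, 2 * upper_tail(max(wins_a, wins_b)))
    raise ValueError("alternative must be 'two-sided' or 'less'")


# Hypothetical deviations (in %) on 12 paired instances of one (n,m,p) group.
dev_h1 = [4.1, 3.8, 5.0, 4.4, 3.9, 4.7, 5.2, 4.0, 4.3, 4.9, 4.6, 4.2]
dev_h2 = [4.0, 3.9, 4.8, 4.5, 3.9, 4.6, 5.3, 4.1, 4.2, 5.0, 4.5, 4.3]
dev_h3 = [3.2, 3.1, 4.1, 3.6, 3.0, 3.9, 4.4, 3.3, 3.5, 4.0, 3.8, 3.4]

print("H1 vs H2 (two-sided):", sign_test(dev_h1, dev_h2))
print("H3 better than H1 (one-sided):", sign_test(dev_h3, dev_h1, "less"))
print("H3 better than H2 (one-sided):", sign_test(dev_h3, dev_h2, "less"))
```

With the real data I would run this once per group (for example, all instances with n=200, then all with n=10), with the roles of the heuristics swapped for n=10. Since that involves several comparisons, I assume the p-values should also be adjusted for multiple testing (e.g. Bonferroni), but I am not sure whether this is the right overall approach.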