In WEKA's Explorer, the performance statistics are calculated by micro-averaging across test sets: the predictions from all test sets are pooled before the per-class results are computed. The values in the row labeled "Weighted Avg." are then calculated as a weighted arithmetic average of those micro-averaged per-class results in the corresponding columns, where the weight of each class is given by its prevalence in the pooled test sets. The Experimenter, by contrast, generates both the micro and macro averages.
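As a concrete illustration, here is a minimal Python sketch of that "Weighted Avg." computation; the per-class precision values and class counts are hypothetical numbers, not actual WEKA output.

```python
# Minimal sketch of the "Weighted Avg." row.
# The per-class values and counts below are hypothetical, not real WEKA output.
per_class_precision = [0.90, 0.60, 0.75]   # precision for classes A, B, C
class_counts        = [100, 30, 20]        # prevalence in the pooled test sets

total = sum(class_counts)
weights = [n / total for n in class_counts]

# Weighted arithmetic average: each class contributes in proportion
# to its prevalence in the pooled test data.
weighted_avg = sum(w * p for w, p in zip(weights, per_class_precision))
print(f"Weighted Avg. precision: {weighted_avg:.3f}")  # 0.820
```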
If you think all the labels are more or less equally sized (have roughly the same number of instances), use either one.
If you think there are labels with more instances than others and you want to bias your metric toward the most populated ones, use the micro average.
If you think there are labels with more instances than others and you want to bias your metric toward the least populated ones (or at least you don't want to bias it toward the most populated ones), use the macro average (see the sketch after this list).
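To make the difference concrete, here is a minimal Python sketch that computes a micro- and a macro-averaged precision from per-class counts; the label names and counts are made up for illustration.

```python
# Hypothetical per-class counts: (true positives, false positives)
counts = {
    "big_label":   (900, 50),   # a heavily populated label
    "small_label": (5, 10),     # a sparsely populated label
}

# Micro average: pool the counts over all classes first, then compute the
# metric once -- heavily populated labels dominate the pooled counts.
tp = sum(c[0] for c in counts.values())
fp = sum(c[1] for c in counts.values())
micro_precision = tp / (tp + fp)

# Macro average: compute the metric per class, then take the plain mean --
# every label counts equally, regardless of its size.
per_class = [c[0] / (c[0] + c[1]) for c in counts.values()]
macro_precision = sum(per_class) / len(per_class)

# micro: 0.938, macro: 0.640 -- the big label dominates the micro average.
print(f"micro: {micro_precision:.3f}, macro: {macro_precision:.3f}")
```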
If the micro-average result is significantly lower than the macro-average one, it means that you have some gross misclassification in the most populated labels, whereas your smaller labels are probably correctly classified. If the macro-average result is significantly lower than the micro-average one, it means your smaller labels are poorly classified, whereas your larger ones are probably correctly classified.
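The second case is easy to reproduce. The following sketch uses scikit-learn's `f1_score` (an assumption for illustration; WEKA itself is Java) on a toy imbalanced prediction where the small label is completely missed:

```python
from sklearn.metrics import f1_score

# Toy imbalanced data: 95 instances of label 0, 5 instances of label 1.
y_true = [0] * 95 + [1] * 5
# The classifier gets every big-label instance right but misses
# the small label entirely.
y_pred = [0] * 100

micro = f1_score(y_true, y_pred, average="micro", zero_division=0)
macro = f1_score(y_true, y_pred, average="macro", zero_division=0)
print(f"micro F1: {micro:.3f}")  # 0.950 -- dominated by the well-classified big label
print(f"macro F1: {macro:.3f}")  # 0.487 -- dragged down by the missed small label
```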
If you're not sure what to do, carry on with the comparisons on both the micro and macro averages :)