Does this question mean: Is it important to correlate between labels in multi-label classification?
If so, I believe we should consider it separately for different families of algorithms.
In supervised learning, where labels are attached beforehand, algorithm design comes down to the encoding scheme and the related modelling. For example, a one-of-N (one-hot) coding scheme in naive linear classification, or the N/N1, −N/N2 target coding in a two-class Fisher classifier, as discussed in PRML. There it is not necessary to further mine correlations, since they are taken for granted.
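For concreteness, a one-of-N coding can be sketched as follows (a minimal numpy illustration; the label values and class count are made up):

```python
import numpy as np

# One-of-N (one-hot) target coding for a 3-class problem.
# Labels are assumed to be integer class indices.
labels = np.array([0, 2, 1, 2])
n_classes = 3

# Row k of the identity matrix is the one-hot code for class k,
# so indexing by the label vector encodes all samples at once.
targets = np.eye(n_classes)[labels]
print(targets)
```

Each row of `targets` then serves directly as the regression target for the corresponding sample.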
However, in unsupervised learning, an increasingly important task is automatic label detection followed by classification. This can be done in a hashing-like way, where the similarity between input vectors in the shallow layers of a neural network (e.g., an RNN) is considered; distributed representations of natural language are an example. In this setting it is non-trivial to consider the correlation between labels, which reflects the distance or similarity of the mapped inputs. Correlation between classes may give rise to a better classification standard.
Analytical cases may be represented by PCA, which extracts features by computing the mean and covariance of an assumed normal distribution; the extracted features are orthogonal by construction. I wonder whether that is a form of decorrelation in correlation analysis?
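As a quick sanity check of that intuition, here is a minimal numpy sketch (the data and covariance values are made up) showing that PCA scores come out mutually uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D Gaussian data (illustrative covariance matrix).
X = rng.multivariate_normal([0, 0], [[3.0, 1.5], [1.5, 1.0]], size=2000)

# PCA: centre the data, then eigendecompose the sample covariance.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)

# Project onto the principal axes to get the scores.
Z = Xc @ eigvecs

# The scores are decorrelated: their covariance is (numerically) diagonal.
Zcov = Z.T @ Z / (len(Z) - 1)
print(np.round(Zcov, 6))
```

The off-diagonal entries of `Zcov` are zero up to floating-point error, which is exactly the decorrelation effect in question.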
i tend to formalize ML classification through the canonical analysis framework
(i admit this is a somewhat larger perspective but i believe it gives a simple picture)
in canonical analysis, you have individuals (say areas of land) described by two sets of features (say, the animals living in the area on the one hand, and the plants growing in the area on the other hand)
the aim of canonical analysis is to find linear factors in each description space so that factors are maximally correlated across description spaces; for the land area example, this means finding (mutually orthogonal) combinations of animal populations (linear factors in animal description space) strongly correlated to (mutually orthogonal) combinations of plant populations (linear factors in plant description space)
if you imagine the correlation matrix of the full description of the areas, it has a block structure
    A   B
    Bᵀ  C
with A the correlation matrix of the animal description, C the correlation matrix of the plant description and B the cross-correlation matrix between animal and plant descriptors
while PCA uses this whole correlation matrix, canonical analysis concentrates only on the B submatrix (indeed, canonical analysis technically boils down to an SVD of B, once each block has been whitened)
all this handwaving introduction indicates that there are two kinds of correlations when you have two description spaces (which is the case in multi-label problems: you have a feature space and a label space): correlations within each description space, which are not the objective of the analysis (you are not interested in correlations between labels strictly speaking if such correlations have nothing to do with your feature space), and correlations across description spaces, which are the objective of your analysis
for instance, some labels might well be just correlated noise from the perspective of your feature space: PCA would take such correlation into account, but with no gain from the classification perspective; canonical analysis will ignore such correlations and concentrate on cross-correlations between your feature space and your label space