Does this question mean: Is it important to correlate between labels in multi-label classification?
If so, I believe we should consider it separately for different families of algorithms.
In supervised learning, where labels are attached beforehand, algorithm design comes down to the encoding scheme and the related modelling. For example, a one-of-N (one-hot) coding scheme in naive linear classification, or the N/N1, −N/N2 target coding in a two-class Fisher classifier, as discussed in PRML. There it is not necessary to further mine correlations, since they are taken for granted.
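For concreteness, a one-of-N coding can be sketched as follows (a minimal numpy illustration; the label values and class count are made up):

```python
import numpy as np

# One-of-N (one-hot) target coding for a 3-class problem.
# Labels are assumed to be integer class indices.
labels = np.array([0, 2, 1, 2])
n_classes = 3

# Row k of the identity matrix is the one-hot code for class k,
# so indexing by the label vector encodes all samples at once.
targets = np.eye(n_classes)[labels]
print(targets)
```

Each row of `targets` then serves directly as the regression target for the corresponding sample.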
However, in unsupervised learning, an increasingly important task is automatic label detection followed by classification. This can be done in a hashing-like way, where the similarity between input vectors in the shallow layers of a neural network (e.g., an RNN) is considered; distributed representations of natural language are an example. In this setting it is non-trivial to consider the correlation between labels, which reflects the distance or similarity of the mapped inputs. Correlation between classes may give rise to a better classification standard.
Analytical cases may be represented by PCA, which extracts features by computing the mean and covariance of an assumed normal distribution; the extracted features are orthogonal by construction. I wonder whether that is a form of decorrelation in correlation analysis?
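As a quick sanity check of that intuition, here is a minimal numpy sketch (the data and covariance values are made up) showing that PCA scores come out mutually uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D Gaussian data (illustrative covariance matrix).
X = rng.multivariate_normal([0, 0], [[3.0, 1.5], [1.5, 1.0]], size=2000)

# PCA: centre the data, then eigendecompose the sample covariance.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)

# Project onto the principal axes to get the scores.
Z = Xc @ eigvecs

# The scores are decorrelated: their covariance is (numerically) diagonal.
Zcov = Z.T @ Z / (len(Z) - 1)
print(np.round(Zcov, 6))
```

The off-diagonal entries of `Zcov` are zero up to floating-point error, which is exactly the decorrelation effect in question.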
i tend to formalize ML classification through the canonical analysis framework
(i admit this is a somewhat larger perspective but i believe it gives a simple picture)
in canonical analysis, you have individuals (say areas of land) described by two sets of features (say, the animals living in the area on the one hand, and the plants growing in the area on the other hand)
the aim of canonical analysis is to find linear factors in each description space so that factors are maximally correlated across description spaces; for the land area example, this means finding (mutually orthogonal) combinations of animal populations (linear factors in animal description space) strongly correlated to (mutually orthogonal) combinations of plant populations (linear factors in plant description space)
if you imagine the correlation matrix of the full description of the areas, it has a block structure
    A   B
    Bᵀ  C
with A the correlation matrix of the animal description, C the correlation matrix of the plant description and B the cross-correlation matrix between animal and plant descriptors
while PCA uses this whole correlation matrix, canonical analysis concentrates only on the B submatrix (indeed, canonical analysis technically boils down to an SVD of B, once each block has been whitened)
all this handwaving introduction indicates that there are two kinds of correlations when you have two description spaces (which is the case in multi-label problems: you have a feature space and a label space): correlations within each description space, which are not the objective of the analysis (you are not interested in correlations between labels strictly speaking if such correlations have nothing to do with your feature space), and correlations across description spaces, which are the objective of your analysis
for instance, some labels might well be just correlated noise from the perspective of your feature space: PCA would take such correlation into account, but with no gain from the classification perspective; canonical analysis will ignore such correlations and concentrate on cross-correlations between your feature space and your label space