I have a corpus of English documents. Each document is labelled sentence by sentence with domain-specific labels.

I have another corpus with the same documents in another language.

I want to label the non-English corpus in an unsupervised fashion according to the labels of the English corpus.

One English sentence may correspond to several sentences in the other language, or vice versa.
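One common way to recover these 1-to-many and many-to-1 correspondences is monotone sentence alignment by dynamic programming, in the spirit of Gale and Church (1993). The sketch below is a simplified, length-only variant. It is not the original Gale-Church probabilistic cost. The shape penalties in `SHAPES`, the log-ratio cost, and the `ratio` parameter (expected target/source character-length ratio) are illustrative assumptions. Each returned "bead" pairs a group of English sentence indices with a group of target sentence indices.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A bead: (English sentence indices, target-language sentence indices).
Bead = Tuple[List[int], List[int]]

# Allowed bead shapes (n English, n target) with a fixed penalty for anything
# other than 1-1. These values are illustrative guesses, not tuned parameters.
SHAPES: Dict[Tuple[int, int], float] = {
    (1, 1): 0.0, (1, 2): 1.0, (2, 1): 1.0, (1, 0): 3.0, (0, 1): 3.0,
}

def bead_cost(src_chars: int, tgt_chars: int, ratio: float) -> float:
    # 0 when the target length equals ratio * source length; grows with mismatch.
    return abs(math.log((tgt_chars + 1) / (ratio * src_chars + 1)))

def align(src: Sequence[str], tgt: Sequence[str], ratio: float = 1.0) -> List[Bead]:
    """Monotone DP sentence alignment that uses only character lengths."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for (di, dj), penalty in SHAPES.items():
                if i < di or j < dj or cost[i - di][j - dj] == inf:
                    continue
                s_len = sum(len(s) for s in src[i - di:i])
                t_len = sum(len(t) for t in tgt[j - dj:j])
                c = cost[i - di][j - dj] + penalty + bead_cost(s_len, t_len, ratio)
                if c < cost[i][j]:
                    cost[i][j] = c
                    back[i][j] = (di, dj)
    # Walk back from (n, m) to recover the beads, then restore document order.
    beads: List[Bead] = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads
```

For example, `align(["a" * 20, "b" * 10], ["c" * 10] * 3)` pairs the long English sentence with the first two target sentences and the short one with the third: `[([0], [0, 1]), ([1], [2])]`. Length alone is a weak signal. Real aligners typically add lexical or multilingual-embedding similarity to the cost.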

In that case, all sentences that translate a single original sentence should receive the same label as that original sentence.
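Given a sentence alignment expressed as beads (groups of English sentence indices paired with groups of target sentence indices), the labelling rule described above reduces to copying labels across each bead. The function below is a minimal sketch under stated assumptions. `project_labels` is a hypothetical name. Target sentences with no English counterpart stay `None`. When several English sentences merge into one target sentence, the label of the first one is used. A majority vote or a multi-label output would be equally reasonable choices.

```python
from typing import List, Optional, Sequence, Tuple

# A bead: (English sentence indices, target-language sentence indices).
Bead = Tuple[Sequence[int], Sequence[int]]

def project_labels(src_labels: List[str], beads: List[Bead], n_tgt: int) -> List[Optional[str]]:
    """Copy each English sentence label onto every aligned target sentence."""
    tgt_labels: List[Optional[str]] = [None] * n_tgt  # None = no aligned English sentence
    for src_ids, tgt_ids in beads:
        if not src_ids:
            continue  # target-only bead: nothing to project
        # Assumption: in an n-to-1 bead, the first English sentence's label wins.
        label = src_labels[src_ids[0]]
        for t in tgt_ids:
            tgt_labels[t] = label
    return tgt_labels
```

For instance, if English sentence 0 was split into target sentences 0 and 1, `project_labels(["intro", "method", "result"], [([0], [0, 1]), ([1], [2]), ([2], [3])], 4)` returns `["intro", "intro", "method", "result"]`.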

What would be the best approach? Which related work with a similar setting should I study?
