I know of some basic approaches that can be used for morphologically rich languages:

1. Stemming

2. Lemmatizing

3. Character n-grams

4. FastText embeddings

5. SentencePiece
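
To make item 3 concrete, here is a minimal sketch of character n-gram extraction with word-boundary markers, in the style fastText uses for its subword units. The function name `char_ngrams` and its parameters are my own for illustration; this is not fastText's actual implementation.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of `word`, with '<' and '>' as boundary markers."""
    marked = f"<{word}>"  # boundary markers distinguish prefixes and suffixes
    grams = []
    for n in range(n_min, n_max + 1):
        # slide a window of width n over the marked word
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


print(char_ngrams("where", 3, 3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```

One caveat for Indic scripts: Python slices strings by Unicode code point, so a window like this can separate a consonant from its dependent vowel sign. Whether to segment by grapheme cluster instead is a design choice I have not settled.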

I would like to know whether there are any more recent developments, and what researchers think about the robustness of each method in specific domains (Indic languages, etc.).