What are the NLP approaches for languages with rich morphology?

That depends a lot on what you want to accomplish and how much data you have. You may want to take a look at my UralicNLP library (https://github.com/mikahama/uralicNLP), which does lemmatization, morphological analysis, inflection and disambiguation for many morphologically rich languages using rule-based FSTs, plus neural models for some of the supported languages.

Stemming is probably the least useful NLP method for morphologically rich languages, especially if the stem itself changes a lot. For many NLP applications, lemmatization is more useful because it ensures that all inflectional forms of a word are mapped to the same lemma. For example, the Finnish word käsi is käden in the genitive and käteni with the 1st person singular possessive suffix. The lemma for both forms is käsi, whereas the stems would be käde and käte.
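To make the käsi example concrete, here is a minimal sketch in Python. Both the suffix list and the lemma table are hand-written assumptions for this one word; they are not taken from UralicNLP or from any real stemmer. In practice a morphological analyzer would take the place of the lookup table.

```python
# Toy illustration only: the suffix list and the lemma table are
# hand-written for this example, not taken from any real tool.
SUFFIXES = ["ni", "n"]  # 1st person singular possessive, genitive

LEMMA_TABLE = {"käsi": "käsi", "käden": "käsi", "käteni": "käsi"}


def naive_stem(word: str) -> str:
    """Strip the first matching suffix (longer suffixes are listed first)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Look the form up in the toy table; unknown forms are returned as-is."""
    return LEMMA_TABLE.get(word, word)


forms = ["käsi", "käden", "käteni"]
stems = {naive_stem(form) for form in forms}    # three different stems
lemmas = {lemmatize(form) for form in forms}    # one shared lemma
print(stems, lemmas)
```

The three forms end up with three different stems but a single lemma, so anything that counts or indexes words by stem would treat them as unrelated.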

If you have a lot of data, just training any of the state-of-the-art neural models like BERT will probably give you good enough results, and you do not need to worry too much about the morphological complexity of the language, although lemmatization and compound splitting might still help. How well this works depends on how complex the morphology is and how much data you have. Quite frequently, subword units end up badly split and do not represent the language in question, either because there is not enough data for the model to learn the splits correctly or because the segmentation method by design does not suit the language.
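As a rough illustration of the subword problem, the sketch below runs a simplified greedy longest-match segmenter over the same Finnish forms. The vocabulary is invented for the example and stands in for whatever a tokenizer might learn from too little data; real tokenizers (for instance WordPiece with its continuation markers) differ in the details.

```python
# Simplified greedy longest-match segmentation. TOY_VOCAB is an invented
# vocabulary for illustration, not one learned by any real tokenizer.
TOY_VOCAB = {"kä", "si", "den", "ten", "i", "t", "e", "n"}


def greedy_segment(word: str, vocab: set) -> list:
    """Split a word left to right, always taking the longest known piece."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is found in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No known piece starts here: fall back to a single character.
            end = start + 1
        pieces.append(word[start:end])
        start = end
    return pieces


for form in ["käsi", "käden", "käteni"]:
    print(form, greedy_segment(form, TOY_VOCAB))
```

With this vocabulary, käteni is split as kä + ten + i, which cuts across the boundary between the stem käte and the suffix ni, and the only piece the three forms share is kä, which is not a morpheme.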

So if you do not have a lot of data, you can write some rules and generate data (see Conference Paper Neural Morphology Dataset and Models for Multiple Languages,...). If you have data, you can train any modern deep learning model. But of course, understanding how the model works and how the language behaves is the key here.
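To give an idea of what "write some rules and generate data" can look like, here is a very small sketch that generates (inflected form, analysis) pairs for Finnish nouns. It is an assumption-heavy toy: it only handles vowel harmony, it only works for lemmas whose stem does not change (so käsi would come out wrong), and the tag strings are made up for the example. Real rule sets, such as the FSTs mentioned above, are far more involved.

```python
# Toy rule-based generator of training pairs. Only valid for Finnish nouns
# whose stem does not alternate; the tag format is invented for this example.
BACK_VOWELS = set("aou")

# "A" is a placeholder resolved to "a" or "ä" by vowel harmony.
CASE_SUFFIXES = {
    "Gen": "n",
    "Ine": "ssA",
    "Ela": "stA",
    "Ade": "llA",
    "Abl": "ltA",
    "All": "lle",
}


def realize(suffix: str, lemma: str) -> str:
    """Pick the harmonic vowel: back if the lemma has a, o or u, else front."""
    vowel = "a" if any(ch in BACK_VOWELS for ch in lemma) else "ä"
    return suffix.replace("A", vowel)


def generate(lemmas: list) -> list:
    """Return (surface form, analysis) pairs for each lemma and case."""
    pairs = []
    for lemma in lemmas:
        pairs.append((lemma, f"{lemma}+N+Sg+Nom"))
        for case, suffix in CASE_SUFFIXES.items():
            form = lemma + realize(suffix, lemma)
            pairs.append((form, f"{lemma}+N+Sg+{case}"))
    return pairs


pairs = generate(["talo", "kylä"])
for form, analysis in pairs:
    print(form, analysis)
```

Pairs like these could then serve as training data for a neural analyzer or lemmatizer, but only for the part of the morphology the rules actually cover.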

Tanya Nazaretsky

Hi Thamalu Maliththa Piyadigama,

We have recently published an article presenting the NLP pipeline for Hebrew.

Article Machine Learning and Hebrew NLP for Automated Assessment of ...

In addition, you can find some interesting papers that we cite there.

Thamalu Maliththa Piyadigama

Anitha S. Pillai, Tanya Nazaretsky, Mika Hämäläinen: thank you for your answers.

I'll look into your contributions to Hebrew and Uralic languages. I'm working on similar problems. These will be useful.

Kimmo Kettunen

Hi,

here is a link to an old paper of mine. It discusses pros and cons of different approaches up to 2010 or so.

https://ebooks.iospress.nl/volumearticle/7492

Br, Kimmo
