Greetings, Everyone!
I continue to be very interested in research on Indigenous language recovery initiatives in Latin America. In my recent article (Bernard & Benn, 2025), we conduct a qualitative historical analysis of both endangered and extinct Indigenous languages, mapping traditional oral-transmission methods alongside AI tools (OCR, TTS, text-to-video) for community-led recovery initiatives.
Today, I came across Anderson et al. (2025), who introduce Morphemo, a semi-supervised n-gram segmentation algorithm that outperforms LLM prompting on multimorphemic splits in Bribri. I’m interested in studies that similarly combine quantitative morphological segmentation with holistic recovery frameworks, especially for other low-resource Indigenous languages of Latin America.
How have researchers integrated algorithmic segmentation tools into broader recovery (revitalization or reclamation) pipelines, and what community-driven validation strategies have been employed? Any pointers to published work or ongoing projects would be greatly appreciated!
All the best,
Dianala
References
Anderson, C., Nguyen, M., & Coto-Solano, R. (2025). Unsupervised, semi-supervised and LLM-based morphological segmentation for Bribri. In Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP) (pp. 63–76). Association for Computational Linguistics. https://doi.org/10.18653/v1/2025.americasnlp-1.7
Bernard, D. M., & Benn, M. (2025). Revitalization or reclamation? Reframing the recovery of Indigenous languages in Latin America: A historical and AI-driven approach. International Journal of Language, Literature and Learning Communities, 4(1), 104–131. https://doi.org/10.59009/ijlllc.2025.0103