Hello everyone! I am researching the applications of Vision-Language Models (VLMs) in incremental learning for segmentation.
While I've found many papers on unsupervised domain adaptation and image classification using VLMs, there seems to be a lack of literature specifically on incremental learning for segmentation. Could anyone recommend relevant papers or studies? Thank you very much!
You're absolutely right: incremental segmentation with VLMs is an emerging area with limited but growing literature. I’d recommend looking into:
Yu et al., CVPR 2023 – "Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation"
They leverage CLIP for class-incremental segmentation under weak supervision.
"Learning from the Web: Language-Driven Weakly-Supervised Incremental Learning" (ECCV 2024)
This uses vision-language data (like image captions) to add new segmentation classes without full supervision.
Both address catastrophic forgetting while incrementally expanding class coverage using VLMs.
On a related note, in my own recent work, "Advanced Crop Recommendation System: Leveraging Deep Learning and Fuzzy Logic for Precision Farming", we explore domain-adaptive image segmentation under evolving agricultural categories, where incremental learning for multi-class expansion is essential. While not vision-language driven, the framework bridges segmentation with evolving class labels, which is an adjacent challenge to your focus.
As is often the case on the web, search results are largely dependent on the language vocabulary (keywords) used...
In artificial intelligence, continuous (or online) learning almost exclusively implements the mathematical concepts of recurrence and recursion. In addition, a new term has recently entered the machine learning literature: “continual learning”.
By combining these terms with VLM, vision, language and segmentation, you will access a whole range of publications, including (but not limited to):
Yu et al., "Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters", 2024 - https://arxiv.org/pdf/2403.11549
Hou et al., "VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models", 2025 - https://aclanthology.org/2025.coling-main.694.pdf
L. Pellegrini, "Continual Learning for Computer Vision Applications", 2022 - https://amsdottorato.unibo.it/id/eprint/10401/1/Lorenzo%20Pellegrini%20-%20PhD%20Thesis.pdf
Tang et al., "Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models", 2024 (preprint)
Sun et al., "CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor", 2024 - https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_CLIP_as_RNN_Segment_Countless_Visual_Concepts_without_Training_Endeavor_CVPR_2024_paper.pdf
Sokar et al., "Continual Learning in Vision-Language Models via Aligned Model Merging", 2025 - https://arxiv.org/pdf/2506.03189
Zang et al., "Continual Learning of Image Classes with Language Guidance from a Vision-Language Model", 2024 - https://openreview.net/pdf?id=Z4OpKd7wOD
Hannan et al., "ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos", 2024 (preprint)
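For anyone who wants to run these keyword combinations systematically, here is a minimal Python sketch. The term lists are only illustrative examples taken from this thread, the helper name `build_queries` is made up for this post, and the quoted-phrase syntax is an assumption about how your search engine treats exact phrases.

```python
from itertools import product

# Illustrative term groups taken from this thread; extend them as needed.
learning_terms = [
    "incremental learning",
    "continual learning",
    "continuous learning",
    "online learning",
]
model_terms = ["VLM", "vision-language model"]
task_terms = ["segmentation"]


def build_queries(*groups):
    """Return one query string per combination, picking one term from each group."""
    return [
        " ".join(f'"{term}"' for term in combo)
        for combo in product(*groups)
    ]


queries = build_queries(learning_terms, model_terms, task_terms)

if __name__ == "__main__":
    for query in queries:
        print(query)
```

Each printed line can be pasted into a scholarly search engine; swapping "incremental" for "continual" is often what surfaces the missing papers.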
Several papers explore incremental learning for segmentation based on Vision-Language Models (VLMs). They typically address two challenges: catastrophic forgetting and the need to learn new classes efficiently from limited data.
Incremental learning addresses these challenges by enabling models to adapt continually to new, non-overlapping tasks while retaining as much knowledge as possible from previous tasks, which also facilitates real-time inference.
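To make the trade-off between learning new classes and retaining old ones concrete, here is a rough NumPy sketch of one common way to express it: a cross-entropy term on the current labels plus a distillation term that keeps the new model's predictions on the old classes close to those of the frozen previous model. This is a generic distillation-style objective chosen for illustration, not the method of any paper listed in this thread; the function name, the array shapes, and the weight `lam` are all assumptions.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def incremental_seg_loss(new_logits, old_logits, labels, n_old, lam=1.0):
    """Per-pixel loss for one incremental step (illustrative only).

    new_logits: (H, W, n_old + n_new) scores from the model being trained
    old_logits: (H, W, n_old) scores from the frozen previous model
    labels:     (H, W) integer class ids in [0, n_old + n_new)
    lam:        hypothetical weight balancing retention against plasticity
    """
    eps = 1e-12
    h, w = labels.shape

    # Plasticity: cross-entropy on the labels available at this step.
    probs = softmax(new_logits)
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    ce = -np.log(probs[rows, cols, labels] + eps).mean()

    # Retention: cross-entropy between the old model's distribution and the
    # new model's distribution restricted to the old classes.
    p_old = softmax(old_logits)
    p_new_on_old = softmax(new_logits[..., :n_old])
    kd = -(p_old * np.log(p_new_on_old + eps)).sum(axis=-1).mean()

    return ce + lam * kd, ce, kd
```

The distillation term is smallest when the new model reproduces the old model's outputs on the old classes, so increasing `lam` favours retention while decreasing it favours fitting the new classes.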