If you build your own corpus to address specific research questions, which method do you use to make sure it is saturated? I'm interested in methods because I work on digital data, and I wonder which method is the most efficient and least time-consuming.
In corpus design, the "saturation" of a corpus is associated with the concept of "representativeness", developed by Douglas Biber.
Here are some other sources from the University of Lancaster that might interest you:
1. Representativeness, balance and sampling
2. Corpus representativeness and balance
As for methods, see for example this short paper from the University of Birmingham:
3. Exploring Methods for Evaluating Corpus Representativeness: https://www.birmingham.ac.uk/Documents/college-artslaw/corpus/conference-archives/2017/general/paper277.pdf
4. A quantitative approach to corpus representativeness
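
For a quick, practical check on digital data, one commonly described idea is lexical saturation: split the corpus into equal-sized segments and count how many new word types each additional segment contributes. When that number stays low, the vocabulary growth curve has flattened. The snippet below is only a minimal sketch of that idea, not the method of any of the sources above. The function names, the regex tokenizer, and the `window` and `max_new_ratio` thresholds are all assumptions chosen for illustration, so you would need to tune them to your own data.

```python
import re


def tokenize(text):
    """Very rough tokenizer (assumption): lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def new_types_per_segment(tokens, segment_size):
    """Count the previously unseen word types in each full segment.

    The corpus is cut into consecutive segments of `segment_size` tokens;
    a trailing partial segment is ignored so that segments stay comparable.
    """
    seen = set()
    growth = []
    for start in range(0, len(tokens) - segment_size + 1, segment_size):
        segment = tokens[start:start + segment_size]
        new_types = set(segment) - seen
        growth.append(len(new_types))
        seen |= new_types
    return growth


def is_saturated(growth, segment_size, window=3, max_new_ratio=0.05):
    """Heuristic check (hypothetical thresholds, not a published standard).

    Returns True when, over the last `window` segments, the average number
    of new types per token falls below `max_new_ratio`.
    """
    if len(growth) < window:
        return False  # not enough segments to judge
    recent = growth[-window:]
    average_new = sum(recent) / window
    return average_new / segment_size < max_new_ratio


if __name__ == "__main__":
    # Toy example; replace with the text of your own corpus.
    sample = "the cat sat on the mat and the dog sat on the cat " * 50
    tokens = tokenize(sample)
    growth = new_types_per_segment(tokens, segment_size=100)
    print(growth)
    print(is_saturated(growth, segment_size=100))
```

This only looks at the lexical level. If your research questions concern grammar, genre, or discourse features, you would track those features per segment instead of word types, and the order in which texts are added can change the curve, so shuffling the texts and repeating the count a few times is a sensible precaution.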