Hi, everybody.
I collected a list of 700 example sentences from domain specialists and used it as the basis for generating 9k new sentences with a generative language model. Now I am looking for methods to evaluate the quality of the generated corpus.
So far, I have trained an n-gram language model on the generated corpus and measured its perplexity on the specialists' sentences. The results look good, but I would like to evaluate the corpus with other methods as well.
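To make the setup concrete, here is a minimal sketch of that kind of perplexity check. It is a simplified illustration rather than my exact pipeline: it assumes a bigram model with add-one smoothing and whitespace tokenization, and the function names (`train_bigram`, `perplexity`) and the sample sentences are placeholders.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumption: lowercase whitespace tokenization with boundary markers.
    return ["<s>"] + sentence.lower().split() + ["</s>"]


def train_bigram(sentences):
    """Count bigrams and their left contexts over the training sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        vocab.update(tokens)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams, vocab


def perplexity(model, sentences):
    """Perplexity of the add-one smoothed bigram model on held-out sentences."""
    contexts, bigrams, vocab = model
    v = len(vocab) + 1  # one extra slot reserved for unseen words
    log_prob, n_tokens = 0.0, 0
    for sentence in sentences:
        tokens = tokenize(sentence)
        for prev, cur in zip(tokens[:-1], tokens[1:]):
            p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
            log_prob += math.log(p)
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)


# Placeholder data: train on the generated corpus, test on the specialists' sentences.
generated_corpus = ["the patient has a fever", "the patient has a cough"]
specialist_sentences = ["the patient has a fever"]

model = train_bigram(generated_corpus)
print(perplexity(model, specialist_sentences))
```

Lower perplexity means the specialists' sentences are better predicted by a model that has only seen the generated corpus.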
If you know of any related research or evaluation methods, please let me know.
Thank you in advance.