I have a dataset of 50 speakers, with 20 utterances per speaker. To train a GMM-UBM system, can I use features from the first 10 utterances of each speaker to train the UBM, the next 5 utterances to adapt the user-specific models, and the remaining 5 utterances to test the system?
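
To make the proposed split concrete, here is a minimal sketch in Python. The `(speaker, utterance)` index pairs and the function name `split_utterances` are placeholders made up for illustration, not part of any toolkit, and feature extraction and the GMM training itself are left out. It only shows which utterances would go into each of the three disjoint sets.

```python
# Hypothetical layout: 50 speakers, 20 utterances each,
# with every utterance identified by a (speaker, utterance) index pair.
NUM_SPEAKERS = 50
UTTS_PER_SPEAKER = 20


def split_utterances(num_speakers=NUM_SPEAKERS, utts_per_speaker=UTTS_PER_SPEAKER):
    """Split each speaker's utterances into UBM / adaptation / test subsets."""
    ubm = []      # pooled over all speakers, used to train the UBM
    enroll = {}   # per speaker, used to adapt the user-specific model
    test = {}     # per speaker, held out for testing
    for spk in range(num_speakers):
        utts = [(spk, u) for u in range(utts_per_speaker)]
        ubm.extend(utts[:10])        # first 10 utterances
        enroll[spk] = utts[10:15]    # next 5 utterances
        test[spk] = utts[15:20]      # last 5 utterances
    return ubm, enroll, test


ubm_set, enroll_sets, test_sets = split_utterances()
```

With this split, the UBM pool holds 500 utterances, and each speaker has 5 adaptation and 5 test utterances, with no utterance shared between the three sets.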
