I recently started working on speaker/language recognition using i-vectors, and after consulting with researchers on ResearchGate, I arrived at the following steps:
1) Database
i) Development dataset (for training the UBM and T); if it is labeled, also for LDA and PLDA
ii) Training dataset (for speaker/language enrollment, i.e. the modeled speakers); if the development dataset is not labeled, I train LDA and PLDA on the training dataset (comments on this are welcome)
iii) Testing dataset (for testing the modeled speakers/languages)
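Written as a small lookup, the split above would look roughly like this. It is only a sketch of my plan: the function name `plan_stages` and the dictionary keys are made up for illustration and do not come from any toolkit.

```python
def plan_stages(dev_is_labeled):
    """Map each i-vector pipeline stage to the dataset it is estimated on."""
    # LDA and PLDA are supervised, so they need speaker/language labels.
    # If the development set has none, fall back to the training set.
    backend_data = "development" if dev_is_labeled else "training"
    return {
        "UBM": "development",
        "T": "development",        # total variability matrix
        "LDA": backend_data,
        "PLDA": backend_data,
        "enrollment": "training",  # modeled speakers/languages
        "scoring": "testing",
    }


if __name__ == "__main__":
    for labeled in (True, False):
        print("development labeled:", labeled, "->", plan_stages(labeled))
```

The only branch is the one I am unsure about: where LDA and PLDA get their labels when the development data has none.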
About Language Detection:
If I have a lot of speech samples but no labels for those utterances, how can I train LDA/PLDA for languages? Or can I train them on the language training data?
What about gender? How much will the results be affected if we use different versus the same UBM and T? Is it OK to have a single UBM and T for both genders?
Is there any way to do i-vector detection without LDA and PLDA, for example by applying an SVM directly to the i-vectors without dimensionality reduction?
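To make the last question concrete, here is a minimal sketch of a backend that works on the raw i-vectors with no LDA or PLDA. Note that it uses cosine scoring against per-class mean i-vectors rather than an SVM, only to keep the example free of external libraries; an SVM would take the place of the scorer on the same unreduced vectors. The toy 3-dimensional vectors, the labels `lang_a`/`lang_b`, and all function names are assumptions for illustration, not real data or a toolkit API.

```python
import math


def length_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def enroll(ivectors_by_class):
    """Build one model per class: the normalized mean of its normalized i-vectors."""
    return {
        label: length_normalize(mean_vector([length_normalize(v) for v in vecs]))
        for label, vecs in ivectors_by_class.items()
    }


def cosine_scores(models, test_ivector):
    """Cosine similarity between the test i-vector and every class model."""
    t = length_normalize(test_ivector)
    return {label: sum(a * b for a, b in zip(m, t)) for label, m in models.items()}


def classify(models, test_ivector):
    """Return the label whose model has the highest cosine score."""
    scores = cosine_scores(models, test_ivector)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy stand-ins for enrollment i-vectors (real ones have hundreds of dimensions).
    toy = {
        "lang_a": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
        "lang_b": [[0.0, 1.0, 0.2], [0.1, 0.8, 0.3]],
    }
    models = enroll(toy)
    print(classify(models, [0.8, 0.3, 0.0]))
```

Whether such an unreduced backend is competitive with LDA + PLDA on real data is exactly what I would like comments on.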