I recently started working on speaker/language recognition using i-vectors, and after consulting with researchers on ResearchGate, I arrived at the following steps:

1) Database

i) Development dataset (UBM and T training); if it is labeled, also LDA and PLDA training

ii) Training dataset (for speaker/language enrollment, i.e. the modeled speakers); if the development dataset is not labeled, I train LDA and PLDA on the training dataset (comments on this are welcome)

iii) Testing dataset (for testing the modeled speakers/language)
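
To make the split concrete, here is a minimal sketch of which dataset I would use for each stage. The names `Plan` and `plan_training` are hypothetical and only for illustration; nothing is actually trained here, the code just records the assignment described in the steps above.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """Which dataset feeds which stage of the i-vector pipeline."""
    ubm_and_t: str      # dataset used to train the UBM and T
    lda_and_plda: str   # dataset used to train LDA and PLDA (needs labels)
    enrollment: str     # dataset used to enroll speakers/languages
    evaluation: str     # dataset used to test the enrolled models


def plan_training(dev_is_labeled: bool) -> Plan:
    """Assign datasets to stages, following steps i) to iii).

    Assumption: the training (enrollment) dataset is always labeled,
    so it can serve as the fallback for LDA/PLDA.
    """
    return Plan(
        ubm_and_t="development",
        lda_and_plda="development" if dev_is_labeled else "training",
        enrollment="training",
        evaluation="testing",
    )


if __name__ == "__main__":
    print(plan_training(dev_is_labeled=True))
    print(plan_training(dev_is_labeled=False))
```

The only branch is the LDA/PLDA source: the development set when it has labels, otherwise the training set, which is the part I would like comments on.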

About Language Detection:

If I have a lot of speech samples but no labels for those utterances, how can I train LDA/PLDA for languages? Or can I train them on the labeled training-language data instead?

What about gender? How much will the results be affected if we use separate versus shared UBM and T? Is it OK to have a single UBM and T for both genders?

Is there any way to do i-vector detection without LDA and PLDA, for example an SVM applied directly to the i-vectors without any dimensionality reduction?
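
For concreteness, the kind of LDA/PLDA-free scoring I have in mind could look like the sketch below. It uses plain cosine similarity on length-normalized i-vectors rather than an SVM, so it is a stand-in for the idea, not the SVM approach itself. The function names, the 3-dimensional toy vectors, and the labels `lang_a`/`lang_b` are all made up for illustration; real i-vectors would come from the trained UBM and T and have a few hundred dimensions.

```python
import math


def length_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_score(a, b):
    """Cosine similarity between two raw i-vectors."""
    a_n, b_n = length_normalize(a), length_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def enroll(ivectors):
    """Build one model per class: mean of the length-normalized i-vectors."""
    normed = [length_normalize(v) for v in ivectors]
    dim = len(normed[0])
    mean = [sum(v[i] for v in normed) / len(normed) for i in range(dim)]
    return length_normalize(mean)


def identify(test_ivector, models):
    """Return the best-scoring label and all scores (no LDA, no PLDA)."""
    scores = {label: cosine_score(test_ivector, m) for label, m in models.items()}
    return max(scores, key=scores.get), scores


# Toy enrollment data (hypothetical values, not real i-vectors).
models = {
    "lang_a": enroll([[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]),
    "lang_b": enroll([[0.0, 1.0, 0.2], [0.1, 0.9, 0.3]]),
}

best, scores = identify([0.8, 0.3, 0.0], models)
print(best, scores)
```

Would something like this, or an SVM trained on the same raw i-vectors, be a reasonable baseline when no labeled data is available for LDA/PLDA?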
