Does a mismatch between the language used to train the system hyperparameters (the TV space, LDA, and PLDA models) and the language spoken by the system's users affect the performance of a speaker recognition system?
Yes, a language mismatch between training and test data can significantly degrade the performance of an SR system: the error rate can double or worse when the languages used to train the GMM, the TV matrix, and the PLDA model differ from the test language. I think this paper is a good starting point [1]; it also proposes a phoneme histogram normalization technique to match the phonetic spaces of the training and test languages. One possible mitigation is to train your system components (GMM, TV matrix, and PLDA) on many languages, e.g. using NIST SRE data or other multilingual corpora; some systems [2] use up to 11 languages.
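To make the histogram-matching idea concrete, here is a minimal sketch (not the exact method of [1]) of one way to reweight test-language phoneme statistics so their histogram matches the training language. The function name and the toy counts are my own illustration; it assumes you already have per-phoneme frame counts for both languages.

```python
import numpy as np

def histogram_match_weights(train_counts, test_counts):
    """Per-phoneme weights that reweight test-language frames so the
    reweighted test phoneme histogram matches the training histogram.
    (Illustrative sketch, not the normalization from [1].)"""
    train_hist = train_counts / train_counts.sum()
    test_hist = test_counts / test_counts.sum()
    # Guard against phonemes unseen in the test language (zero counts).
    weights = np.divide(train_hist, test_hist,
                        out=np.zeros_like(train_hist),
                        where=test_hist > 0)
    return weights

# Toy example with 4 phoneme classes and mismatched distributions.
train_counts = np.array([100., 200., 300., 400.])
test_counts = np.array([400., 300., 200., 100.])
w = histogram_match_weights(train_counts, test_counts)
reweighted = test_counts * w
# After reweighting, the test histogram equals the train histogram.
print(reweighted / reweighted.sum())  # → [0.1 0.2 0.3 0.4]
```

In practice the weights would be applied when accumulating sufficient statistics for the UBM/TV training, so that both languages contribute comparable phonetic coverage.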
---
References:
[1] A. Misra and J. H. L. Hansen, "Spoken language mismatch in speaker verification: An investigation with NIST-SRE and CRSS Bi-Ling corpora."
[2] P. Matejka et al., "Analysis of DNN approaches to speaker identification."