I'm doing research on clustering speech utterances by language. As far as I can tell, the only article that deals with this problem is:
Reynolds, Douglas A., et al. "Blind clustering of speech utterances based on speaker and language characteristics." ICSLP. 1998.
Is anyone here familiar with more recent work, or also working on this problem?
http://www.mirlab.org/conference_papers/International_Conference/ICSLP%201998/PDF/AUTHOR/SL980610.PDF