I trained a 3-layer LSTM network in Keras to extract d-vector embeddings. I use MFCC features extracted from the TIMIT dataset as input, and I defined a custom loss function (the GE2E loss).
After fewer than 213444 batches the loss reached zero (on both the train and dev sets). However, when I use the model to predict d-vectors (even on inputs from the training set), I keep getting nearly identical outputs: the cosine similarity between any two output vectors is 0.99999xxx.
I double-checked the code and the loss-function implementation, and both seem correct to me.
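For reference, here is a minimal NumPy sketch of the GE2E softmax loss as I understand it from the paper (this is not my Keras code; the function name and the fixed `w`, `b` values are placeholders for the learnable scale and offset). Note that, if I compute it this way, even fully collapsed embeddings give a loss of log(N) per utterance rather than zero, which makes me suspect something is off:

```python
import numpy as np

def ge2e_softmax_loss(emb, w=10.0, b=-5.0):
    """Sketch of the GE2E softmax loss.

    emb: array of shape (N speakers, M utterances, D), raw embeddings.
    w, b: scale and offset (learnable in the paper; constants here).
    """
    N, M, D = emb.shape
    emb = emb / np.linalg.norm(emb, axis=-1, keepdims=True)  # unit-normalize
    centroids = emb.mean(axis=1)                             # (N, D)
    loss = 0.0
    for j in range(N):
        for i in range(M):
            e = emb[j, i]
            sims = np.empty(N)
            for k in range(N):
                if k == j:
                    # exclude the utterance itself from its own centroid
                    c = (emb[j].sum(axis=0) - e) / (M - 1)
                else:
                    c = centroids[k]
                c = c / np.linalg.norm(c)
                sims[k] = w * np.dot(e, c) + b  # scaled cosine similarity
            # softmax cross-entropy against the true speaker j
            loss += -sims[j] + np.log(np.exp(sims).sum())
    return loss / (N * M)

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 5, 16))
print(ge2e_softmax_loss(emb))  # strictly positive for random embeddings
```

With this formulation the loss is bounded below by zero and only approaches it when each utterance is far more similar to its own centroid than to all others, so an exactly-zero loss on both train and dev sets seems suspicious to me.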
Any idea what might cause such a problem?