I'm working on a speaker recognition challenge.

I have already trained my model on the VoxCeleb2 dataset in a triplet setup. Now, for the challenge, I have two sets:

enrollment (1 audio/subject) [IDs given]

test (random number of audios without IDs)

I need to report the EER (equal error rate) on the test set.

Is it okay to also train my model on the enrollment data, or would that be considered leakage/cheating when reporting the EER?

Let me elaborate. In speaker verification/recognition, we have enrollment data and trial data. Say my experiment has 5 known subjects/speakers. I first record their speech and label it. This is my enrollment data.

Enrollment:

speaker 1 -> audio_1

speaker 2 -> audio_2

speaker 3 -> audio_3

speaker 4 -> audio_4

speaker 5 -> audio_5

Next, I take some random speech data from other sources, plus more audio from the 5 speakers. This is my test data.

Test:

speaker 1 -> audio_11

speaker 2 -> audio_21

speaker 3 -> audio_31

speaker 4 -> audio_41

speaker 5 -> audio_51

random -> random_1

random -> random_2

Now, I generate the trials from the test data. Each trial pairs a claimed speaker with a test audio:

speaker 1, audio_11

speaker 3, audio_11

speaker 2, audio_21

speaker 4, random_1

speaker 1, random_2

I need to predict from the trials if audio_11 belongs to speaker 1 or not, audio_11 belongs to speaker 3 or not, audio_21 belongs to speaker 2 or not, etc. (based on audio similarity).
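
To make the trial scoring concrete, here is a minimal sketch of how the trials above could be scored and turned into an EER. It assumes the embeddings have already been extracted by the trained model; the toy embeddings, the cosine scoring, and the helper names (`cosine_score`, `score_trials`, `compute_eer`) are all hypothetical and only for illustration, not the challenge's official scoring.

```python
import numpy as np

def cosine_score(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def score_trials(enroll_emb, test_emb, trials):
    """Score each (speaker_id, test_audio_id) trial against the enrolled embedding."""
    return np.array([cosine_score(enroll_emb[spk], test_emb[utt])
                     for spk, utt in trials])

def compute_eer(scores, labels):
    """Approximate EER: the point where false-accept and false-reject rates meet.

    labels: 1 = target trial (same speaker), 0 = non-target trial.
    A trial is accepted when its score is >= the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.unique(scores)  # sorted candidate thresholds
    far = np.array([np.mean(scores[labels == 0] >= t) for t in thresholds])
    frr = np.array([np.mean(scores[labels == 1] < t) for t in thresholds])
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

# Toy data (assumption): random vectors stand in for real model embeddings.
rng = np.random.default_rng(0)
enroll_emb = {f"speaker {i}": rng.normal(size=16) for i in range(1, 6)}
test_emb = {
    "audio_11": enroll_emb["speaker 1"] + 0.1 * rng.normal(size=16),
    "audio_21": enroll_emb["speaker 2"] + 0.1 * rng.normal(size=16),
    "random_1": rng.normal(size=16),
    "random_2": rng.normal(size=16),
}

# The trials listed above, with ground-truth labels (1 = target, 0 = non-target).
trials = [("speaker 1", "audio_11"), ("speaker 3", "audio_11"),
          ("speaker 2", "audio_21"), ("speaker 4", "random_1"),
          ("speaker 1", "random_2")]
labels = [1, 0, 1, 0, 0]

scores = score_trials(enroll_emb, test_emb, trials)
eer, threshold = compute_eer(scores, labels)
print(f"EER = {eer:.3f} at threshold {threshold:.3f}")
```

In this setup the enrollment audio is only used to build the reference embedding for each speaker; the model weights are not updated with it.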

In my case, I'm segmenting the enrollment audio and training my model on those segments before making predictions on the trial/test data.
