How can we simulate various spoofing attacks (such as speech synthesis, voice conversion, etc.) on speech data for developing a robust speaker verification system?
Does there exist any freely available dataset for the speaker verification task?
The human ear and brain are very good at speech recognition, as are Siri and other computer-based systems. To spoof a person's speech, you must first have a good, lengthy sample of that speech and then develop what is essentially a vocal tract model for the speaker. The model is a transfer function relating the vocal cords, air supply, and airflow of the specific human vocal tract to what the listener hears. Of course, the model changes if the speaker is sick, has a cold, a swollen vocal tract, etc. Once you have a vocal tract model and a proper excitation signal (analogous to the vocal cords), you can create speech that spoofs the speaker. This is not an easy task, because you must learn a lot about how human speech is produced.
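For a concrete starting point, here is a minimal sketch of that source-filter idea using linear prediction (LPC): the all-pole LPC filter stands in for the vocal tract transfer function, and a synthetic excitation stands in for the vocal cords. The file name, sample rate, LPC order, and pitch below are illustrative assumptions, and a real system would estimate all of this frame by frame rather than over the whole utterance:

```python
# Minimal source-filter resynthesis sketch via LPC (not a production spoofing system).
import numpy as np
import librosa
import scipy.signal

y, sr = librosa.load("speaker.wav", sr=16000)  # hypothetical speech sample

order = 16                        # LPC order; a common rule of thumb is sr/1000 + 2
a = librosa.lpc(y, order=order)   # all-pole "vocal tract" filter coefficients

# Inverse-filter the speech to estimate the excitation (prediction residual).
residual = scipy.signal.lfilter(a, [1.0], y)

# Re-synthesize by driving the vocal tract filter with a synthetic excitation:
# here a crude impulse train at a fixed pitch, gain-matched to the residual.
f0 = 120.0                                  # assumed pitch in Hz
excitation = np.zeros_like(y)
excitation[:: int(sr / f0)] = 1.0
excitation *= np.std(residual)

y_spoof = scipy.signal.lfilter([1.0], a, excitation)
```

This only captures the spectral envelope of one stretch of speech; a convincing spoof would also model pitch, timing, and prosody over time.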
Please ask another question if you need clarification.
Voice biometrics includes speaker recognition from speech and even from humming sounds. Speaker recognition can be classified into speaker verification and speaker identification. The goal of an Automatic Speaker Verification (ASV) system is to verify the claimed identity of a person from his or her voice by machine. However, with recent advances in speech technology, ASV systems have become vulnerable to speech-based spoofing attacks, also known as presentation attacks. Researchers have identified various such attacks, including speech synthesis (SS), voice conversion (VC), replay, impersonation, and attacks by identical twins.
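Of the attack types above, a replay attack is the easiest to simulate yourself: convolve a genuine utterance with a room impulse response to mimic playback through a loudspeaker and re-recording, then add recording noise. A rough sketch (the file names and the 20 dB SNR are assumptions for illustration):

```python
# Minimal replay-attack simulation: genuine speech "played back" and re-recorded,
# modeled as convolution with a room impulse response (RIR) plus additive noise.
import numpy as np
import scipy.signal
import soundfile as sf

speech, sr = sf.read("genuine_utterance.wav")   # hypothetical genuine sample
rir, _ = sf.read("room_impulse_response.wav")   # hypothetical measured or synthetic RIR

# Convolve with the RIR to mimic loudspeaker + room + re-recording acoustics.
replayed = scipy.signal.fftconvolve(speech, rir, mode="full")[: len(speech)]

# Add recording noise at a chosen signal-to-noise ratio (here 20 dB).
snr_db = 20.0
noise = np.random.randn(len(replayed))
noise *= np.sqrt(np.mean(replayed**2) / (np.mean(noise**2) * 10 ** (snr_db / 10)))

sf.write("replayed_utterance.wav", replayed + noise, sr)
```

Varying the RIR, playback device response, and SNR gives a family of replay conditions for training a more robust countermeasure.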