I am going to build a system which will produce non-verbal behavior in a data-driven way, based on the speech signal and its transcription.

I need to decide which speech features to use, so that I can train the system on recordings of humans and then use it on the humanoid robot NAO.

The main problem is that the robot's speech will not have the variability of natural speech, as it is produced by a text-to-speech system. So I need to be careful not to learn something that works only for humans and will not work on my robot.
