I am trying to crowdsource mean opinion scores (MOS) for a speech synthesis model. Since MOS is a subjective metric, I'm trying my best to limit rater bias.
So far, my plan is to host the samples on a GitHub page and ask users recruited through social media to rate each sample on a 1-5 scale through a web interface. From those ratings, I will calculate the mean opinion score after statistical post-processing, following this paper: https://github.com/Netflix/sureal/blob/master/resource/doc/dcc17v3.pdf
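To make the post-processing step concrete, here is a rough sketch of the kind of thing I have in mind. It is not the method from the paper: it is a much simpler stand-in that subtracts each rater's average offset from the per-sample mean and then reports a normal-approximation 95% confidence interval. The function name, the `(rater_id, sample_id, score)` tuple layout, and the demo data are all made up for illustration.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def mos_with_bias_removal(ratings):
    """Return {sample_id: (mos, ci95)} from (rater_id, sample_id, score) tuples.

    Assumption: a simple per-rater bias subtraction, not the paper's model.
    """
    # Raw per-sample mean over all raters.
    by_sample = defaultdict(list)
    for _, sample, score in ratings:
        by_sample[sample].append(score)
    raw_mos = {s: mean(v) for s, v in by_sample.items()}

    # A rater's bias is their average deviation from the raw sample means.
    offsets = defaultdict(list)
    for rater, sample, score in ratings:
        offsets[rater].append(score - raw_mos[sample])
    bias = {r: mean(v) for r, v in offsets.items()}

    # Subtract each rater's bias, then average again per sample.
    corrected = defaultdict(list)
    for rater, sample, score in ratings:
        corrected[sample].append(score - bias[rater])

    result = {}
    for sample, scores in corrected.items():
        n = len(scores)
        # 95% CI half-width under a normal approximation; undefined for n < 2.
        ci95 = 1.96 * stdev(scores) / math.sqrt(n) if n > 1 else float("nan")
        result[sample] = (mean(scores), ci95)
    return result


# Hypothetical ratings: rater "b" scores everything one point higher than "a",
# and rater "c" one point lower.
demo = [
    ("a", "x", 4), ("a", "y", 2),
    ("b", "x", 5), ("b", "y", 3),
    ("c", "x", 3), ("c", "y", 1),
]
for sample_id, (mos, ci95) in sorted(mos_with_bias_removal(demo).items()):
    print(f"{sample_id}: MOS={mos:.2f} +/- {ci95:.2f}")
```

With only a handful of raters per sample the interval is very rough, so I would treat it as a sanity check rather than a final number.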
Many recent papers use Amazon Mechanical Turk for this type of task, but that seems like overkill for mine. Do you have any suggestions regarding this?