Hello everyone, I hope you're doing well.
I recently ran a listening test on simulating near-field reflections. As the hidden reference, I used a measured dataset of OBRIRs from a KEMAR HATS in an anechoic chamber, facing a reflective surface at distances of 0.25 m and 0.5 m. I then created a simulated room and generated OBRIRs using the AKTools room simulation software, with both near-field HRTFs (matching the 0.25 m and 0.5 m distances) and a far-field HRTF (an overall 2 m measurement).
These were then presented to listeners over headphones with head tracking, convolved with separate male and female voice stimuli that had been modelled to come out of the listener's mouth, and the listener was asked to imagine that it did. For each comparison, they were asked to pick which of the 3 options (the measured reference, the near-field HRTF, the far-field HRTF) they thought was the most real/believable/plausible and then rate it on a scale from 1 to 6, 1 being not at all plausible and 6 being very plausible. The order of the options was randomised for each comparison, so that the listener wouldn't get used to picking the same one. Each comparison was repeated 3 times for each voice, and the whole set was then repeated for the other distance. This gave a total of 12 responses per listener (3 male 0.25, 3 female 0.25, 3 male 0.5, 3 female 0.5).
My hypothesis was that each of the options would be equally plausible, so the listeners' choices would be split equally overall, i.e. a presumed 1/3 share for each option. I thought a Chi Square test would be suitable, but it isn't, because the data contains multiple answers from each listener, so the observations are not independent.
I can't seem to find any data analysis methods that work for this setup. I thought about just taking each listener's initial response for the male 0.25, female 0.25, male 0.5 and female 0.5 conditions and comparing those... somehow using Chi Square?
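To make the "initial response only" idea concrete, here is a rough sketch of what I have in mind. The row layout, the function names and the example rows are all placeholders I made up for illustration, not my real data. It keeps only repetition 1 from every listener in each voice/distance condition, so each listener contributes one answer per condition, and then compares the counts against an equal 1/3 split.

```python
import math
from collections import Counter

OPTIONS = ("measured", "near", "far")  # hypothetical labels for the 3 options


def first_response_counts(trials):
    """Count choices per (voice, distance) condition, using repetition 1 only."""
    counts = {}
    for listener, voice, distance, repetition, choice in trials:
        if repetition != 1:
            continue  # drop repeats so each listener appears once per condition
        counts.setdefault((voice, distance), Counter())[choice] += 1
    return counts


def chi_square_equal_split(counter):
    """Chi-square goodness-of-fit statistic and p-value against an equal split."""
    n = sum(counter.values())
    expected = n / len(OPTIONS)
    chi2 = sum((counter[opt] - expected) ** 2 / expected for opt in OPTIONS)
    # With 3 options there are 2 degrees of freedom, where the chi-square
    # survival function has the closed form exp(-x / 2).
    return chi2, math.exp(-chi2 / 2)


# Made-up rows: (listener, voice, distance in m, repetition, chosen option)
trials = [
    (1, "male", 0.25, 1, "far"),
    (1, "male", 0.25, 2, "near"),
    (2, "male", 0.25, 1, "far"),
    (3, "male", 0.25, 1, "measured"),
]

counts = first_response_counts(trials)
for condition, counter in counts.items():
    chi2, p_value = chi_square_equal_split(counter)
    print(condition, dict(counter), f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

With 22 listeners the expected count per option would be 22/3, about 7.3, in each condition. My worry is that the four conditions still share the same listeners, so I'm not sure the four tests can be treated as independent of each other.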
I was also curious whether the distance and the voice had an effect on which option the listeners liked the most.
It does seem like there is a slight difference in which option was preferred. Across the 22 listeners (264 responses in total), the far-field HRTF was chosen 105 times, compared to 71 for the reference and 88 for the near-field HRTF. I'm mostly looking for tests that can say whether this is statistically significant or not, but with a sample size of 22, I doubt I'll be able to draw any strong conclusions. Also, some listeners caught onto which option they preferred and gave the same option each time. I'm not sure whether I should exclude these listeners or keep them.
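For reference, this is roughly the naive Chi Square goodness-of-fit calculation on the pooled counts, as a minimal sketch. It treats all 264 responses as independent, which they are not (12 per listener), so I don't think the resulting p-value can be trusted as it stands. The variable names are just for illustration.

```python
import math

# Pooled selection counts over all listeners and trials.
observed = {"far-field HRTF": 105, "measured reference": 71, "near-field HRTF": 88}

total = sum(observed.values())
expected = total / len(observed)  # equal 1/3 split under the null hypothesis

# Pearson chi-square goodness-of-fit statistic.
chi2 = sum((count - expected) ** 2 / expected for count in observed.values())
dof = len(observed) - 1

# For exactly 2 degrees of freedom, the chi-square survival function
# has the closed form exp(-x / 2), so no stats library is needed here.
p_value = math.exp(-chi2 / 2)

print(f"total = {total}, expected per option = {expected:.1f}")
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3f}")
```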
Any advice you can provide is greatly appreciated. If you have any further questions or need more information, please let me know!
Thank you!