Hello there, I would like to ask some questions about my research, "The Effectiveness of Fishbowl Technique Towards Students Speaking Ability." I have been looking for a suitable reliability test for the instrument I am currently developing.

I use an oral interview test to measure students' speaking skill. Each student is scored on a Likert scale (1-5) on 5 components of speaking. My lecturer advised me to bring in three raters (including me) to reduce bias from subjectivity and to save time, since my target population is in a school (11th grade students).

I plan to run the tryout with a class of 35 students, but I am still unsure which reliability tests fit my case. Following my lecturer's suggestion, I planned to divide the students among the three raters: rater 1 for students number 1-12, rater 2 for 13-24, and so on. I tried to use ICC or Pearson correlation (through test-retest), but both need overlapping data, meaning every rater has to assess the same students. That would take longer and might disrupt the learning process at the school.

I may simply be too clueless to solve this myself. Any suggestions or feedback regarding my case would be appreciated. Thank you, and pardon the long text.
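To make the planned design concrete, here is a minimal sketch of the allocation in Python. The names (`assign_rater`, `BLOCK_SIZE`) are hypothetical, and I am assuming "and so on" means rater 3 takes students 25-35. It only illustrates why I am stuck: no student ends up with more than one rater, so there are no paired scores for ICC or Pearson correlation to compare.

```python
from collections import Counter

# Hypothetical sketch of the planned allocation: 35 students, 3 raters.
# Assumption: students 1-12 -> rater 1, 13-24 -> rater 2, 25-35 -> rater 3.
N_STUDENTS = 35
BLOCK_SIZE = 12


def assign_rater(student_id: int) -> int:
    """Return the rater number (1-3) for a student numbered 1-35."""
    return (student_id - 1) // BLOCK_SIZE + 1


# Map each student to the set of raters who would score them.
raters_by_student = {s: {assign_rater(s)} for s in range(1, N_STUDENTS + 1)}

# How many students each rater scores under this plan.
students_per_rater = Counter(
    rater for raters in raters_by_student.values() for rater in raters
)

# Students scored by two or more raters: the overlap that ICC or
# inter-rater correlation would need.
students_with_multiple_raters = [
    s for s, raters in raters_by_student.items() if len(raters) >= 2
]

print(dict(students_per_rater))
print(len(students_with_multiple_raters))
```

Under these assumptions the overlap list is empty, which is exactly the problem I described.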