I am currently developing a team-level assessment of Psychological Safety (PS) that will be used by companies in an applied setting.
Specifically, the assessment will be a standard self-report Likert scale, with ~80 items spread across ~10 subscales. Teams will use this assessment to determine their PS levels and identify which intervention actions to take. There are also indications that our clients intend to use this assessment in an evaluative manner to judge managerial performance and make team assignment decisions. This assessment will be administered, scored, and reported via software, with no direct contact with me or other individuals with expertise in assessment administration or psychometrics.
I hope to establish between-industry norms for the assessment. The main push for these norms is forthcoming government regulations that will require companies to report assessment results regarding their performance relative to other companies in their industry.
Locating research on norms for team-level assessments has been surprisingly difficult.
I have spent days searching the literature for specifics on the sample size needed to establish these norms, with little success. The field seems self-aware that most guidelines for norms are so vague they border on useless. However, I have been able to find some specific suggestions, namely that a minimum of N = 100-150 is required (e.g., Tett, Pieper, Wadlington, Davies, & Anderson, 2009; Gaddis, Foster, & Lemming, 2015).
However, all of this research has been aimed at individual-level assessments, so it is unclear how it translates to a team-level context. Do I need N = 150 teams within each industry? Or N = 150 individuals across a diversity of teams from a variety of companies within each industry? Is there a minimum number of teams and companies I should shoot for?
I understand that these things are more complex than a simple number (e.g., diversity and representativeness of the sample are more important than the N, the importance of using randomization and stratification in our sampling approach, etc.). But I am still hoping for a number or range of N that will at least give me a basic framework I can use to interpret and guide our sampling approach.
If someone could provide recommendations regarding a team N, that would be wonderful. Equally (if not more) appreciated would be citations that discuss assessment norming for team-level assessments.
Thank you for anything you can offer!