I recently, with co-authors David Bowes and Tracy Hall, conducted a meta-analysis of 600 results from a number of software defect prediction studies:

https://www.researchgate.net/publication/262003721_Researcher_Bias_The_Use_of_Machine_Learning_in_Software_Defect_Prediction

It is published in IEEE TSE 40(6), pp. 603-616.

We tried to understand what factors impact prediction performance. Worryingly, the dominant factor is neither the choice of algorithm (e.g. random forest) nor the data set, but the research group that conducted the study.

I do not believe that researchers are intentionally distorting their results, but it is well known that scientists have a preference for some results over others [1-3].

One remedy we suggest is blind analysis. This entails masking the treatments from the analyst, usually by re-labelling them with uninformative names, as in the sketch below.
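To make the idea concrete, here is a minimal Python sketch of such re-labelling, assuming the results sit in a pandas DataFrame with a 'treatment' column (e.g. the classifier family); the column names and file paths are purely illustrative, not part of our study's tooling.

```python
# Sketch only: blind analysis by replacing treatment labels with
# uninformative codes before the data reach the analyst.
import json
import random

import pandas as pd


def blind_treatments(df, column="treatment", seed=None):
    """Return a blinded copy of df plus the key needed to un-blind later."""
    rng = random.Random(seed)
    levels = sorted(df[column].unique())
    rng.shuffle(levels)                                   # random assignment of codes
    key = {orig: f"T{i + 1}" for i, orig in enumerate(levels)}

    blinded = df.copy()
    blinded[column] = blinded[column].map(key)            # e.g. "RandomForest" -> "T2"
    return blinded, key


if __name__ == "__main__":
    results = pd.DataFrame({
        "treatment": ["RandomForest", "NaiveBayes", "SVM", "RandomForest"],
        "mcc": [0.42, 0.31, 0.38, 0.45],
    })
    blinded, key = blind_treatments(results, seed=1)
    blinded.to_csv("blinded_results.csv", index=False)    # handed to the analyst
    with open("unblinding_key.json", "w") as f:           # withheld until the analysis is complete
        json.dump(key, f)
```

The point is simply that the analyst works only with the codes T1, T2, ..., and the mapping back to the real treatments is kept by someone else until the statistical analysis is finished.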

So my question is:

Should blind analysis be a norm in empirical software engineering?

Thanks for your ideas and thoughts.

Martin Shepperd

References:

[1] R. Rosenthal, "On the social psychology of the psychological experiment: the experimenter's hypothesis as unintended determinant of experimental results," American Scientist, vol. 51, pp. 268-283, 1963.

[2] J. Sherwood and M. Nataupsky, "Predicting the Conclusions of Negro-White Intelligence from Biographical Characteristics of the Investigator," J. of Personality & Social Psychology, vol. 8, pp. 53-58, 1968.

[3] K. Dickersin, "The existence of publication bias and risk factors for its occurrence," J. Am. Med. Assoc., vol. 263, pp. 1385-1389, 1990.
