I'm working on a model that I intend to publish in a journal, and I'm removing outliers from the results of my models. Suppose I have 100 trained neural networks. I feed out-of-sample data to these models and collect their outputs. At this step I remove outliers (any output x with abs(x - mean(x)) >= 2*s.d.) and use the average of the remaining results. How can I justify removing outliers in my paper? What statistical procedure or graphical presentation do I need?
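The rule can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not my actual pipeline: the function name `mean_without_outliers` is hypothetical, the outputs of the 100 networks for one out-of-sample case are assumed to sit in a 1-D NumPy array, and "s.d." is taken to be the sample standard deviation (`ddof=1`).

```python
import numpy as np

def mean_without_outliers(outputs, k=2.0):
    """Average the outputs after dropping values with |x - mean| >= k * s.d.

    `outputs` holds the predictions of all models for ONE out-of-sample case.
    Assumption: s.d. is the sample standard deviation (ddof=1).
    """
    x = np.asarray(outputs, dtype=float)
    mean = x.mean()
    sd = x.std(ddof=1)
    keep = np.abs(x - mean) < k * sd  # complement of the removal rule
    if not keep.any():
        # Happens when all outputs are identical (sd == 0): nothing to trim.
        return mean
    return x[keep].mean()

# Toy example: nine models agree on 0.5, one model outputs 1.0.
toy = np.array([0.5] * 9 + [1.0])
print(toy.mean(), mean_without_outliers(toy))
```

In the toy example the single value 1.0 lies more than 2 s.d. from the mean and is dropped, so the trimmed average falls back to the value the other nine models agree on.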
Update: The x-axis shows each out-of-sample case and the y-axis shows the model outputs for that case. The output range is 0-1 (the figure shows 50 of the 8000 samples). In this figure, green filled circles are the geometric mean, red filled circles are the average after removing outliers with the formula above, and blue filled circles are the arithmetic mean. We have three outputs here. Which averaging method is suitable in this case? How can I prove that? Based on the figures below, I think the mean obtained after removing outliers beyond 2 s.d. is the appropriate one. What do you think?
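For reference, the three averages in the figure can be computed along these lines. This is a hedged sketch rather than my exact code: the function name `three_averages`, the array layout (models in rows, out-of-sample cases in columns), the use of the sample standard deviation, and the small `eps` used to keep the geometric mean defined when an output is exactly 0 are all assumptions. The demo data are synthetic placeholders, not my real results.

```python
import numpy as np

def three_averages(outputs, k=2.0, eps=1e-12):
    """Return (arithmetic, geometric, trimmed) means per out-of-sample case.

    `outputs` has shape (n_models, n_samples) with values in [0, 1].
    """
    x = np.asarray(outputs, dtype=float)

    arith = x.mean(axis=0)

    # Geometric mean via logs; clipping at eps is an assumption to avoid log(0).
    geo = np.exp(np.log(np.clip(x, eps, None)).mean(axis=0))

    # Trimmed mean: drop values with |x - mean| >= k * s.d. within each column.
    sd = x.std(axis=0, ddof=1)
    keep = np.abs(x - arith) < k * sd
    counts = keep.sum(axis=0)
    sums = np.where(keep, x, 0.0).sum(axis=0)
    # If a column keeps nothing (all values identical), fall back to the mean.
    trimmed = np.where(counts > 0, sums / np.maximum(counts, 1), arith)

    return arith, geo, trimmed

# Synthetic placeholder data: 100 models, 50 out-of-sample cases.
rng = np.random.default_rng(0)
demo = rng.beta(2, 5, size=(100, 50))
arith, geo, trimmed = three_averages(demo)
print(arith[:3], geo[:3], trimmed[:3])
```

Note that for positive outputs the geometric mean can never exceed the arithmetic mean, so the green points lying at or below the blue ones is expected and does not by itself say which average is better.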