I have a data set of 400 data points (drawn from 24 subsets) from an engineering study and have developed an expression that describes them. On a plot of predicted versus measured values, 22 points lie well away from the line of equality, 18 of which come from just three subsets. I have calculated z values for them using the expression,

z = (outlier value - predicted value) / standard error of estimate.

The z values calculated this way range from 17 to -3.3, with 16 positive and 6 negative. I adapted the equation for z from the discussion "What is the acceptable number or outliers in a research?". Is this faulty reasoning? Is there a better way to justify, quantitatively, omitting these 22 data points?
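
For concreteness, the z calculation described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `standardized_residuals`, the `n_params` argument (the number of fitted coefficients in the expression), and the toy numbers are all hypothetical and are not taken from the study. The standard error of estimate is assumed here to be sqrt(SSE / (n - n_params)).

```python
import math


def standardized_residuals(measured, predicted, n_params):
    """Return (z_values, see) for paired measured/predicted values.

    z = (measured - predicted) / standard error of estimate, where the
    standard error of estimate is ASSUMED to be sqrt(SSE / (n - n_params)).
    """
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    dof = len(measured) - n_params  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    residuals = [m - p for m, p in zip(measured, predicted)]
    sse = sum(r * r for r in residuals)  # sum of squared residuals
    see = math.sqrt(sse / dof)           # standard error of estimate
    return [r / see for r in residuals], see


# Toy data (hypothetical, NOT from the study): the last point is far off.
measured = [1.0, 2.0, 3.0, 4.0, 10.0]
predicted = [1.1, 1.9, 3.2, 3.9, 5.0]
z, see = standardized_residuals(measured, predicted, n_params=2)

for i, zi in enumerate(z):
    print(f"point {i}: z = {zi:.2f}")
```

In this sketch the standard error of estimate is computed from all points, including the suspect ones, so large residuals inflate it and shrink every z value. Whether that matches how the z values above were obtained depends on which points were used to compute the standard error.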
