I am studying animal cognition and I have data on animals' latency to approach a cue and their distance from the cue in a test. The data are not normally distributed, and when I draw scatterplots with latency on one axis and distance on the other, there are extreme values in both variables or in just one, depending on which sample I select. I decided to look for statistical techniques to detect these outliers in a proper, objective way. So far I have found:

- MAD (absolute deviation around the median, with the conservative threshold), which removes not only the outliers that I can spot on the graph but also other values that are clustered more closely with the rest. Since it seems very aggressive even with the conservative threshold, I decided to apply it only when I can actually see gaps in the scatterplot; otherwise I would have just a few datapoints left.

- boxplot, which is mainly a way to see how my data are distributed, but I found a page that says that values outside the whiskers are outliers: https://medium.com/@agarwal.vishal819/outlier-detection-with-boxplots-1b6757fafa21. Compared with the MAD method, it more often flags only the datapoints that are really far from the rest.
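To make the comparison concrete, here is a minimal Python sketch of the two rules as I understand them. Several things are assumptions on my part and not taken from the sources above: the function names `mad_outliers` and `boxplot_outliers`, the threshold of 3 scaled MADs for the "conservative" setting, the scaling constant 1.4826 (which makes the MAD comparable to a standard deviation under normality), whiskers at 1.5 times the interquartile range, and the made-up `latency` values.

```python
import numpy as np


def mad_outliers(x, threshold=3.0):
    """Flag values lying more than `threshold` scaled MADs from the median.

    threshold=3.0 is assumed here to correspond to the conservative setting.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    # Scaled median absolute deviation around the median.
    mad = 1.4826 * np.median(np.abs(x - med))
    if mad == 0:
        # More than half of the values are identical: the rule is undefined,
        # so nothing is flagged in this sketch.
        return np.zeros(x.shape, dtype=bool)
    return np.abs(x - med) / mad > threshold


def boxplot_outliers(x, k=1.5):
    """Flag values outside the boxplot whiskers (k * IQR beyond the quartiles)."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


# Made-up latencies (seconds) with one visibly extreme value.
latency = np.array([2.1, 2.4, 2.8, 3.0, 3.3, 3.9, 4.2, 5.0, 6.5, 30.0])

mad_flags = mad_outliers(latency)
box_flags = boxplot_outliers(latency)

print("MAD rule flags:    ", latency[mad_flags])
print("Boxplot rule flags:", latency[box_flags])
```

For a latency/distance pair, the same functions could be applied to each variable separately and the flags combined with `|`, so that a trial is marked if either measure is extreme. Both rules treat the two tails symmetrically, which may partly explain why they behave differently on skewed data such as latencies.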

The first method is documented in the paper "Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median" (Leys et al., 2013) and has been used in some research areas, such as psychology. I have not found a paper documenting the second method, but it seems more straightforward and seems to work better on my data. Are you aware of studies in the field of animal cognition (with measures of latencies and distances) that needed to remove outliers, and what did they do?
