What are the statistical ethics in scientific research regarding outliers? Should we remove or include outlier data, for example when calculating ionic currents?
Also, what other methods are there to refine data in electrophysiology?
I agree with Dr. Refik Kanjhan that outliers must be treated with extreme care. If it can be determined that an outlying point is in fact erroneous, then the outlying value should be deleted from the analysis (or corrected if possible).
In some cases, it may not be possible to determine if an outlying point is bad data. Outliers may be due to random variation or may indicate something scientifically interesting. In any event, we typically do not want to simply delete the outlying observation. However, if the data contains significant outliers, we may need to consider the use of robust statistical techniques.
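As a minimal illustration of why robust summaries are often preferred when an outlier cannot be ruled out as bad data, compare the mean and the median on a small sample. The current-density values below are made up for the example; the point is only that the median barely moves while the mean is dragged toward the extreme value:

```python
import statistics

# Hypothetical peak current densities (pA/pF); the last value is a suspected outlier.
currents = [48.2, 51.7, 49.9, 50.4, 52.1, 47.8, 50.9, 49.5, 51.2, 5.0]

mean = statistics.mean(currents)
median = statistics.median(currents)  # robust to a single extreme value

print(f"mean:   {mean:.2f}")    # 45.67 - pulled strongly toward the outlier
print(f"median: {median:.2f}")  # 50.15 - barely affected
```

The same logic applies to robust alternatives for dispersion (median absolute deviation instead of the standard deviation) and for regression (rank-based or M-estimators instead of ordinary least squares).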
It first depends on what makes a value an "outlier":
if you know that the observation is faulty, the value must be discarded (even if it looks OK).
if the value is logically/physically/biologically/physiologically very implausible or even impossible, the value must be discarded.
if the value is a bit off from the rest of the values, but neither of the former options applies, it will further depend on the impact of the value on the analysis, and this in turn depends largely on the sample size. If the impact is large, it might be advisable to report and compare the analyses with and without this suspicious value.
After all, "outliers" (values with no hint of a faulty observation, no hint of implausibility, but still far away from the rest) should be rare. If they are not rare, then there might be a fundamental problem with the experiment, the data handling, or the interpretation (e.g. the distribution model of the variables is not as expected; log-normal and Cauchy-distributed variables "seem" to have many outliers when you expect something like a normal distribution). In this case you should not think about what to do with the "outliers" - you should rather think about the experiment, the data handling, and the interpretation of the variables.
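The point about heavy-tailed distributions can be checked with a quick simulation. This sketch uses Tukey's 1.5×IQR fences as a crude outlier rule (the seed, sample size, and quartile approximation are arbitrary choices): a Cauchy sample flags far more "outliers" than a normal sample of the same size, even though every point is a perfectly legitimate draw from its distribution.

```python
import math
import random

random.seed(0)

def n_tukey_outliers(xs):
    """Count points outside Tukey's 1.5*IQR fences (rough quartiles by index)."""
    xs = sorted(xs)
    q1 = xs[len(xs) // 4]
    q3 = xs[3 * len(xs) // 4]
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(x < lo or x > hi for x in xs)

n = 10_000
normal = [random.gauss(0, 1) for _ in range(n)]
# Cauchy draws via the inverse CDF: tan(pi * (u - 0.5))
cauchy = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]

n_out_normal = n_tukey_outliers(normal)
n_out_cauchy = n_tukey_outliers(cauchy)
print(f"flagged in normal sample: {n_out_normal}")  # a fraction of a percent
print(f"flagged in Cauchy sample: {n_out_cauchy}")  # a much larger fraction
```

If your data routinely look like the second case, the right response is to question the assumed distribution (or the experiment), not to delete points.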
Rare outliers are not a problem statistically. It happens that one gets "outliers"; this is almost inevitable if one only looks at enough data (makes many experiments, many measurements...). Given that the distributional model of the variable is correct, the appropriate statistical methods do consider the existence of such outliers. So there is no need to remove them. It is actually wrong to exclude them, because - statistically - this would lead to an underestimation of the variance, standard errors, confidence intervals, and p-values. This may only be recognizable (and irritating) in small data sets. In large data sets, however, rare outliers won't have any considerable impact anyway.
The irritating fact in small data sets is that the existence of "outliers" often seems to "disturb" the message. People are inclined to remove them because the standard errors are then smaller, everything looks nicer, the effects are clearer. However, given the correctness of the assumed distributional model, having large standard errors in this experiment is the price one has to pay to not under-estimate the standard errors generally. If one excluded such values, this particular experiment would look better, but the procedure ("let's have a look and throw out values that look like outliers") will bring about standard errors that are too small on average.
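That last point can be demonstrated with a small simulation: repeatedly draw small normal samples, routinely discard the point that "looks most like an outlier" (the one farthest from the sample mean), and compare the average standard deviation with the full-sample one. The sample size, number of repetitions, and seed are arbitrary choices:

```python
import random
import statistics

random.seed(1)

true_sd = 1.0
n_experiments = 2000
sample_size = 10

sds_full, sds_trimmed = [], []
for _ in range(n_experiments):
    xs = [random.gauss(0, true_sd) for _ in range(sample_size)]
    sds_full.append(statistics.stdev(xs))
    # "Let's throw out the value that looks most like an outlier":
    m = statistics.mean(xs)
    xs_trimmed = sorted(xs, key=lambda x: abs(x - m))[:-1]
    sds_trimmed.append(statistics.stdev(xs_trimmed))

print(f"average SD, all data kept:           {statistics.mean(sds_full):.3f}")
print(f"average SD after routine trimming:   {statistics.mean(sds_trimmed):.3f}")
```

The trimmed estimate is systematically smaller than the true spread: each individual experiment "looks nicer", but the reported precision is too optimistic on average.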
Outliers must be treated with extreme care, as they can range from being very important to being an artefact. First of all, one has to be sure that the outlier(s) are not an artefact for whatever reason. If the outlier is not an artefact, then the sampling (n = ?) becomes very important. For example, by increasing the number of data points you may discover a new population or subpopulation. I believe outliers should not be included in the same population if they change the values significantly, but should rather be mentioned separately in the manuscript, with an explanation of why that data was not included. History is full of examples where one scientist's artefact has become another scientist's fame.
The cells are good, and a current of 5 pA/pF is physically possible. Why should you want to remove this value?
Removing this value would mean that you are willing to throw away 10% of your data. If you believe that 1 out of 10 values is really "bad" (not just accidentally far away from the clumping rest), then I would doubt that your experiment is reliable.
It does NOT look far away from the others when the squared values are used, as you can see from the attached normal-quantile-plots of your data.
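The attached plots and the underlying recordings aren't reproduced here, but a normal-quantile plot of this kind can be sketched in plain Python. The values below are hypothetical stand-ins for the real data: each sorted observation is paired with its theoretical standard-normal quantile, so normal data fall on a straight line and suspicious points show up as departures from it.

```python
import statistics

# Hypothetical current densities (pA/pF); 5.0 stands in for the suspect value.
currents = [48.2, 51.7, 49.9, 50.4, 52.1, 47.8, 50.9, 49.5, 51.2, 5.0]

def normal_quantile_coords(xs):
    """Pair each sorted observation with its theoretical standard-normal quantile."""
    nd = statistics.NormalDist()
    xs = sorted(xs)
    n = len(xs)
    # Plotting positions (i + 0.5) / n mapped through the inverse normal CDF.
    return [(nd.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]

coords = normal_quantile_coords(currents)
for theo, obs in coords:
    # A straight line for normal data; a lone extreme point breaks the pattern.
    print(f"{theo:6.2f}  {obs:6.1f}")
```

Running the same helper on a transformed version of the data (e.g. squared values) is a quick way to check whether a point that looks extreme on one scale is unremarkable on another.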
So the interesting question is: might it make sense that the errors in your observations are related to the squared current? That requires some more understanding of the subject matter. It may really provide a new view or new insights. If this turns out to be interesting, then this poor little value you considered kicking out provided the only valuable information about all that!
Statistics provide tools to make sense of data - and that requires thinking. Statistics is a waste if used to justify removing outliers to perform "standard procedures".
Are there any other hints? Is it due to the size of the cell, or just less Na current for a cell of the same size (pF)? Most likely the resting membrane potential of this cell (neuron?) is very depolarized, close to the level of inactivation of voltage-gated Na channels. One possibility is that this cell/neuron may be a young, developing/immature/newly differentiated neuron/cell. Another possibility is that it is not a Na current but rather a Ca current - do you have a pharmacological block? A 10-fold reduction in Na current for a healthy, normally developed neuron is a lot and very unusual; the amplitude of the action potential will probably be less than 10 mV, which is unlikely to pass any information to the postsynaptic cell. If that is the case, then I would not include this value in the analysis, but I would mention and explain it in the text (if it is not due to a depolarized resting membrane potential).
Actually, I work on cardiac cells. The cells with the low value look fine, and so does the recording. I was just wondering: if there are one or two values that are very small or very large compared to the rest of the data, should we include them or remove them?
Certainly, removing or including those very small or very large values affects the mean and the statistical significance, so I am confused about that data.