I am working on validating a questionnaire and I need to ensure that there are few (or no) outliers that might affect the factor analysis process. Is the outlier labeling technique (Hoaglin, Iglewicz) applicable to non-normal data?
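For readers unfamiliar with the rule being asked about, here is a minimal sketch of the outlier labeling technique (Hoaglin, Iglewicz), assuming the commonly cited multiplier g = 2.2 applied to the interquartile range; the data are made up purely for illustration.

```python
# Minimal sketch of the outlier labeling rule (Hoaglin & Iglewicz),
# assuming the commonly cited multiplier g = 2.2; data are illustrative only.
import numpy as np

x = np.array([2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0, 6.0, 20.0])

q1, q3 = np.quantile(x, [0.25, 0.75])
g = 2.2
lower, upper = q1 - g * (q3 - q1), q3 + g * (q3 - q1)

print(x[(x < lower) | (x > upper)])  # values labeled as outliers (here: 20.0)
```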
Outliers can be identified through many techniques (a short sketch of these checks in code follows this list):
- you can run frequencies for each variable and examine the observed range by eye
- you can draw a box plot, or exclude data beyond 3 SD from the mean
- you can check skewness and keep it within an acceptable level (textbooks vary on the upper limit; some recommend that skewness not exceed 0.2, others allow up to 1.0)
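If it helps, here is a minimal sketch of these simple screens in Python (pandas/scipy); the column name and the 3 SD / skewness cutoffs are placeholders, not part of the advice above.

```python
# Minimal sketch of the simple univariate screens described above.
# The column name "item1" and the cutoffs are placeholders.
import pandas as pd
from scipy.stats import zscore, skew

df = pd.DataFrame({"item1": [1, 2, 2, 3, 3, 3, 4, 4, 5, 25]})

# Frequencies: eyeball the observed range of each item
print(df["item1"].value_counts().sort_index())

# Flag values more than 3 SD from the mean
z = zscore(df["item1"])
print(df.loc[abs(z) > 3, "item1"])

# Skewness of the item (compare against your chosen cutoff, e.g. |skew| < 1)
print(skew(df["item1"]))
```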
I would advise against removing outliers in this way unless you have reason to believe that they are invalid. Perhaps there is a robust version of the questionnaire validation technique you are using which will handle them within the data.
It sounds like you've come up with a good solution. If you're still interested in the question of detecting outliers with a non-normal distribution, I found this article helpful: "Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median," by Leys et al. (2013) in the Journal of Experimental Social Psychology, vol 49. Using 3 SD around the mean is technically inappropriate if the distribution is non-normal. I would be interested if anyone has used the absolute deviation around the median and found it helpful.
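For anyone who wants to try it, below is a minimal sketch of the median absolute deviation (MAD) rule as described by Leys et al. (2013); the cutoff of 2.5 and the 1.4826 consistency constant follow that paper, and the data are made up.

```python
# Minimal sketch of the MAD rule from Leys et al. (2013):
# flag points farther than k * MAD from the median.
import numpy as np

x = np.array([2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0, 6.0, 30.0])

median = np.median(x)
mad = 1.4826 * np.median(np.abs(x - median))  # scaled to estimate SD under normality

k = 2.5  # Leys et al. suggest 2.5 as a moderately conservative choice
outliers = x[np.abs(x - median) > k * mad]
print(outliers)  # -> [30.]
```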
I understand your position about keeping the outliers to maintain generalizability, but I'm not sure that I agree with it. In particular, is your original theory designed to be generalizable to the entire population, or was it stated in ways that would apply to nearly everyone nearly all of the time?
In other words, most theories are generated, explicitly or implicitly, without any thought given to outliers. That means that when we observe samples that truly represent entire populations, we will encounter cases that fall outside the boundaries of our theory.
So, for tasks such as estimating a mean, it is probably more important to retain outliers than it is when your goal is theory testing.
Although the Mahalanobis distance was originally intended for multivariate normal distributions, there have been theoretical efforts to clarify its application in non-normal contexts.
See Ekström's (2011) article "Mahalanobis' Distance Beyond Normal Distributions".
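As a rough illustration (not taken from Ekström's paper), here is a minimal sketch of Mahalanobis-distance screening in Python; the chi-square cutoff assumes approximate multivariate normality, which is exactly the assumption under discussion, so treat it as a rough screen otherwise.

```python
# Minimal sketch of multivariate outlier screening with Mahalanobis distance.
# The chi-square cutoff assumes approximate multivariate normality.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[0] = [6.0, -6.0, 6.0]  # planted outlier for illustration

mean = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

diff = X - mean
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared distances

cutoff = chi2.ppf(0.999, df=X.shape[1])  # conventional p < .001 criterion
print(np.where(d2 > cutoff)[0])  # indices of flagged rows
```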
It has been a while since I visited this thread of discussion. I am grateful for all the answers given. I managed to get a few papers published after taking into consideration all the comments here. My infinite thanks to all.
You can just use upper and lower quantiles. We use nonparametric statistical methods to analyze data that are not normally distributed; in the same way, instead of using the standard deviation, you can use quantiles. For example, you could assign NaN to values above the 95th percentile or below the 5th percentile of the data set.
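A minimal sketch of that quantile rule, assuming the 5th/95th percentile cutoffs mentioned above (adjust to taste):

```python
# Minimal sketch of the quantile rule described above: values beyond the
# 5th/95th percentiles are set to NaN. Cutoffs are only an example.
import numpy as np

x = np.array([1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0, 50.0])

lo, hi = np.quantile(x, [0.05, 0.95])
x_trimmed = np.where((x < lo) | (x > hi), np.nan, x)
print(x_trimmed)
```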