Here is a generic method to transform a random sample of values to normality:
1. Compute the empirical cumulative distribution function from the observed data (e.g., using the 'ecdf' function in R).
2. Smooth this function with a smoothing spline and call the result G(x) (e.g., using the 'smooth.spline' function in R).
3. Then U = G(X) should be approximately uniformly distributed on [0, 1].
4. Z = Q(U), where Q is the standard normal quantile function ('qnorm' in R), should then be approximately normally distributed with mean 0 and variance 1.
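The reason steps 3-4 work is the probability integral transform. A one-line sketch, assuming the true CDF F of X is continuous:

$$P\big(F(X) \le u\big) = P\big(X \le F^{-1}(u)\big) = F\big(F^{-1}(u)\big) = u, \qquad 0 < u < 1,$$

so F(X) ~ Uniform(0, 1), and therefore Q(F(X)) ~ N(0, 1). Since G is only an estimate of F, U = G(X) is only approximately uniform, hence the "approximately" in each step.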
The question is: why do we still keep trying various arbitrary transformations (such as log, square-root, Box-Cox, etc.)? Can't we simply use the above steps to transform data (obtained from an arbitrary continuous distribution) so that it is approximately normally distributed?
In fact, here is some sample R code to illustrate the method:
#####Generic Transformation to normality#######################
x=c(rnorm(25,-1.5,0.5),rnorm(75,1,0.5)) #data from a mixture of two normals
par(mfrow=c(2,2))
hist(x,prob=TRUE) #you will see a bimodal shape
Fn=ecdf(x) #step 1: empirical CDF
fit=smooth.spline(x,Fn(x)) #step 2: smooth it to obtain G
plot(fit,type="l") #the smoothed CDF G(x)
u=predict(fit,x)$y #step 3: U=G(X), evaluated at the original data points
u=pmin(pmax(u,1e-6),1-1e-6) #the spline can overshoot [0,1] slightly; keep u in (0,1) so qnorm stays finite
hist(u,prob=TRUE) #you will see a roughly uniform shape
z=qnorm(u) #step 4: Z=Q(U)
hist(z,prob=TRUE) #you will see a roughly normal shape
lines(sort(z),dnorm(sort(z))) #overlay the standard normal density
ks.test(z,"pnorm") #formally tests if the transformed data are standard normal
#####Copyright (2015): Sujit K. Ghosh###############
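For comparison, here is a sketch of a rank-based variant of the same idea (the rank-based inverse normal transform, a standard alternative not part of the code above; the 0.5 offset is one common continuity correction). Using ranks in place of the smoothed ECDF keeps u strictly inside (0, 1), so qnorm never returns infinite values:

#####Rank-based variant (for comparison)#######################
u2=(rank(x)-0.5)/length(x) #empirical CDF via ranks, shifted away from 0 and 1
z2=qnorm(u2) #approximately standard normal
hist(z2,prob=TRUE) #compare with the spline-based z above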
A sample output is attached, where the data are generated from a mixture of normal distributions.