Here is a generic method to transform a random sample of values to normality:
1. Compute the empirical cumulative distribution function from the observed data (e.g., using the 'ecdf' function in R).
2. Smooth this function with a smoothing spline and call the result G(x) (e.g., using the 'smooth.spline' function in R).
3. Then U = G(X) should be approximately uniformly distributed on [0, 1].
4. Z = Q(U), where Q is the standard normal quantile function ('qnorm' in R), should then be approximately normally distributed with mean 0 and variance 1.
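The reason steps 3-4 work is the probability integral transform. A one-line sketch, assuming the true CDF F of X is continuous:

$$P\big(F(X) \le u\big) = P\big(X \le F^{-1}(u)\big) = F\big(F^{-1}(u)\big) = u, \qquad 0 < u < 1,$$

so F(X) ~ Uniform(0, 1), and therefore Q(F(X)) ~ N(0, 1). Since G is only an estimate of F, U = G(X) is only approximately uniform, hence the "approximately" in each step.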
The question is: why do we still keep trying various arbitrary transformations (such as log, square-root, Box-Cox, etc.)? Can't we simply use the above steps to transform data (obtained from an arbitrary continuous distribution) so that it is approximately normally distributed?
In fact, here is some sample R code to illustrate the method:
#####Generic Transformation to normality#######################
x=c(rnorm(25,-1.5,0.5),rnorm(75,1,0.5)) #data from a mixture of two normals
par(mfrow=c(2,2))
hist(x,prob=TRUE) #you will see a bimodal shape
Fn=ecdf(x) #step 1: empirical CDF
fit=smooth.spline(x,Fn(x)) #step 2: smooth it to obtain G
plot(fit,type="l") #the smoothed CDF G(x)
u=predict(fit,x)$y #step 3: U=G(X), evaluated at the original data points
u=pmin(pmax(u,1e-6),1-1e-6) #the spline can overshoot [0,1] slightly; keep u in (0,1) so qnorm stays finite
hist(u,prob=TRUE) #you will see a roughly uniform shape
z=qnorm(u) #step 4: Z=Q(U)
hist(z,prob=TRUE) #you will see a roughly normal shape
lines(sort(z),dnorm(sort(z))) #overlay the standard normal density
ks.test(z,"pnorm") #formally tests if the transformed data are standard normal
#####Copyright (2015): Sujit K. Ghosh###############
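For comparison, here is a sketch of a rank-based variant of the same idea (the rank-based inverse normal transform, a standard alternative not part of the code above; the 0.5 offset is one common continuity correction). Using ranks in place of the smoothed ECDF keeps u strictly inside (0, 1), so qnorm never returns infinite values:

#####Rank-based variant (for comparison)#######################
u2=(rank(x)-0.5)/length(x) #empirical CDF via ranks, shifted away from 0 and 1
z2=qnorm(u2) #approximately standard normal
hist(z2,prob=TRUE) #compare with the spline-based z above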
A sample output is attached, where the data are generated from a mixture of normal distributions.