I have a dataset in which the independent variables are proportions that sum to 1. I need to run a linear regression, but the proportions are multicollinear. I've read that a centered log-ratio (clr) transformation can fix the problem, but I have no idea how to implement it in R. Here's what I've done so far.

#My table

a = data.frame(score = c(12,321,411,511),yapa = c(1,2,1,1),ran=c(3,4,5,6),aa=c(0.1,0.4,0.7,0.8),bb=c(0.2,0.2,0.2,0.1),cc=c(0.7,0.4,0.1,0.1))

library(compositions)

dd = clr(a[,4:6]) #centered log ratio transform

summary(lm(score~aa+bb+cc,a))

summary(lm(score~dd,a))

But I get essentially the same result in both cases: the last variable is omitted because of multicollinearity.
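A possible explanation, sketched in base R so it does not depend on any package: the clr components of each row sum to zero by construction, so the three transformed columns are still linearly dependent and lm has to drop one. The hand-rolled clr below is an assumption about what `clr` computes (log of each part minus the row mean of the logs) and it assumes there are no zero proportions.

```r
# Same composition as in the example above
a <- data.frame(score = c(12, 321, 411, 511),
                aa = c(0.1, 0.4, 0.7, 0.8),
                bb = c(0.2, 0.2, 0.2, 0.1),
                cc = c(0.7, 0.4, 0.1, 0.1))

# Hand-rolled clr (assumed definition): log of each part minus the
# row mean of the logs. The length-4 vector is recycled down columns,
# so each row has its own mean subtracted.
log_parts  <- log(as.matrix(a[, c("aa", "bb", "cc")]))
clr_manual <- log_parts - rowMeans(log_parts)

# Each row of the clr matrix sums to (numerically) zero
row_sums <- rowSums(clr_manual)
print(round(row_sums, 12))

# Three columns, but only two are linearly independent
clr_rank <- qr(clr_manual)$rank
print(clr_rank)
```

If this reading is right, the clr transform removes the sum-to-one constraint but replaces it with a sum-to-zero constraint, which would explain why the last coefficient is still reported as NA.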

There is an alternative that does work if I introduce jitter in the variables aa, bb, and cc. However, I need something that can be used directly in the lm function, because my real dataset includes other variables as well.

library(robCompositions)

lmCoDaX(a$score, a[,4:6], method="classical")
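One option that might work directly inside lm is to replace the three proportions with two isometric log-ratio (ilr) coordinates, which have no built-in linear constraint. The sketch below uses base R and one particular choice of coordinates (so-called pivot coordinates); the column names `z1` and `z2` are made up for illustration, the choice of basis is an assumption, and the coefficients are only interpretable relative to that basis. It also assumes there are no zero proportions.

```r
# Example data from above
a <- data.frame(score = c(12, 321, 411, 511),
                yapa  = c(1, 2, 1, 1),
                ran   = c(3, 4, 5, 6),
                aa    = c(0.1, 0.4, 0.7, 0.8),
                bb    = c(0.2, 0.2, 0.2, 0.1),
                cc    = c(0.7, 0.4, 0.1, 0.1))

# Pivot (ilr) coordinates for a 3-part composition:
# z1 contrasts aa with the geometric mean of bb and cc,
# z2 contrasts bb with cc.
a$z1 <- sqrt(2 / 3) * log(a$aa / sqrt(a$bb * a$cc))
a$z2 <- sqrt(1 / 2) * log(a$bb / a$cc)

# Two coordinates instead of three parts, so nothing is dropped.
# Only the ilr coordinates are used here because the toy data has
# just four rows; with more rows, other covariates such as yapa and
# ran could be added to the same formula.
fit <- lm(score ~ z1 + z2, data = a)
print(coef(fit))
```

Because `z1` and `z2` are ordinary numeric columns, they can sit next to any other predictors in the formula, which is the main reason for computing them by hand here rather than relying on a wrapper function.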

Does anyone have experience with this type of data?
