09 February 2016

Hello,

I have a dataset with 8 variables:

1- The first variable contains grades given by a voice expert to participants' voices. It takes values from 0 to 3, with mode 0 (so it is zero-inflated).
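A mode of 0 does not by itself mean the data are zero-inflated: a Poisson distribution with mean below 1 also has mode 0. Zero inflation means more zeros than the assumed base distribution predicts. The minimal sketch below compares the observed share of zeros with the share a Poisson of the same mean would give. The grades are hypothetical, and because the real grades are capped at 3 while a Poisson is not, treat this only as a rough heuristic.

```python
import math
from collections import Counter

def zero_inflation_check(grades):
    """Compare the observed share of zeros with the Poisson-predicted share.

    Rough heuristic only: the grades are bounded at 3, a Poisson is not.
    """
    n = len(grades)
    counts = Counter(grades)
    mode = counts.most_common(1)[0][0]
    observed_zero = counts.get(0, 0) / n
    mean = sum(grades) / n
    poisson_zero = math.exp(-mean)  # P(Y = 0) under Poisson(mean)
    return {"mode": mode, "mean": mean,
            "observed_zero_share": observed_zero,
            "poisson_zero_share": poisson_zero}

# Hypothetical grades for illustration (not the poster's data).
grades = [0, 0, 0, 0, 0, 0, 1, 2, 3, 3]
print(zero_inflation_check(grades))
```

Here 60% of the grades are zero, whereas a Poisson with the same mean (0.9) would predict about 41%. That gap, not the mode, is what "zero-inflated" refers to.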

2- The remaining 7 variables are outputs from a machine, and some of them are highly correlated.

The aim is to model the first variable as a function of the other 7, so that a non-expert can estimate the voice grade by entering only the machine outputs into the model.

So far I have found that I could use a Tweedie GLM. Is this suitable for the aim of this study, or are there better options?
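A Tweedie GLM with variance power 1 < p < 2 (a compound Poisson-gamma model) puts positive probability on exact zeros while treating positive values as continuous, which is why it is often suggested for zero-heavy outcomes. Below is a minimal pure-Python sketch of fitting one with a log link by iteratively reweighted least squares (IRLS). The value p = 1.5 is fixed by assumption rather than estimated, and the function names and data are hypothetical. The `predict` helper shows how a non-expert would turn new machine outputs into an estimated grade once the coefficients are known. In R, passing `family = statmod::tweedie(var.power = 1.5, link.power = 0)` to `glm()` should give a comparable fit.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_tweedie_log(X, y, p=1.5, iters=100, tol=1e-10):
    """IRLS fit of a Tweedie GLM with log link.

    X: rows of predictors, each row starting with 1.0 (intercept).
    p: variance power, Var(Y) = phi * mu**p; 1 < p < 2 allows exact zeros.
    The dispersion phi cancels out of the coefficient estimates.
    """
    n, k = len(y), len(X[0])
    beta = [math.log(sum(y) / n)] + [0.0] * (k - 1)  # start at overall mean
    for _ in range(iters):
        eta = [sum(b * v for b, v in zip(beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
        w = [m ** (2 - p) for m in mu]  # working weights 1/(V(mu) g'(mu)^2)
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        xtwx = [[sum(w[i] * X[i][a] * X[i][c] for i in range(n))
                 for c in range(k)] for a in range(k)]
        xtwz = [sum(w[i] * X[i][a] * z[i] for i in range(n)) for a in range(k)]
        new_beta = solve(xtwx, xtwz)
        done = max(abs(nb - b) for nb, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if done:
            break
    return beta

def predict(beta, row):
    """Predicted mean grade for one row of machine outputs (leading 1.0)."""
    return math.exp(sum(b * v for b, v in zip(beta, row)))

# Hypothetical data: one machine output and grades 0-3 (not real data).
xs = [0.1, 0.5, 0.9, 1.3, 1.7, 2.1, 2.5, 2.9]
y = [0, 0, 0, 1, 0, 2, 3, 3]
X = [[1.0, v] for v in xs]
beta = fit_tweedie_log(X, y, p=1.5)
print(beta, predict(beta, [1.0, 2.0]))
```

Note that this model treats the grade as a non-negative continuous quantity: it does not know that the grade cannot exceed 3, so predictions for extreme machine outputs may fall outside the 0 to 3 range.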

Some of the machine variables are highly correlated, but there is no a priori reason to prefer one over another; we don't know which of them are more appropriate to keep. Is there a function, like stepwise selection in linear regression, that can help?
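Before running any stepwise-type selection, it may help to see which machine outputs are nearly redundant, because which member of a highly correlated pair a stepwise procedure keeps can change with small changes in the data. The minimal sketch below lists every pair of variables whose absolute Pearson correlation exceeds a threshold. The 0.9 cut-off, the variable names and the values are hypothetical. In R, `cor()` gives the full correlation matrix and `step()` performs AIC-based stepwise selection on `lm`/`glm` fits. In Stata, `correlate` and the `stepwise` prefix play the same roles.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def correlated_pairs(columns, threshold=0.9):
    """Return (name1, name2, r) for every pair with |r| above threshold."""
    flagged = []
    for (n1, c1), (n2, c2) in combinations(columns.items(), 2):
        r = pearson(c1, c2)
        if abs(r) > threshold:
            flagged.append((n1, n2, r))
    return flagged

# Hypothetical machine outputs (names and values invented for illustration).
columns = {
    "m1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "m2": [2.1, 3.9, 6.2, 8.0, 9.9],  # roughly 2 * m1
    "m3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(correlated_pairs(columns))
```

Only the pair ("m1", "m2") is flagged here. Deciding which of the two to keep still needs subject-matter judgement or a selection criterion.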

I have R and Stata available.

Many thanks for any suggestions and advice.

Azita 
