Hi,

I am a new Ph.D. student, very new to R, and I am trying to fit a multinomial logit model to my data. I have watched a lot of videos and read different pages, and now I am confused. I have tried to summarize my questions here:

1- My independent variables are categorical. For example, job has 18 levels and diploma has 9 levels, and I have 12 different independent variables in total. Should I convert all of them to dummy variables first? At the moment they are coded as numbers, for example 1, 2, ..., 18 for job. What about continuous variables like income? At the moment they are grouped into classes, for example 0-999 euro is group 1, and so on.
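
A minimal sketch of what dummy (treatment) coding does, written in Python purely for illustration. In R, storing job as a factor (for example with factor()) normally lets the model build these columns itself. The helper name to_dummies and the toy job codes are hypothetical. Every level except a reference level gets its own 0/1 column, so job with 18 levels becomes 17 columns instead of one numeric column that would wrongly imply job 18 is "more" than job 1.

```python
def to_dummies(codes, prefix="x", reference=None):
    """Treatment-code a categorical variable stored as comparable codes.

    Hypothetical helper: returns (column_names, rows) with one 0/1 column
    per level except the reference level, which is the implicit baseline.
    """
    levels = sorted(set(codes))
    if reference is None:
        reference = levels[0]  # first level acts as the baseline
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not present in data")
    kept = [lvl for lvl in levels if lvl != reference]
    names = [f"{prefix}{lvl}" for lvl in kept]
    # each row has a 1 in the column of its own level, all zeros for the reference
    rows = [[1 if code == lvl else 0 for lvl in kept] for code in codes]
    return names, rows


names, rows = to_dummies([1, 7, 18, 1, 11], prefix="job")
print(names)  # ['job7', 'job11', 'job18']
print(rows)   # [[0, 0, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0], [0, 1, 0]]
```

The same coding applies to grouped income if the groups are treated as categories; levels that never occur in the data get no column.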

2- In one example I found, the author first ran the model with the multinom function and then calculated the p-values. Based on those p-values, he simply removed the non-significant variables from the model to get the final model. In my case, for the variable job, for example, job1 and job2 are not significant but job7, job11 and job18 are. Should I remove job from the model or not?
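
Instead of judging job from the p-values of its individual dummies, one common approach is a likelihood-ratio test that drops all of job's dummies at once. In R, calling anova() on two nested multinom fits is the usual way to get this kind of test. Below is a minimal Python sketch, assuming you already have the log-likelihoods of the full and reduced models; the function names and the log-likelihood values in the usage line are hypothetical, not results from this data. With K outcome categories, a factor with L levels contributes (K - 1)(L - 1) coefficients, which is the test's degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # odd df: erfc(sqrt(x/2)) plus a finite correction series
    p = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * i + 1)
    return p


def factor_df(n_outcomes, n_levels):
    """Coefficients a factor adds to a multinomial logit: (K - 1) * (L - 1)."""
    return (n_outcomes - 1) * (n_levels - 1)


def lr_test(loglik_full, loglik_reduced, df):
    """Likelihood-ratio statistic 2 * (llF - llR) and its chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


# hypothetical 3-category outcome, job with 18 levels -> 2 * 17 = 34 df
df = factor_df(3, 18)
stat, p = lr_test(-1200.0, -1230.0, df)
print(df, stat, p)
```

A small p-value here says job as a whole improves the fit, even if some of its individual dummies are not significant.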

3- Should I start by splitting the data set into training and testing data? Some examples do this and some do not. If I do split, should I check the model fit with the training data?
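
If you do split, the usual pattern is to fit on the training rows and report misclassification on the held-out test rows, because error measured on the same data used for fitting tends to be optimistic. A minimal sketch of a reproducible random split in Python, for illustration only. The 70/30 ratio, the seed and the function name are arbitrary assumptions; in R, set.seed() followed by sample() does the same job.

```python
import random


def train_test_split_indices(n, test_fraction=0.3, seed=42):
    """Shuffle row indices 0..n-1 reproducibly and split them into train/test."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


train, test = train_test_split_indices(100)
print(len(train), len(test))  # 70 30
```

With predictors that have many levels, such as job, it is worth checking that every level in the test rows also appears in the training rows, otherwise the fitted model has no coefficient for it.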

4- About checking the model fit: I continued with the model I ran before, without removing the non-significant variables (as explained in question 2). The misclassification rate is 29 percent. Is that a bad model? I also tried ANOVA tests, starting from the intercept-only model and then adding the variables one by one, and the p-values are always below 0.05. Which model should I call the best?
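
Whether a 29 percent misclassification rate is bad depends largely on how balanced the outcome categories are. A useful reference point is the error of a "model" that always predicts the most frequent category (sometimes called the no-information rate). A minimal Python sketch; the labels below are made up for illustration.

```python
from collections import Counter


def misclassification_rate(actual, predicted):
    """Share of cases where the predicted category differs from the observed one."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


def majority_baseline_error(actual):
    """Error rate of always predicting the most common observed category."""
    most_common_count = Counter(actual).most_common(1)[0][1]
    return 1.0 - most_common_count / len(actual)


actual = ["car", "car", "bus", "bike", "car", "bus", "car", "bike", "car", "car"]
predicted = ["car", "car", "bus", "car", "car", "car", "car", "bike", "car", "bus"]
print(misclassification_rate(actual, predicted))  # 0.3
print(majority_baseline_error(actual))            # 0.4
```

In this toy case the model (0.30 error) beats the baseline (0.40) only modestly. If the largest outcome category in the real data already covered about 71 percent of cases, a 29 percent error would mean no gain over always guessing that category.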

Should I also check for multicollinearity in the multinomial logit model? If so, how? Do I need to compute a correlation matrix first?
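
A Pearson correlation matrix is not very meaningful for unordered category codes such as job 1 to 18. One option, swapped in here as a first screen, is Cramér's V between pairs of categorical predictors, computed from their cross-tabulation (0 means no association, 1 means one variable determines the other). A minimal Python sketch; the function name and toy data are hypothetical, and this is only a pairwise screen, not a full collinearity diagnostic such as generalized variance inflation factors (GVIF).

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Cramér's V between two categorical sequences of equal length."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(x)
    x_levels, y_levels = set(x), set(y)
    if len(x_levels) < 2 or len(y_levels) < 2:
        raise ValueError("each variable needs at least two observed levels")
    joint = Counter(zip(x, y))  # observed cell counts of the cross-tabulation
    x_tot, y_tot = Counter(x), Counter(y)
    chi2 = 0.0
    for a in x_levels:
        for b in y_levels:
            expected = x_tot[a] * y_tot[b] / n  # count expected under independence
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(x_levels), len(y_levels)) - 1
    return math.sqrt(chi2 / (n * k))


print(cramers_v([1, 1, 2, 2, 3, 3], ["a", "a", "b", "b", "c", "c"]))  # close to 1
print(cramers_v([1, 1, 2, 2], ["a", "b", "a", "b"]))                  # 0.0
```

Pairs with V close to 1 (for example, if diploma almost fixes job) are candidates for closer inspection.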

Thank you in advance for your help.

Sam H. Bahreini