20 April 2019

Hi guys, I'm using a computational tool called Coh-Metrix, which reports 106 indices grouped into 11 banks of linguistic features of a written text.

The texts I am analyzing are officially rated essays, graded from 1 (lowest) to 5 (highest). However, the essays were collected already sorted by grade, so I only know each essay's grade and do not have the underlying scores.

I want to find out which indices contribute significantly to each essay grade (1-5). Ordinal logistic regression seems to be a suitable model for my analysis, because the dependent variable (i.e. essay grade) is an ordered factor and the independent variables are numerical.
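For concreteness, this is the model I have in mind, written in the proportional-odds (cumulative logit) form. That form is my assumption about what "ordinal logistic regression" means here, and the symbols are my own notation: Y is the essay grade, x is the vector of Coh-Metrix indices, the θ_j are the four thresholds between the five grades, and β holds one coefficient per index.

```latex
\[
  \operatorname{logit} P(Y \le j \mid \mathbf{x})
    = \log \frac{P(Y \le j \mid \mathbf{x})}{P(Y > j \mid \mathbf{x})}
    = \theta_j - \boldsymbol{\beta}^{\top}\mathbf{x},
  \qquad j = 1, \dots, 4 .
\]
```

With this sign convention, a positive coefficient means that higher values of that index go with higher grades. Note that this form gives one coefficient per index shared across all thresholds, rather than a separate coefficient for each grade.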

However, I wonder whether I need to run a Pearson's correlation test before the ordinal logistic regression to find out which indices within each bank of linguistic features are most strongly correlated with one another. For example, the indices for word count, paragraph count and sentence count would probably be highly correlated with one another. In that case, I could just pick one of them and drop the rest to avoid collinearity.
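To make the screening idea concrete, here is a minimal sketch in plain Python (the same logic would carry over to R). Everything in it is an assumption for illustration: the index names are hypothetical labels rather than real Coh-Metrix column names, the numbers are made up, and the 0.9 cutoff is arbitrary. It keeps an index only if its absolute Pearson correlation with every index already kept is below the cutoff.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_collinear(indices, threshold=0.9):
    """Greedy filter: walk the indices in insertion order and keep one only
    if |r| with every previously kept index is below the threshold."""
    kept = []
    for name, values in indices.items():
        if all(abs(pearson(values, indices[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up values for six essays; names are hypothetical, not Coh-Metrix output.
toy_bank = {
    "word_count": [120, 250, 310, 400, 520, 610],
    "sentence_count": [10, 21, 25, 33, 44, 50],
    "paragraph_count": [2, 3, 4, 5, 6, 7],
    "lexical_diversity": [0.61, 0.48, 0.66, 0.52, 0.70, 0.45],
}

kept = drop_collinear(toy_bank)
print(kept)
```

On this toy data the three count indices are almost perfectly correlated, so only the first one listed survives along with the unrelated index. Which index is kept from a correlated group depends purely on the order in the dictionary, so in practice the order should reflect which index is easiest to interpret.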

I am a newbie to statistics and RStudio, so I might sound dumb.

But, I'm really desperate for help now.

Any help will definitely be appreciated! Thank you.
