15 March 2021

Hello! In my last question, I asked how to analyze a dataset with an ordinal dependent variable and multiple categorical independent variables. Here's the question if you'd like to check it out: https://www.researchgate.net/post/How_can_I_analyze_multiple_binary_independent_variables_against_an_ordinal_dependent_variable

My dataset is questionnaire data that includes a field for skill level in a certain sport. This skill level is the target variable in the study. The questionnaire also asked which of four sports the respondent's answers concern. The aim is to compare whether the correlations between skill level and the answers are similar across the four sports. So I would like to find which variables predict skill level best in each sport, and how important each variable is in the prediction.

It was suggested that I use ordinal logistic regression and test the proportional odds assumption. If the assumption is not met, I should run consecutive binary logistic regressions to construct an ordinal model myself. It was also suggested that I could use a boosted regression tree. I would like to use both approaches so that each serves as a check on the other, as there seem to be uncertainties in ordinal logistic regression.
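For reference, the proportional odds assumption can be written out as below. This is the standard cumulative logit formulation and was not part of the suggestions themselves; the symbols are assumptions for illustration, with Y the skill level (four classes), x the vector of questionnaire answers, and a sign convention that differs between software packages.

```latex
% Proportional odds model: one coefficient vector shared by all thresholds
\operatorname{logit} P(Y \le j \mid x) = \alpha_j - \beta^{\top} x,
\qquad j = 1, 2, 3

% Consecutive binary logistic regressions: each threshold gets its own coefficients
\operatorname{logit} P(Y \le j \mid x) = \alpha_j - \beta_j^{\top} x,
\qquad j = 1, 2, 3
```

The assumption holds when the threshold-specific vectors are approximately equal, that is, when \beta_1, \beta_2 and \beta_3 can be replaced by a single \beta.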

I understand that the workflow should be as follows:

  • Fit an ordinal logistic regression.
  • Check the proportional odds assumption.
  • Regardless of whether the assumption is violated, fit binary logistic regressions to study the details of the data, and to check whether the ordinal logistic regression model did indeed provide an accurate summary of the correlations in the data.
  • Train a boosted regression tree and find the importance of each independent variable. Use this as a check on the regression results.
  • Run the binary logistic regressions as follows: Class 1 vs Class 2-4, Class 1-2 vs Class 3-4, Class 1-3 vs Class 4.

I understand the binary logistic regressions and know how to run them, but I do not know whether the boosted regression tree should be built in the same manner. Should I train three separate boosted regression trees and calculate the importances separately for each, or should I train a single model on all four target classes at once? It seems boosted regression trees don't perform well with target variables with more than two values.
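The three cumulative splits described above can be sketched as follows. This is only a minimal illustration, assuming skill level is coded as integers 1 to 4; the function name `cumulative_splits` and the sample data are hypothetical and not taken from the actual dataset.

```python
from typing import Dict, List, Sequence


def cumulative_splits(levels: Sequence[int], n_classes: int = 4) -> Dict[str, List[int]]:
    """Dichotomize an ordinal target at every threshold.

    For threshold k (1 <= k < n_classes) the binary target is 1 when the
    level is above k and 0 otherwise, i.e. "Class 1..k vs Class k+1..n".
    """
    splits = {}
    for k in range(1, n_classes):
        # Build a readable label such as "1-2 vs 3-4".
        low = f"1-{k}" if k > 1 else "1"
        high = f"{k + 1}-{n_classes}" if k + 1 < n_classes else str(n_classes)
        splits[f"{low} vs {high}"] = [1 if level > k else 0 for level in levels]
    return splits


# Hypothetical skill levels for eight respondents (1 = lowest, 4 = highest).
skill = [1, 2, 2, 3, 4, 1, 3, 4]
binary_targets = cumulative_splits(skill)
for name, target in binary_targets.items():
    print(name, target)
```

Each of the three binary targets would then be the dependent variable of its own binary logistic regression. Whether the boosted regression tree needs the same three targets or a single four-class target is the open question here.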

    I would truly appreciate your help. Also, if you know of studies that have used a similar method, I would really appreciate it if you could link them.

    Best regards,

    Timo Ijäs
