I have a numeric outcome variable and a numeric predictor in a small sample. For example, my outcome variable is the percentage of domestic abuse in each state and my predictor is the percentage of alcohol abuse in each state.

I have multiple predictor variables similar to the one above. In a linear model, some are significant while others are not. When I convert them to binary variables by splitting at the sample median (50 states in my example), some cease to be significant while others that previously were not become significant.

In terms of interpretation, numeric alcohol abuse causing domestic abuse makes sense, but so does a binary "high" alcohol abuse (alcohol abuse above the median) causing domestic abuse. My sample size is small due to the context of the problem, so binary variables for high and low usage do seem to give me more power for some predictors. However, other variables lose their significance in the linear model when converted to binary.
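To make the comparison concrete, here is a minimal sketch on simulated data, not my actual state data. It assumes a single normally distributed predictor with a truly linear effect on the outcome, and it counts how often the slope comes out significant at the 5% level when the predictor is kept numeric versus when it is split at the sample median. The function names, the effect size of 0.4, and the number of replications are all hypothetical choices for illustration only.

```python
import math
import random


def slope_t_stat(x, y):
    """t-statistic for the slope in a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope / std_err


def median_split(x):
    """Code each value as 1.0 if it is above the sample median, else 0.0."""
    s = sorted(x)
    n = len(s)
    median = (s[n // 2 - 1] + s[n // 2]) / 2 if n % 2 == 0 else s[n // 2]
    return [1.0 if xi > median else 0.0 for xi in x]


def rejection_rates(n=50, true_slope=0.4, reps=2000, t_crit=2.0106, seed=1):
    """Share of simulated samples in which the slope is significant.

    t_crit is the approximate two-sided 5% critical value of the
    t distribution with n - 2 = 48 degrees of freedom (assumes n=50).
    Returns (rate with numeric predictor, rate with median-split predictor).
    """
    rng = random.Random(seed)
    hits_numeric = 0
    hits_binary = 0
    for _ in range(reps):
        # Assumption: linear effect of x on y plus standard normal noise.
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [true_slope * xi + rng.gauss(0, 1) for xi in x]
        if abs(slope_t_stat(x, y)) > t_crit:
            hits_numeric += 1
        if abs(slope_t_stat(median_split(x), y)) > t_crit:
            hits_binary += 1
    return hits_numeric / reps, hits_binary / reps


numeric_rate, binary_rate = rejection_rates()
print(f"significant with numeric predictor:      {numeric_rate:.2f}")
print(f"significant with median-split predictor: {binary_rate:.2f}")
```

Under these assumptions, a gap between the two rates is an estimate of how much power the median split costs or gains when the true relationship is linear. In any single sample of 50, the binary version can still happen to come out significant when the numeric one does not, which is the pattern I see with some of my predictors.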

When is it methodologically correct to convert numeric variables to binary? Does it make sense as a way of simplifying interpretation when the sample size is limited?
