21 August 2014

I realize there are some situations where true dummy coding (0, 1) is needed to make coefficients interpretable (and situations where you have to do contrast coding, etc.), but in terms of general practice and data cleaning/coding, which do you prefer for categorical variables: starting the codes at 0 or at 1? I can see pros and cons for each. Obviously, the codes would always increase in either scheme (i.e., 1, 2 and not 1, 0).

Pros for 0, 1, 2...:

1) When the variable is binary, you can get the proportion of cases coded 1 by taking the average.
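
To make that concrete, here is a minimal Python sketch using made-up data (the variable names and values are hypothetical, not from any real dataset). With 0/1 coding the mean is already the proportion; with 1/2 coding you have to subtract 1 first.

```python
from statistics import mean

# Hypothetical binary variable coded 0/1 (0 = no, 1 = yes)
coded_01 = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]

# The same responses coded 1/2 (1 = no, 2 = yes)
coded_12 = [x + 1 for x in coded_01]

# With 0/1 codes, the mean is the proportion of cases coded 1
prop_from_01 = mean(coded_01)

# With 1/2 codes, the mean has to be shifted down by 1 to get the same proportion
prop_from_12 = mean(coded_12) - 1

print(prop_from_01, prop_from_12)
```

The two values agree up to floating-point rounding, but only the 0/1 version can be read straight off a table of means.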

Pros for 1, 2, 3...: 

1) The highest code equals the number of categories (if no categories are missing), whereas in the other method you have to remember to add 1.

2) Similarly, you avoid confusion between "the first category" and "category 1" (in the 0-based system, "category 1" could be read as "the category labelled 1", which is actually the second category).
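
A small Python sketch of both points, again with hypothetical data and names, and assuming every category actually appears in the data:

```python
# Hypothetical 3-category variable, coded starting at 1, with every category observed
codes_from_1 = [1, 2, 3, 2, 1, 3]

# The same variable coded starting at 0
codes_from_0 = [c - 1 for c in codes_from_1]

n_categories = len(set(codes_from_1))

# 1-based: the highest code is the number of categories
highest_matches_1_based = max(codes_from_1) == n_categories

# 0-based: the highest code is one short, so you have to add 1
highest_matches_0_based = max(codes_from_0) == n_categories
highest_plus_one_matches = max(codes_from_0) + 1 == n_categories

# 0-based: the code "1" is the second-lowest code, not the first
position_of_code_1 = sorted(set(codes_from_0)).index(1) + 1

print(highest_matches_1_based, highest_matches_0_based,
      highest_plus_one_matches, position_of_code_1)
```

If a category is missing from the data, the highest code no longer tells you how many categories were observed in either scheme, so this shortcut only holds for fully populated variables.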

Just looking for some tips from practice that folks have used to help keep things clear, particularly when working with data that isn't fully labelled (or in situations in which labels won't display). Thanks!
