Dear ResearchGate community,

We are currently undertaking a large historical cohort study (N = 3500 records), in which we will, among other things, register all hospital admissions and the first three ICD-10 codes from each medical report.

Data entry will be done in SPSS, but several later analyses will be done in Stata, R, and other tools (genetic platforms). An ICD-10 code starts with a letter (I for cardiovascular diseases), followed by a two-digit main number and an optional sub-number after the decimal point: I50 is heart failure, and I50.9 is unspecified heart failure. The easiest option is probably to enter the code as it is, in a string variable. However, could a string code cause complications later on, when moving between statistical platforms, converting files, and so on? A more elaborate approach would be to split each code into two or three variables for the letter, the main number, and the sub-number, all stored in numeric form (A = 1, B = 2, etc.). However, this would put an extra burden on the people doing the data entry (including me). Does anyone have experience with data entry coding schemes for ICD-10 codes in medium-sized cohorts, and with the pain they might cause later when moving across statistical platforms?
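To make the two options concrete, here is a minimal Python sketch of how a code entered as a plain string could be split into the letter, main number, and sub-number afterwards, and joined back again. It is only an illustration: the names `split_icd10`, `join_icd10`, and `IcdParts` are hypothetical, and the pattern assumes WHO-style codes (one letter, two digits, and an optional one- or two-digit sub-number). National variants with other layouts would need a different pattern.

```python
import re
from typing import NamedTuple, Optional


class IcdParts(NamedTuple):
    """Hypothetical container for the parts of one ICD-10 code."""
    letter: str          # chapter letter, e.g. "I"
    letter_num: int      # A = 1, B = 2, ... as in the numeric scheme
    main: int            # two-digit main number, e.g. 50
    sub: Optional[str]   # sub-number kept as text, None if absent


# Assumed layout: letter, two digits, optional "." and 1-2 digits.
_ICD10 = re.compile(r"([A-Z])(\d{2})(?:\.?(\d{1,2}))?")


def split_icd10(code: str) -> IcdParts:
    """Split a string such as 'I50.9' into its parts."""
    match = _ICD10.fullmatch(code.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised ICD-10 code: {code!r}")
    letter, main, sub = match.groups()
    return IcdParts(letter, ord(letter) - ord("A") + 1, int(main), sub)


def join_icd10(parts: IcdParts) -> str:
    """Rebuild the string form, restoring the leading zero of the main number."""
    text = f"{parts.letter}{parts.main:02d}"
    return f"{text}.{parts.sub}" if parts.sub is not None else text


if __name__ == "__main__":
    for raw in ["I50", "I50.9", "a04.1"]:
        parts = split_icd10(raw)
        print(raw, parts, join_icd10(parts))
```

The sketch also shows two details a numeric scheme has to handle: the main number loses its leading zero when stored as a number (A04 becomes 4), and a missing sub-number (I50) must stay distinguishable from a sub-number of zero (I50.0), which is why the sub-number is kept as text here.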

Grateful for any input.

Lasse
