I have a longitudinal, retrospective medical dataset, with records covering the observation period from 2000 to the end of 2016. For many reasons, not every medical record spans that entire time frame: the patient may have died, transferred into the study halfway through, or transferred out at some stage.

The event (or exposure) of interest is a clinical event, e.g. going to the doctor and being told that you have a particular disease, such as a chest infection. Each patient also has a categorical variable indicating whether they are a smoker or not.

I wish to count the number of chest infections per patient and compare the distribution between smokers and non-smokers. I imagine this as a box plot showing the upper and lower quartiles, with the frequency of disease on the Y axis and smoker (YES/NO) on the X axis. This would be very easy to do. The problem is that I am not sure how to deal with medical records of varying length. Surely there is bias if a smoker and a non-smoker both have twenty chest infections, but their medical records differ in length by four years?
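
To make the concern concrete: one common way to put records of different lengths on the same scale is to divide each patient's count by their length of follow-up, giving a rate per person-year. The snippet below is only a minimal sketch of that idea. The records, field order, and function names are made up for illustration and are not taken from the actual dataset, and whether a simple rate is adequate here is part of the question.

```python
from datetime import date

# Hypothetical records, invented for illustration only:
# (patient_id, smoker, record_start, record_end, n_chest_infections)
records = [
    ("p1", True, date(2000, 1, 1), date(2016, 12, 31), 20),
    ("p2", False, date(2004, 1, 1), date(2016, 12, 31), 20),
]


def follow_up_years(start, end):
    """Length of the medical record in years (both end dates inclusive)."""
    return ((end - start).days + 1) / 365.25


def rate_per_person_year(n_events, start, end):
    """Events divided by the time the patient was actually observed."""
    return n_events / follow_up_years(start, end)


# Group the per-patient rates by smoking status, ready for a box plot.
rates_by_smoking = {True: [], False: []}
for patient_id, smoker, start, end, n_events in records:
    rates_by_smoking[smoker].append(rate_per_person_year(n_events, start, end))

for smoker, rates in rates_by_smoking.items():
    label = "Smoke YES" if smoker else "Smoke NO"
    print(label, [round(r, 2) for r in rates])
```

With these made-up records, both patients have twenty infections, but the patient with the record that is four years shorter ends up with the higher rate, which is the difference that raw counts would hide.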

Thanks
