25 January 2021

Hello everybody. I'm new to Stata and to this forum, but I hope you can help me.

I am doing a research project in which I intend to estimate the prevalence of venous thromboembolic events (VTE) up to 30 days after surgery. I have merged two datasets. Dataset 1 contains patient characteristics and the surgery date. Dataset 2 contains the date on which a thromboembolic event ICD-10 code was registered. After merging these two datasets, I created a new variable called 'interval' that holds the number of days between surgery and the event, so I can see whether the event occurred within 30 days of surgery. I then created a variable 'vte', coded 0 for events occurring more than 30 days after surgery, events before surgery, and non-events.
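To make the setup concrete, here is a minimal sketch of how the two variables could be built. The names `surgery_date` and `event_date` are placeholders (not the real variable names), both are assumed to be Stata daily dates, and counting day 0 as "within 30 days" is also an assumption.

```stata
* Placeholder names: surgery_date and event_date are assumed to be Stata daily dates
* Days from surgery to the registered event (negative = event before surgery)
generate interval = event_date - surgery_date

* vte = 1 for an event 0 to 30 days after surgery, otherwise 0.
* inrange() returns 0 when interval is missing, so patients with no event get 0.
generate byte vte = inrange(interval, 0, 30)

* Quick check of the coding
tabulate vte, missing
```

Using `inrange()` rather than `interval <= 30` avoids coding events before surgery as cases, and it also keeps missing intervals (which Stata treats as larger than any number) out of the event group.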

The problem is that dataset 2 contains multiple observations per patient ID, because the ICD-10 code was registered every time the patient had a consultation at the hospital. This means that the same patient occurs more than once in my merged dataset. For example, one patient had an event 5 days after surgery but is also registered 87 days after surgery, and another patient had an event 7 days after surgery but also one BEFORE surgery.

As my aim is to identify events within 30 days and compare them with non-events, I want to keep these patients and drop the duplicates. The tricky thing is to remove the right duplicates without removing the first observation occurring 1
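One possible way to do this, as a rough sketch only: sort each patient's rows so that a qualifying event (vte == 1) comes first, and then keep a single row per patient. The variable name `id` is a placeholder for the patient ID, `notvte` is a helper introduced here for illustration, and `vte` and `interval` are assumed to be coded as described above.

```stata
* Helper (hypothetical name): 0 for rows with an event within 30 days, 1 otherwise,
* so that qualifying rows sort first within each patient
generate byte notvte = !vte

* Within each patient, order rows by notvte and then by interval,
* and keep only the first row
bysort id (notvte interval): keep if _n == 1

* Confirm that each patient now appears exactly once (errors out if not)
isid id

drop notvte
```

With this ordering, a patient who has any event within 30 days keeps the earliest such event, so `vte` stays 1 for that patient. A patient whose rows are all outside the window, before surgery, or without an event keeps one arbitrary row with `vte` equal to 0. It is worth running this on a copy of the data first (for example after `preserve`) and checking a few of the patients mentioned above by hand.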

I hope you understand my question.

Thank you in advance.
