I have a list of 100 cases with age and gender, and a list of 150 controls.

I want to match the cases to controls on gender and on age within ±3 years.

Currently I'm using a greedy algorithm that first looks for exact age matches, then for matches within ±1 year, ±2 years, and ±3 years. However, this is not optimal: a non-greedy matching within the ±3-year constraint could potentially yield more matches. I want to maximize the number of matches within ±3 years rather than minimize the age difference.

Which algorithm or package can I use for this problem? (Preferably in Python, but it must be open source.)
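
To make the goal concrete: as far as I can tell, this looks like a maximum-cardinality bipartite matching problem, where each case is connected to every control of the same gender whose age is within the tolerance. Below is a minimal pure-Python sketch of that idea using simple augmenting paths. The function name `match_cases_to_controls`, the `(age, gender)` tuple layout, and the assumption that each control is used at most once are mine, for illustration only. SciPy's `scipy.sparse.csgraph.maximum_bipartite_matching` and NetworkX's `hopcroft_karp_matching` appear to solve the same problem and may be better choices for larger data.

```python
from typing import List, Optional, Tuple

Person = Tuple[int, str]  # (age, gender); hypothetical record layout


def match_cases_to_controls(
    cases: List[Person],
    controls: List[Person],
    max_age_diff: int = 3,
) -> List[Optional[int]]:
    """Return, for each case, the index of its matched control (or None).

    Assumes 1:1 matching: every control is used at most once.
    """
    # For every case, list the controls that satisfy the constraints.
    eligible = [
        [
            j
            for j, (c_age, c_gender) in enumerate(controls)
            if c_gender == gender and abs(c_age - age) <= max_age_diff
        ]
        for age, gender in cases
    ]
    control_to_case: List[Optional[int]] = [None] * len(controls)

    def try_assign(i: int, visited: set) -> bool:
        # Look for an augmenting path starting at case i.
        for j in eligible[i]:
            if j in visited:
                continue
            visited.add(j)
            current = control_to_case[j]
            # Take a free control, or re-route the case that holds it.
            if current is None or try_assign(current, visited):
                control_to_case[j] = i
                return True
        return False

    for i in range(len(cases)):
        try_assign(i, set())

    # Invert the mapping so the result is indexed by case.
    case_to_control: List[Optional[int]] = [None] * len(cases)
    for j, i in enumerate(control_to_case):
        if i is not None:
            case_to_control[i] = j
    return case_to_control


# Toy data: an exact-match-first greedy pass pairs the 33-year-old case
# with the 33-year-old control and then leaves the 30-year-old case
# unmatched; the augmenting-path approach pairs both cases.
cases = [(30, "F"), (33, "F")]
controls = [(33, "F"), (36, "F")]
matches = match_cases_to_controls(cases, controls)
print(matches)
```

This sketch only maximizes the number of pairs and ignores how close the ages are inside the ±3-year window, which is the trade-off I am after.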
