I am setting up a research database where human subject names are anonymized and stored. To run post-hoc queries that check whether the same subject participated in multiple studies, we need to control the data entry of names as much as possible. For example: Should full names be required in place of shortened versions? What should be done with hyphenated names? How should accented characters from German, French, Spanish, etc. be entered? Should names be entered in all caps, in all lower case, or with the first letter capitalized?
Any of these variations could produce mismatched name values, making it nearly impossible to determine that the same person was entered multiple times.
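To make the problem concrete, here is a minimal Python sketch of the kind of mismatch I mean, together with one possible normalization that could be applied before names are anonymized. The function name `match_key` and the specific rules (strip accents, fold case, treat hyphens as spaces) are my own assumptions for illustration, not taken from any guideline.

```python
import unicodedata


def match_key(name: str) -> str:
    """Return a normalized key for comparing names (hypothetical helper)."""
    # Decompose accented characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, so "é" becomes "e".
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-insensitive form; casefold() also maps the German "ß" to "ss".
    folded = stripped.casefold()
    # Assumption: a hyphen and a space between name parts are equivalent.
    folded = folded.replace("-", " ")
    # Collapse repeated whitespace and trim the ends.
    return " ".join(folded.split())


# Three entries that differ as raw strings but share one key.
entries = ["José García-López", "JOSE GARCIA LOPEZ", "jose  garcia lopez"]
print({match_key(e) for e in entries})
```

Even this leaves open cases: it would not match "Müller" with the transliteration "Mueller", and it does nothing about shortened names such as "Bob" versus "Robert", which is why I am looking for an established convention.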
Does the NIH, ISO, or another institution publish a technical guideline for this sort of thing?