I'm working on a genetic-sequence substring algorithm applied to an anonymity calculation over random sequences. Does anyone know of an algorithm similar to k-anonymity, but a little greedier? I'm working with a pool of 96 million identifiers.
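
To make the question concrete, here is a minimal Python sketch of one possible reading of it. It assumes the identifiers are plain strings and that "anonymous" means at least k identifiers share a given prefix. The function name `longest_k_anonymous_prefix`, the prefix-based grouping, and the sample data are all illustrative assumptions, not the actual setup.

```python
from collections import Counter


def longest_k_anonymous_prefix(identifiers, k):
    """Map each identifier to its longest prefix shared by at least k identifiers.

    Hypothetical helper for illustration. An empty string in the result means
    no non-empty prefix of that identifier reaches the threshold k.
    """
    if not identifiers:
        return {}
    max_len = max(len(s) for s in identifiers)
    result = {s: "" for s in identifiers}
    # Identifiers whose prefix at the previous length still had >= k matches.
    active = list(identifiers)
    for length in range(1, max_len + 1):
        # Count how many still-active identifiers share each prefix of this length.
        counts = Counter(s[:length] for s in active if len(s) >= length)
        # Greedy step: a prefix that drops below k can never recover at a
        # longer length, so its identifiers are discarded for good.
        active = [
            s for s in active
            if len(s) >= length and counts[s[:length]] >= k
        ]
        if not active:
            break
        for s in active:
            result[s] = s[:length]
    return result


sample = ["ACGT", "ACGA", "ACTT", "TTGA"]
prefixes = longest_k_anonymous_prefix(sample, k=2)
```

The greedy part is that an identifier is dropped as soon as its prefix falls below k, so each round only touches the identifiers that are still candidates. Whether a per-length counting pass like this is practical for 96 million identifiers depends on available memory and on how long the identifiers are, which I have not measured.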