As for the programming language, I can translate from C or C++ if necessary.
I will compare one string against hundreds of other strings. Each string can be an entire document of a couple of thousand words. The natural language of the documents is not critical, but typically it would be English.
I have developed what I believe is a suitable metric defining the distance between individual documents.
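To make the setup concrete, here is a minimal C++ sketch of the comparison loop: one query document is scored against every candidate and the results are sorted from nearest to farthest. The names `Distance`, `rank_by_distance`, and `length_difference` are hypothetical and exist only for illustration. `length_difference` in particular is a throwaway placeholder, not the metric described above; the real metric would be passed in its place.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical signature for a document distance: smaller means more similar.
using Distance = std::function<double(const std::string&, const std::string&)>;

// Compare one query document against every candidate and return
// (distance, candidate index) pairs sorted from nearest to farthest.
// Ties in distance are ordered by candidate index.
std::vector<std::pair<double, std::size_t>> rank_by_distance(
    const std::string& query,
    const std::vector<std::string>& candidates,
    const Distance& distance) {
    std::vector<std::pair<double, std::size_t>> ranked;
    ranked.reserve(candidates.size());
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        ranked.emplace_back(distance(query, candidates[i]), i);
    }
    std::sort(ranked.begin(), ranked.end());
    return ranked;
}

// Placeholder metric (an assumption, for demonstration only):
// absolute difference in length between the two strings.
double length_difference(const std::string& a, const std::string& b) {
    return a.size() > b.size() ? static_cast<double>(a.size() - b.size())
                               : static_cast<double>(b.size() - a.size());
}
```

Calling `rank_by_distance(query, documents, length_difference)` returns one pair per document. With hundreds of candidates this is a single linear pass plus a sort, so the cost is dominated by how expensive one evaluation of the metric is on documents of a couple of thousand words.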