If I understood correctly, you have two sets of words that you want to compare. What I suggest is: first, establish the distribution each set follows, and second, compare the resulting distributions using the Kullback-Leibler (KL) divergence. The first step can be done at the token or word level, depending on your goal. To do it at the word level you will need a large sample of words and an index of each word. If you happen to use Python, the "collections" module makes it easy to compute word frequencies (see for example the attached link), and there are also Python modules for entropy calculations.
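In case it helps, here is a minimal sketch of those two steps in Python; the sample texts and the smoothing constant are made up for illustration. It uses `collections.Counter` for the word frequencies and `scipy.stats.entropy`, which computes the Shannon entropy of one distribution and the KL divergence when given two.

```python
from collections import Counter
from scipy.stats import entropy

# Hypothetical word samples standing in for your two sets of words.
sample_a = "the cat sat on the mat".split()
sample_b = "the dog sat on the log".split()

counts_a = Counter(sample_a)   # word -> frequency
counts_b = Counter(sample_b)

# Put both distributions on a common vocabulary so they share the same support,
# and add a small smoothing constant so the KL divergence stays finite.
vocab = sorted(set(counts_a) | set(counts_b))
eps = 1e-9
p = [counts_a[w] + eps for w in vocab]
q = [counts_b[w] + eps for w in vocab]

print(entropy(p))     # Shannon entropy of the first distribution (normalised internally)
print(entropy(p, q))  # Kullback-Leibler divergence D(p || q)
```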
I am not sure I understand your question fully. Can you elaborate a bit? As I understand it, you want to compare distributions over different alphabets, but the KL divergence is defined for two distributions over the same alphabet.
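For reference, the standard definition shows why: both $P$ and $Q$ must assign probabilities to the same alphabet $\mathcal{X}$, since the sum runs over a single set of symbols,

$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}.$$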
If you want to compare different distributions, the KL divergence is one option. However, its asymmetry makes it a bit hard to work with. There are other alternatives; from an information-theoretic point of view I would point to the Jensen-Shannon divergence, see e.g.
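In case a concrete computation helps, below is a small sketch of the Jensen-Shannon divergence in Python, built on `scipy.stats.entropy`; the two example distributions are made up. Unlike KL, it is symmetric and bounded (by log 2 in nats), because it measures each distribution against their average.

```python
import numpy as np
from scipy.stats import entropy

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q against their mixture m."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    m = 0.5 * (p + q)
    return 0.5 * entropy(p, m) + 0.5 * entropy(q, m)

p = [0.1, 0.4, 0.5]          # made-up distributions for illustration
q = [0.3, 0.3, 0.4]
print(js_divergence(p, q))   # same value as js_divergence(q, p)
```

SciPy also ships `scipy.spatial.distance.jensenshannon`, which returns the square root of this quantity (the Jensen-Shannon distance, which is a proper metric).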