If we look at the 100 most frequent words in English (the head of a Zipf's-law frequency ranking), the first few are:
the, of, and, to, a, in, is, that, was, it, for, on, with, he, be, I.
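Zipf's law says that a word's frequency is roughly inversely proportional to its frequency rank. The second most common word appears about half as often as the first, the third about a third as often, and so on. Here is a minimal Python sketch of that comparison. The helper `zipf_table` is hypothetical, and the input is whatever plain text you supply.

```python
import re
from collections import Counter


def zipf_table(text, top=10):
    """Rank words by frequency and pair each count with a simple Zipf
    prediction: count at rank r ~= count at rank 1 / r."""
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common(top)
    if not ranked:
        return []
    top_count = ranked[0][1]
    return [
        (rank, word, count, top_count / rank)
        for rank, (word, count) in enumerate(ranked, start=1)
    ]


if __name__ == "__main__":
    sample = "the cat and the dog and the bird sat on the mat"
    for rank, word, count, predicted in zipf_table(sample, top=5):
        print(f"{rank:>2} {word:<6} observed={count} zipf~{predicted:.2f}")
```

A sentence this short is far too small for the law to show clearly. It is a statistical regularity that emerges from large corpora.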
I think we can see why these words appear most often: defining any one of them requires circular reference. To define one of these words you need at least some of the others, and eventually the definitions loop back to the word you started with:
https://www.merriam-webster.com/dictionary/the
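One way to make this circularity concrete is to treat a dictionary as a directed graph. Each word points to the words used in its definition, and a circular definition is then a cycle in that graph. The sketch below uses a depth-first search and assumes a hypothetical toy dictionary, `TOY_DEFINITIONS`, rather than real dictionary data.

```python
# Hypothetical toy dictionary: each word maps to words used in its definition.
TOY_DEFINITIONS = {
    "the": ["a", "that"],
    "a": ["the", "one"],
    "that": ["the", "a"],
    "one": ["a"],
}


def find_cycle(definitions, start):
    """Depth-first search from `start`. Return the first circular definition
    found, as a list of words that begins and ends with the same word, or
    None if no cycle is reachable."""
    path, on_path, finished = [], set(), set()

    def dfs(word):
        path.append(word)
        on_path.add(word)
        for ref in definitions.get(word, []):
            if ref in on_path:
                # `ref` is already on the current path, so we have looped back.
                return path[path.index(ref):] + [ref]
            if ref not in finished:
                cycle = dfs(ref)
                if cycle:
                    return cycle
        path.pop()
        on_path.discard(word)
        finished.add(word)
        return None

    return dfs(start)


if __name__ == "__main__":
    print(find_cycle(TOY_DEFINITIONS, "the"))  # ['the', 'a', 'the']
```

The recursion is fine for a toy graph. A full dictionary would need an iterative search to stay within Python's recursion limit.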
This self-reference is similar to how Domain Authority works for search engines in modern SEO marketing.
Domain Authority is an SEO score built on the same idea as the PageRank algorithm defined by Larry Page. PageRank scores a page by the authority of the pages that link to it, and Domain Authority accumulates that link-based authority across a whole domain.
If you look here:
https://linkdepartment.com/pagerank-vs-domain-authority/
There is relatively little insight into how PageRank and Domain Authority are actually calculated, and each is effectively defined in terms of the other.
https://patents.google.com/patent/US6285999B1/en
You need PageRank to define Domain Authority, and you need Domain Authority to define PageRank. That is why Google will always have the maximum Domain Authority, and yet all it does is link to other sites. This means that, in the beginning, someone decided who and what would get the initial highest rank. Because of their rank, those sites were granted an initial Domain Authority. Then, as pages that were linked to linked back to other pages, the growing depth of links boosted their rank. As that depth increased and the index was routinely updated, the pages re-sorted according to their new rank.
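In the textbook formulation, PageRank's self-reference is resolved as a fixed point. A page's score is a damped sum of the scores of the pages linking to it, and it is computed by starting from uniform scores and iterating until nothing changes. Below is a minimal power-iteration sketch in Python. It assumes a toy link graph and the commonly cited damping factor of 0.85. It is not Google's production system, and it is not Domain Authority, which is a separate third-party score. With damping below 1 the iteration converges to the same scores from any starting distribution, and the optional `start` argument lets you check that.

```python
def pagerank(links, damping=0.85, start=None, max_iter=1000, tol=1e-12):
    """Power-iteration PageRank.

    links: dict mapping each page to a list of distinct pages it links to.
    start: optional dict of initial scores (assumed to sum to 1); uniform if None.
    """
    pages = sorted(set(links) | {t for outs in links.values() for t in outs})
    n = len(pages)
    if start is None:
        rank = {p: 1.0 / n for p in pages}
    else:
        rank = {p: start.get(p, 0.0) for p in pages}
    for _ in range(max_iter):
        # Every page receives the "random jump" share (1 - damping) / n.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            outs = links.get(page, [])
            if outs:
                # A page splits its damped score evenly among the pages it links to.
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new[target] += share
            else:
                # Dangling page (no outbound links): spread its score over all pages.
                for target in pages:
                    new[target] += damping * rank[page] / n
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Toy web: one hub linked from three leaves, which it links back to.
    web = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(f"{page:<4} {score:.4f}")
```

On this toy web the hub settles at roughly 0.48 and each leaf at roughly 0.17. You get the same result whether you start from uniform scores or give all the initial score to a single leaf.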
There is a lot of talk about Domain Authority being required for ranking, but if you look here:
https://www.statista.com/chart/16478/domain-backlinks-website/
For one, authority seems relatively unimportant compared to the number of backlinks. If Google has both the highest Domain Authority and the most backlinks, then ALL of those backlinks come from sites with less authority than Google, so if you looked at the distribution of authority over the number of backlinks, Google would be the worst performer. And if you compared the number of outbound links to backlinks, Google is again the worst performer, because it shows that they actually know nothing and contribute nothing. All they are is a doorway to somewhere else, and they have optimised for growth (aka cancer) rather than pleasure, knowledge creation, novelty, or the generation of new disciplines that don't depend on any priors (like a scientific paper that references nothing but is referenced everywhere, or source code with zero dependencies that is nonetheless integrated into all other code, devices, etc.).
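The backlinks-versus-outbound-links comparison in this paragraph can be sketched as a per-page count on a link graph. Here the "back-out link ratio" is assumed to mean backlinks divided by outbound links. That is my reading of the term, not an established SEO metric. Self-links and duplicate links are ignored, and the graph passed in is a hypothetical toy.

```python
def back_out_link_ratio(links):
    """Return {page: (backlinks, outbound, backlinks / outbound)}.

    links: dict mapping each page to the pages it links to. Self-links and
    duplicate links are ignored. A page with backlinks but no outbound links
    gets an infinite ratio; a page with neither gets 0.0.
    """
    pages = set(links) | {t for outs in links.values() for t in outs}
    backlinks = dict.fromkeys(pages, 0)
    for source, outs in links.items():
        for target in set(outs) - {source}:
            backlinks[target] += 1
    result = {}
    for page in pages:
        out = len(set(links.get(page, [])) - {page})
        back = backlinks[page]
        if out:
            ratio = back / out
        else:
            ratio = float("inf") if back else 0.0
        result[page] = (back, out, ratio)
    return result


if __name__ == "__main__":
    toy = {"portal": ["a", "b", "c"], "a": ["portal", "b"], "b": ["portal"], "c": []}
    for page, (back, out, ratio) in sorted(back_out_link_ratio(toy).items()):
        print(f"{page:<7} back={back} out={out} ratio={ratio:.2f}")
```

A pure "doorway" page, one with many outbound links relative to its backlinks, shows up here with a ratio below 1.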
Someone else has made a similar point before:
https://blogs.cornell.edu/info2040/2018/11/17/zipfs-law-the-internet-and-cascades/
In code, self-reference multiplies. There is the Ouroboros quine (a program that outputs a program in another language, which outputs another, until the chain returns to the original source), there are cellular automata and self-generating AI, and there are the many cryptocurrency miners that install code that installs code that installs code, until it reinstalls itself and uses that to FORCE "consensus", i.e. you cannot alter the code and still participate in mining on the chain.
So, going back to my first point about defining any given word: looking at self-referencing ideas on their own is ultimately useless. What you want to look at instead is the distribution of the decay ratio of referencing. For words, this would work as follows. Take a given word and list its connotations, its synonyms, and the other words used to define it. Then take all of those words and do the same, and count how many of them link back. You can do a quick version of this by looking up a word's synonyms, clicking on a synonym, and checking whether the original word is listed under it.
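Here is one possible reading of "the distribution of the decay ratio of referencing", sketched in Python. Walk outward from a word breadth-first, and at each depth measure what fraction of the newly reached words list the original word again. How fast that fraction falls with depth is the decay, and the depth-1 value is exactly the quick synonym check described in this paragraph. The synonym graph `TOY_SYNONYMS` and the helper `link_back_by_depth` are hypothetical. A real run would need actual thesaurus data.

```python
# Hypothetical synonym graph: each word maps to the synonyms listed for it.
TOY_SYNONYMS = {
    "big": ["large", "huge", "great"],
    "large": ["big", "huge", "ample"],
    "huge": ["big", "enormous"],
    "great": ["grand", "excellent"],
    "enormous": ["huge", "vast"],
    "ample": ["plenty", "large"],
    "grand": ["great", "magnificent"],
}


def link_back_by_depth(graph, start, max_depth=3):
    """Breadth-first walk from `start`. For each depth d, return the fraction
    of words first reached at depth d whose own list references `start`."""
    seen = {start}
    frontier = [start]
    fractions = {}
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for word in frontier:
            for ref in graph.get(word, []):
                if ref not in seen:
                    seen.add(ref)
                    next_frontier.append(ref)
        if not next_frontier:
            break
        # Count the newly reached words that point straight back to `start`.
        back = sum(1 for w in next_frontier if start in graph.get(w, []))
        fractions[depth] = back / len(next_frontier)
        frontier = next_frontier
    return fractions


if __name__ == "__main__":
    for depth, frac in link_back_by_depth(TOY_SYNONYMS, "big").items():
        print(f"depth {depth}: {frac:.2f} of new words link back to 'big'")
```

On this toy graph two of the three direct synonyms of "big" list it back, and none of the words reached at depth 2 or 3 do.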
Now I want to extend and combine a couple of ideas that haven't changed in a long time. When you look at a word's synonyms, you generally see each group of synonyms filed under a single-word category that 'defines' that branch of meaning. But if we only used single words to define single words, we would never get beyond a single word. So I think synonyms should be listed under each listed definition of the original word, because a definition and its group of synonyms are one and the same. And when you list a synonym, you should also list which of its own definitions applies when it is used as that synonym. Dictionaries, thesauruses, and their website counterparts haven't progressed in a long time, and I think this would be a good next step.
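Below is a minimal sketch of the proposed "multi-defined synonym categorisation" as a data structure. It assumes each word has a list of numbered definitions (senses), and that each synonym is stored as a pair: the synonym, and the index of that synonym's matching definition. The entries in `LEXICON` are hypothetical examples. `unreciprocated` is an illustrative check for synonym links that are not listed back in the other direction.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Sense:
    """One definition of a word, with the synonyms that fit *this* definition.
    Each synonym is a (word, sense_index) pair naming which definition of the
    synonym applies."""
    gloss: str
    synonyms: List[Tuple[str, int]] = field(default_factory=list)


# Hypothetical entries for illustration only.
LEXICON: Dict[str, List[Sense]] = {
    "bright": [
        Sense("giving off much light", [("luminous", 0)]),
        Sense("intelligent", [("clever", 0)]),
    ],
    "luminous": [Sense("giving off light", [("bright", 0)])],
    "clever": [Sense("quick to learn", [("bright", 1)])],
}


def synonyms_by_definition(lexicon, word):
    """Group a word's synonyms under each of its definitions, showing the
    matching definition of every synonym."""
    return {
        sense.gloss: [(syn, lexicon[syn][idx].gloss) for syn, idx in sense.synonyms]
        for sense in lexicon[word]
    }


def unreciprocated(lexicon):
    """Return (word, sense_index, synonym, synonym_sense_index) links whose
    target definition does not list the original word and sense back."""
    missing = []
    for word, senses in lexicon.items():
        for i, sense in enumerate(senses):
            for syn, j in sense.synonyms:
                if (word, i) not in lexicon[syn][j].synonyms:
                    missing.append((word, i, syn, j))
    return missing


if __name__ == "__main__":
    for gloss, syns in synonyms_by_definition(LEXICON, "bright").items():
        print(gloss, "->", syns)
    print("unreciprocated:", unreciprocated(LEXICON))
```

Storing the sense index on both sides of every synonym link is what lets "bright" (as in light) and "bright" (as in clever) lead to different synonym groups, instead of sharing one flat list.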
In Western culture we say "less is more" and launch a business with a "minimum viable product", yet in Chinese culture they say "more is more". They have thousands upon thousands of individual characters, the significant majority of which are untranslated, and 4 different styles of Chinese with 5 different scripts for each style, when they could win with just 1. We think of ourselves as the best of the best because of GDP and GDP per capita, yet GDP is extracted profit, not provided value. If you want to manufacture a product at scale with automated, high-throughput machinery (industrialisation), look no further than the price per machine in China. Again, not only has China had relatively low inflation (CPI) over the years, but the yuan is not a freely floating currency on the FOREX market: it trades within a band around a daily reference rate set by the People's Bank of China.
So I say: the more good we can add to what we have, and the quicker, the better. The "back-out link ratio", "the distribution of the decay ratio of referencing", and "multi-defined synonym categorisation" are what I offer you today.