I have been looking for a methodology for choosing which words are "key" in the dictionary - the Wordsmyth Dictionary-Thesaurus in particular. Since I'm an educator, I also want to be able to select the most important words for vocabulary study. About 2-3 years ago, I came across several articles that led me to think that a network analysis of the dictionary might yield a method for selecting keywords by their centrality in the dictionary's semantic network.
The task is to apply an algorithm that finds the words that are central because the concepts they carry link to other words as "small world" hubs. This idea comes from two sources: (a) a suggestion in Michael Stubbs' book Words and Phrases that the keywords in a language are those used in defining other words; and (b) an article by Mark Steyvers showing that dictionaries are "small world" semantic networks, with hubs that link to many other words. So, basically, I would like to find the "central" concepts in English.
My algorithmic and statistical abilities are primitive, but here is the way I have formulated the problem.
I want to find the "centrality" of a word in the Wordsmyth dictionary-thesaurus. Our thesaurus is integrated with the dictionary: it links synonyms to each appropriate definition. My proposed algorithm is as follows.
For each word, count (a rough counting sketch in code follows this list):
1. the number of definitions
2. the number of synonyms
3. words used in definitions: the number of words (and the number of definitions) that use this word, in any of its forms, in their definitions
4. the number of definitions in which this word is listed as a synonym or antonym
5. frequency (based on a corpus such as the BNC or COCA)
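To make the counting concrete, here is a minimal Python sketch. The entry structure, field names, and sample data are assumptions for illustration only, not Wordsmyth's actual schema, and factor 5 (corpus frequency) would have to be merged in from an external BNC or COCA frequency list.

```python
from collections import defaultdict
import re

# Hypothetical entry structure (illustrative only, not Wordsmyth's schema):
# headword -> list of senses, each with a definition string, synonyms, antonyms.
entries = {
    "just": [
        {"definition": "guided by fairness and justice",
         "synonyms": ["fair", "impartial"], "antonyms": ["unjust"]},
    ],
    "justice": [
        {"definition": "the quality of being fair or just",
         "synonyms": ["fairness"], "antonyms": ["injustice"]},
    ],
}

def words_in(text):
    """Crude tokenizer; matching 'any of its forms' would need lemmatization."""
    return set(re.findall(r"[a-z]+", text.lower()))

counts = defaultdict(lambda: defaultdict(int))

for headword, senses in entries.items():
    counts[headword]["definitions"] = len(senses)                            # factor 1
    counts[headword]["synonyms"] = sum(len(s["synonyms"]) for s in senses)   # factor 2
    for sense in senses:
        for w in words_in(sense["definition"]):
            counts[w]["used_in_definitions"] += 1                            # factor 3
        for w in sense["synonyms"] + sense["antonyms"]:
            counts[w]["listed_as_syn_or_ant"] += 1                           # factor 4
# Factor 5 (corpus frequency) would come from an external BNC/COCA word list.

for word in sorted(counts):
    print(word, dict(counts[word]))
```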
The problem is finding the best statistical concepts for giving weights to these factors.
What I'd like to do is calculate the "centrality" (importance) of words in our dictionary. My impression is that we will need an analysis similar to the one in the attached article, "Network Analysis of Dictionaries" by Batagelj, Mrvar, and Zaveršnik.
I have found some software that might be suitable - http://gephi.org - but I am not sufficiently technically trained to carry out the analysis myself.
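One way such an analysis might look in code: the sketch below assumes the dictionary can be exported as headword-to-definition-text pairs (the sample data is invented), builds a directed graph with an edge from each defined word to each word used in its definition, computes a Google-style PageRank centrality over it, and writes a GEXF file that Gephi can open. Edge direction, stop-word removal, and lemmatization are all choices that would need refinement.

```python
import re
import networkx as nx

# Invented sample data: headword -> list of definition strings.
entries = {
    "fair":    ["free from bias or injustice"],
    "just":    ["guided by fairness and justice"],
    "justice": ["the quality of being fair or just"],
    "law":     ["the system of rules a society enforces to secure justice"],
}

G = nx.DiGraph()
for headword, definitions in entries.items():
    for definition in definitions:
        for w in set(re.findall(r"[a-z]+", definition.lower())):
            # Edge from the defined word to each word that defines it,
            # so centrality accumulates on the defining vocabulary.
            G.add_edge(headword, w)

# PageRank-style centrality over the definition network.
rank = nx.pagerank(G)
for word, score in sorted(rank.items(), key=lambda kv: -kv[1])[:10]:
    print(f"{word:10s} {score:.3f}")

# Export for visual exploration in Gephi, which reads GEXF natively.
nx.write_gexf(G, "wordsmyth_definitions.gexf")
```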
I hope that helps you understand what I want to accomplish.
It sounds like an interesting project. The obvious application that I see is measuring word/phrase similarity. In terms of measuring centrality, I'm skeptical that the results would be significantly different from the much simpler task of tallying frequencies of words and collocations. Is there an example word or phrase that would have a low frequency, but high centrality?
The problem with low-frequency lists - even those that eliminate proper nouns - is that they contain many low-frequency words with very little to discriminate among them. Most rare words are not at all important. Lower-frequency words may make reading difficult, but frequency doesn't identify the "important" words that carry important meanings in the language. An example would be "justice": it isn't especially frequent, but the idea of justice is important in the meanings of many words which are themselves important. The "small world" analysis would identify words that may not have a lot of links (and may not be frequent), but whose links are to words that are themselves important (and may be more frequent). The principle is the same as the one Google uses to find important, high-ranking pages: a page is more important to Google if it is linked to by pages which are themselves highly linked to, and so on.
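To illustrate that principle on a toy scale (the graph below is invented purely for demonstration), here is a case where a word with fewer incoming links outranks one with more, because its links come from well-connected words:

```python
import networkx as nx

# Invented toy graph: an edge u -> v means "u's definition uses the word v".
G = nx.DiGraph([
    # "common" is used by several definitions, but only by peripheral words.
    ("pebble", "common"), ("twig", "common"), ("shrub", "common"), ("gravel", "common"),
    # "justice" is used by fewer definitions, but by well-connected words.
    ("law", "justice"), ("fair", "justice"),
    # The words that use "justice" are themselves used by other entries.
    ("court", "law"), ("rule", "law"), ("judge", "law"),
    ("right", "fair"), ("equal", "fair"),
])

rank = nx.pagerank(G)
print("in-degree:", G.in_degree("common"), "vs", G.in_degree("justice"))   # 4 vs 2
print("pagerank: ", round(rank["common"], 3), "vs", round(rank["justice"], 3))
# "justice" has fewer direct links but a higher PageRank, because its
# links come from words that are themselves well linked.
```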
I'd be interested in seeing how the two approaches differ in the final result set. Just as an interesting note, the word "justice" ranks under 5000 out of over 1 million ngrams in the Web1T corpus.