What are the metrics used to evaluate keywords extracted from documents?

Here is an extract from my draft ppt on "Art of searching and science of retrieval". See the image for the formula. I hope it makes sense; if not, forgive me.

Term Analytics:

• TF (Term Frequency): the number of occurrences of the term in the document, excluding stop words; the more often a term occurs in a document, the more relevant that document is to a query containing the term

• DF (Document Frequency): the number of documents in the collection that contain a term

• CF (Collection Frequency): the number of occurrences of the term in the entire collection

• It is better to use document-level scoring like DF than collection-level statistics like CF

• DF and CF tend to rank documents very high on frequently occurring common words and hence do not serve the purpose of retrieving the more precise documents; hence ITF (Inverse Term Frequency) and TW (Term Weighting) are used

• Terms occurring in more than 50% of the documents in the collection need to have their values rewritten to 0 and be treated as ‘stop terms’; documents containing more than 50% of the index terms receive similar treatment

• Scaling down the weights of terms with high CF is necessary

• The weight of a term that grows with its CF is reduced by a factor, and that factor is the IDF (Inverse Document Frequency)

• The formula used for the purpose is IDF_t = log(N / DF_t), where N is the number of documents in the collection, DF_t is the document frequency of term t and IDF_t is the inverse document frequency of term t
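
To make the definitions above concrete, here is a minimal sketch in Python that computes TF, DF, CF and IDF on a tiny collection. The four documents, the stop-word list, the helper names (`tokenize`, `tf_idf`) and the choice of base-10 logarithm are all assumptions made for illustration; the slides only say "Log" without naming a base.

```python
import math
from collections import Counter

# Toy collection and stop-word list: both are made up for this illustration.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat and the cat ran",
    "a dog and a bird",
    "the bird sang",
]
STOP_WORDS = {"the", "a", "on", "and"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

tokenized = [tokenize(d) for d in docs]
N = len(tokenized)  # number of documents in the collection

# TF: occurrences of each term within one document.
tf = [Counter(tokens) for tokens in tokenized]

# CF: occurrences of each term across the entire collection.
cf = Counter(t for tokens in tokenized for t in tokens)

# DF: number of documents that contain each term at least once.
df = Counter(t for tokens in tokenized for t in set(tokens))

# IDF_t = log(N / DF_t); base 10 is an assumption, any base gives the same ranking.
idf = {t: math.log10(N / df[t]) for t in df}

def tf_idf(doc_index, term):
    """Weight of `term` in one document: TF scaled down by IDF."""
    return tf[doc_index][term] * idf.get(term, 0.0)

for term in sorted(df):
    print(f"{term:8s} DF={df[term]} CF={cf[term]} IDF={idf[term]:.3f}")
```

In this toy collection "cat" occurs three times overall (CF = 3) but in only two documents (DF = 2), which shows how the two counts can diverge; terms found in a single document get the largest IDF.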

Joseph Dubrovkin

Hi, look at articles on bibliometrics. Very nice question!

Taiguara Villela

Hello Mustapha, how are you?

Archival documentation, i.e. documentation with probationary status, needs archival processing. What kind of documentation do you work with?

Mustapha Bouakkaz

Hello Villela

Scientific articles.

Dear Sridhar!

Very good answer!

Dear Sridhar! Thank you for your answer

Part 1 of this ppt is uploaded on slideshare at https://www.slideshare.net/mssridhar/art-of-searching-and-science-of-information-retrieval

Part 2 and 3 will be uploaded shortly!
