Here is an extract from my draft PPT on "Art of searching and science of retrieval". See the image for the formula. I hope it makes sense; if not, forgive me.
Term Analytics:
• TF (Term Frequency): the number of occurrences of a term in a document, with stop words excluded; the more often a term occurs in a document, the more relevant that document is to a query containing the term
• DF (Document Frequency): the number of documents in the collection that contain a term
• CF (Collection Frequency): the number of occurrences of the term in the entire collection
• Document-level statistics such as DF are better for scoring than collection-level statistics such as CF
• On their own, DF and CF are highest for frequently occurring common words, so ranking by them does not help retrieve the most precise documents; hence ITF (Inverse Term Frequency) and TW (Term Weighting) are used
• Terms occurring in more than 50% of the documents in the collection should have their values rewritten to 0 and be treated as ‘stop terms’; documents containing more than 50% of the index terms receive similar treatment
• The weights of terms with high CF need to be scaled down
• The weight of a term, which would otherwise grow with its CF, is reduced by a factor; that factor is the IDF (Inverse Document Frequency)
• The formula used for this purpose is IDF_t = log(N / DF_t), where N is the number of documents in the collection, DF_t is the document frequency of term t, and IDF_t is the inverse document frequency of term t
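To make the definitions above concrete, here is a minimal Python sketch that computes TF, DF, CF and IDF on a tiny collection. The four documents, the stop-word list and the choice of natural logarithm are all my own assumptions for illustration; the final TF x IDF product is the common way of combining the two, not something taken from the slides.

```python
import math
from collections import Counter

# Hypothetical toy collection; each string is one document.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat and the cat ran",
    "a bird sang in the tree",
    "the dog barked at the bird",
]

# Illustrative stop-word list (an assumption, not a standard list).
stop_words = {"the", "a", "on", "in", "at", "and"}

# Tokenise on whitespace and drop stop words.
tokenised = [[w for w in d.split() if w not in stop_words] for d in docs]

# TF: occurrences of each term within each document.
tf = [Counter(tokens) for tokens in tokenised]

# DF: number of documents that contain the term at least once.
df = Counter(term for tokens in tokenised for term in set(tokens))

# CF: total occurrences of the term across the whole collection.
cf = Counter(term for tokens in tokenised for term in tokens)

# IDF_t = log(N / DF_t); natural log is assumed here.
n_docs = len(docs)
idf = {term: math.log(n_docs / df[term]) for term in df}


def tf_idf(term, doc_index):
    """Weight of a term in one document: TF scaled by IDF (0.0 for unseen terms)."""
    return tf[doc_index][term] * idf.get(term, 0.0)


for term in ("cat", "mat"):
    print(term, "DF =", df[term], "CF =", cf[term], "IDF =", round(idf[term], 3))
```

In this toy collection "cat" appears in two documents (DF = 2) but three times overall (CF = 3), so its IDF is lower than that of "mat", which appears in only one document.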
The metrics used for evaluation are derived from the F-score, for example F1@k, where k is the number of keywords that are evaluated. F@O (Omega) is also used, to evaluate all the keywords output by the model.
Some variations of the mentioned metrics are used as well, such as F1@k and MAP (Mean Average Precision).
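As a rough illustration of F1@k, here is a minimal Python sketch. The function name `f1_at_k`, the example keyphrases and the exact-match comparison are assumptions for illustration; published evaluations often stem the phrases first and may handle short prediction lists differently.

```python
def f1_at_k(predicted, gold, k):
    """F1 of the top-k predicted keyphrases against the gold set.

    Assumes `predicted` is ranked, contains no duplicates, and that
    phrases are compared by exact string match.
    """
    top_k = predicted[:k]
    gold_set = set(gold)
    correct = sum(1 for phrase in top_k if phrase in gold_set)
    if correct == 0:
        return 0.0
    precision = correct / len(top_k)
    recall = correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical model output (ranked) and gold keyphrases.
predicted = ["neural network", "keyphrase extraction", "graph", "pagerank", "corpus"]
gold = ["keyphrase extraction", "pagerank", "position bias"]

print(f1_at_k(predicted, gold, 3))               # top 3 predictions only
print(f1_at_k(predicted, gold, len(predicted)))  # every keyword the model output
```

Passing `len(predicted)` as k scores every keyword the model produced rather than a fixed-size prefix.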
Refer to:
"A position-biased PageRank algorithm for keyphrase extraction | Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence", Dl.acm.org, 2020. [Online]. Available: https://dl.acm.org/doi/10.5555/3297863.3297932.
Z. Sun, J. Tang, P. Du, Z. Deng and J. Nie, "DivGraphPointer", Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019. Available: 10.1145/3331184.3331219
Y. Wang, J. Li, H. Chan, I. King, M. Lyu and S. Shi, "Topic-Aware Neural Keyphrase Generation for Social Media Language", Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019. Available: 10.18653/v1/p19-1240