Hello

I have the following situation: I have a paper X about topic Y. For paper X, I did a forward search in Web of Science (i.e., I checked all later papers that cite paper X). I then downloaded all the articles identified by the forward search (approx. 1'000 papers). Now I would like to sort these papers by how frequently they use specific keywords.
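The sorting step could look roughly like the sketch below. It assumes the downloaded PDFs have already been converted to plain text (that conversion is not shown), and the function names `keyword_counts` and `rank_papers`, the paper IDs and the keyword list are all hypothetical. Matching is whole-word and case-insensitive.

```python
import re
from collections import Counter


def keyword_counts(text: str, keywords: list[str]) -> Counter:
    """Count whole-word, case-insensitive occurrences of each keyword in text."""
    counts = Counter()
    for kw in keywords:
        # \b marks word boundaries, so "model" does not match inside "modelling".
        pattern = r"\b" + re.escape(kw) + r"\b"
        counts[kw] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def rank_papers(papers: dict[str, str], keywords: list[str]) -> list[tuple[str, int]]:
    """Return (paper_id, total keyword hits) pairs, most hits first."""
    totals = {pid: sum(keyword_counts(text, keywords).values())
              for pid, text in papers.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical paper texts and keyword list, for illustration only.
    papers = {
        "Z": "Topic Y is central here. We extend topic Y further.",
        "W": "Unrelated work that mentions topic Y once.",
    }
    print(rank_papers(papers, ["topic Y"]))  # [('Z', 2), ('W', 1)]
```

Whole-word matching avoids inflated counts from substrings. Stemming or synonym lists would be a separate decision.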

For example: I found paper Z via the forward search (so paper Z cites paper X, which is about topic Y). Now I want to check whether paper Z is also concerned with topic Y or only mentions it in passing. To do so, I search for specific keywords that correspond to topic Y. Based on how often these keywords appear in paper Z, I want to classify it as "relevant" or "not relevant". How can I determine the threshold for the keywords? That is, if paper Z uses a specific keyword only once, it is most probably not relevant to topic Y, but if it mentions the keyword 20 times, it probably is.
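The classification rule described above amounts to comparing each paper's keyword score to a cutoff. Below is a minimal sketch with a hypothetical placeholder threshold of 5 (choosing the real value is exactly the open question). It also includes an optional, assumed normalization to hits per 1,000 words, since a long paper can accumulate more mentions than a short one simply because it has more text.

```python
def keyword_rate(hits: int, word_count: int) -> float:
    """Keyword hits per 1,000 words, so long and short papers are comparable."""
    return 1000 * hits / word_count if word_count else 0.0


def classify(score: float, threshold: float) -> str:
    """Label a paper by comparing its keyword score (raw hits or rate) to a threshold."""
    return "relevant" if score >= threshold else "not relevant"


if __name__ == "__main__":
    THRESHOLD = 5  # hypothetical placeholder, not a recommended value
    print(classify(1, THRESHOLD))   # not relevant
    print(classify(20, THRESHOLD))  # relevant
    print(keyword_rate(20, 8000))   # 2.5 hits per 1,000 words
```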

Is there a recognized methodology for determining, or at least approximating, a keyword-frequency threshold that distinguishes papers relevant to topic Y from those that are not?
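One common way to approximate such a threshold, offered here as an assumption rather than as *the* recognized method, is to treat it as tuning a classifier's decision threshold. You would hand-label a random sample of the papers as relevant or not, then pick the cutoff that best reproduces those labels. The sketch below does this by maximizing F1 over every observed hit count. The function name `best_threshold` and the sample values are invented for illustration.

```python
def best_threshold(sample: list[tuple[int, bool]]) -> tuple[int, float]:
    """Pick the hit threshold with the highest F1 on a hand-labeled sample.

    sample: (keyword_hits, is_relevant) pairs judged by a human reader.
    A paper is predicted relevant when keyword_hits >= threshold.
    Returns (threshold, f1).
    """
    best = (0, -1.0)
    # Every observed hit count is a candidate cutoff.
    for t in sorted({hits for hits, _ in sample}):
        tp = sum(1 for h, rel in sample if h >= t and rel)
        fp = sum(1 for h, rel in sample if h >= t and not rel)
        fn = sum(1 for h, rel in sample if h < t and rel)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        if f1 > best[1]:
            best = (t, f1)
    return best


if __name__ == "__main__":
    # Invented labels for illustration only, not real data.
    sample = [(1, False), (2, False), (3, False), (4, False),
              (5, True), (6, True), (12, True), (20, True)]
    print(best_threshold(sample))  # (5, 1.0)
```

If missing a relevant paper is costlier than reading an irrelevant one, a recall-weighted score (e.g. F2) could replace F1. The larger and more random the labeled sample, the more trustworthy the chosen cutoff.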

With this approach I hope to reduce the 1'000 papers to those which are about topic Y.

David Redaschi