Is cosine similarity a suitable measure to extract answers to questions?

Piotr Przybyła

Generally, it depends on what role the measure plays in the whole system. In the most typical architecture, where it is used to compare a question with possibly relevant passages, the problem is that the passages are likely to be several times longer than the question. This may affect the results, because cosine similarity represents both texts as vectors in the same term space, so the many passage terms that do not occur in the question lower the score. An alternative may be Jaccard similarity, which was designed to measure the similarity of sets. What is more, you can easily extend it by taking into account the weights of matched elements (e.g. a co-occurrence of the word "Toronto" is more informative than one of "to"). Another possible direction would be to concentrate on the words of the question only, for example by counting what percentage of them occur in the tested passage.
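To make the comparison concrete, here is a minimal sketch of the four measures mentioned above, applied to one short question and one longer passage. The tokenizer, the stopword list, the 0.1/1.0 weights and the example texts are all assumptions made for illustration; a real QA system would more likely use IDF-style weights and a proper tokenizer.

```python
import math
from collections import Counter

# Hypothetical stopword list and weights, chosen only for this illustration.
STOPWORDS = {"is", "to", "a", "in", "it", "and", "the", "which"}


def tokens(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    stripped = (w.strip(".,?!\"'") for w in text.lower().split())
    return [w for w in stripped if w]


def cosine(a, b):
    """Cosine similarity of two term-count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a, b):
    """Shared distinct words divided by all distinct words."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def weight(term):
    """Assumed weighting: stopwords count for much less than content words."""
    return 0.1 if term in STOPWORDS else 1.0


def weighted_jaccard(a, b):
    """Jaccard where every word contributes its weight instead of 1."""
    sa, sb = set(a), set(b)
    total = sum(weight(t) for t in sa | sb)
    return sum(weight(t) for t in sa & sb) / total if total else 0.0


def question_coverage(question, passage):
    """Fraction of distinct question words that occur in the passage."""
    q = set(question)
    return len(q & set(passage)) / len(q) if q else 0.0


q = tokens("Which lake is close to Toronto?")
p = tokens("Toronto is a city in Canada. It lies next to Lake Ontario, "
           "and many visitors travel to the lake every summer.")

print("cosine           ", round(cosine(q, p), 3))
print("jaccard          ", round(jaccard(q, p), 3))
print("weighted jaccard ", round(weighted_jaccard(q, p), 3))
print("question coverage", round(question_coverage(q, p), 3))
```

Only the last measure is independent of the passage length: adding unrelated sentences to the passage lowers the cosine and both Jaccard scores but leaves the question coverage unchanged.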

However, despite these problems, cosine similarity is in fact commonly used in QA systems. To judge definitively which measure is better in this application, one would need to test different measures within a single system. I haven't seen such experiments.

Piero Molino

Piotr, you may find some related experiments in a few of my papers, in particular "Distributed Representations for Semantic Matching in non-factoid Question Answering". I hope you find it interesting.

To answer the original question: yes, cosine similarity can be a suitable measure, but it all depends on how the vectors you compare are constructed. Moreover, for a high-accuracy QA system, several different measures should be combined.

Conference Paper Exploiting Distributional Semantic Models in Question Answering

Article Playing with knowledge: A virtual player for “Who Wants to B...

Conference Paper Distributed Representations for Semantic Matching in non-fac...

Safwan Shatnawi

Hello,

I suggest using LSA-based text similarity. LSA finds text similarity beyond lexical similarity. We tested several similarity measures for text-to-text similarity, and among them LSA was the most effective tool.

Check the following URL for the SEMILAR API: http://www.semanticsimilarity.org/

Cosine similarity cannot detect semantic similarity.

Kind regards

I will give a direct example to illustrate the difference between syntactic and semantic similarity. The words "user" and "human" are not similar in terms of syntactic similarity. However, we often use "user" and "human" interchangeably, so they are semantically similar. Semantic similarity goes beyond the syntactic similarity of the words.

Ramon López de Mántaras

I agree with Safwan. If lexical similarity is enough for your purposes, then cosine is OK, but if you need to take semantics into account, cosine is not useful. At my lab we did some work on semantic similarity using an extension of the Jaccard measure. Obviously, to deal with semantics you need a semantic relational representation of the words, for instance a semantic network representation capturing that "user" is a subset of "human".
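As a rough illustration of why a relational representation helps, the sketch below expands each word set with its ancestors in a tiny hand-made "is-a" hierarchy before computing Jaccard similarity. This is not the extension used in the lab mentioned above, whose details are not given here. The `IS_A` table, the `expand` helper and the example word sets are hypothetical.

```python
# Hypothetical, hand-made "is-a" links. A real system would load these
# from a semantic network or ontology instead.
IS_A = {"user": "human", "customer": "human", "human": "being"}


def expand(words):
    """Return the words plus all of their ancestors in the toy hierarchy."""
    expanded = set()
    for w in words:
        # Walk up the is-a chain until it ends or reaches a known word.
        while w is not None and w not in expanded:
            expanded.add(w)
            w = IS_A.get(w)
    return expanded


def jaccard(a, b):
    """Shared elements divided by all elements of the two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


a = {"user", "clicks", "button"}
b = {"human", "clicks", "button"}

plain = jaccard(a, b)                     # "user" and "human" do not match
semantic = jaccard(expand(a), expand(b))  # "user" now also contributes "human"

print("plain Jaccard:   ", plain)
print("expanded Jaccard:", semantic)
```

With purely lexical matching, "user" and "human" count as different elements. After expansion they share the ancestor "human", so the score rises.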

Bolanle Ojokoh

Thanks to all for the very informative answers.