21 November 2016

Hello,

I am looking for a suitable measure of semantic similarity between two words, based on a comparison of two binary vectors (each word has a vector of 0s and 1s, where each 0 or 1 represents dissociation from or association with a semantic feature, respectively).

E.g. (the 10 columns represent 10 semantic features)

Word A: 0-0-0-0-0-1-1-1-1-1

Word B: 0-1-1-0-0-0-0-1-1-1

The problem with common metrics (such as correlation) is that if both words are dissociated from a feature (e.g. the first column is 0 for both), this is counted as a "match", so the similarity increases. In the extreme case where both vectors consist only of 0s, a matching-based score (the proportion of columns that agree) equals 1, and the correlation is not even defined, even though the two words share no semantic feature.
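To make the problem concrete, here is a minimal plain-Python sketch that counts the four column types for the example words (a = 1-1, b = 1-0, c = 0-1, d = 0-0) and computes two scores that both use d: the simple matching coefficient (a + d) / n and the Pearson correlation, which on 0/1 data equals the phi coefficient. The function names are illustrative and do not come from any library.

```python
import math

def counts(x, y):
    """Return (a, b, c, d): 1-1 columns, 1-0 columns, 0-1 columns, 0-0 columns."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return a, b, c, d

def simple_matching(x, y):
    """Share of agreeing columns; joint absences (d) count as matches."""
    a, b, c, d = counts(x, y)
    return (a + d) / (a + b + c + d)

def pearson(x, y):
    """Pearson r (the phi coefficient for 0/1 data); NaN if a vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    vx = sum((p - mx) ** 2 for p in x)
    vy = sum((q - my) ** 2 for q in y)
    if vx == 0 or vy == 0:
        return float("nan")  # correlation is undefined for constant vectors
    return cov / math.sqrt(vx * vy)

if __name__ == "__main__":
    word_a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    word_b = [0, 1, 1, 0, 0, 0, 0, 1, 1, 1]
    print(counts(word_a, word_b))
    print(round(simple_matching(word_a, word_b), 3), round(pearson(word_a, word_b), 3))
    # Append five features that neither word has: only d grows.
    pad_a, pad_b = word_a + [0] * 5, word_b + [0] * 5
    print(round(simple_matching(pad_a, pad_b), 3), round(pearson(pad_a, pad_b), 3))
```

For words A and B this gives a = 3, b = 2, c = 2, d = 3, a matching score of 0.6 and r = 0.2. Appending five columns where both words are 0 raises these to about 0.733 and 0.4, although no shared feature was added; for two all-zero vectors the matching score is 1 and the correlation is undefined (returned here as NaN).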

So I need a measure of similarity that increases with matching features (both words have a 1 in a column), decreases with mismatching features (one word has a 1 and the other a 0 in a column, or vice versa), and is unaffected when neither word is associated with a semantic feature (two 0s).
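One standard coefficient with these three properties is the Jaccard index, J = a / (a + b + c): a counts columns where both words are 1, b + c counts mismatches, and the 0-0 count d does not appear at all. The Dice coefficient, 2a / (2a + b + c), behaves the same way but weights shared features more heavily. A minimal sketch, assuming equal-length lists of 0/1 integers; the function names and the convention of returning 0 when neither word has any feature are assumptions (both coefficients are strictly undefined in that case).

```python
def _shared_and_mismatches(x, y):
    """Return (a, b + c): shared 1-1 columns and mismatching columns; 0-0 is ignored."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    mismatches = sum(1 for p, q in zip(x, y) if p != q)
    return a, mismatches

def jaccard(x, y):
    """Jaccard index a / (a + b + c); returns 0.0 for two all-zero vectors (assumed convention)."""
    a, m = _shared_and_mismatches(x, y)
    return 0.0 if a + m == 0 else a / (a + m)

def dice(x, y):
    """Dice coefficient 2a / (2a + b + c); same zero convention as jaccard()."""
    a, m = _shared_and_mismatches(x, y)
    return 0.0 if a + m == 0 else 2 * a / (2 * a + m)

if __name__ == "__main__":
    word_a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    word_b = [0, 1, 1, 0, 0, 0, 0, 1, 1, 1]
    print(round(jaccard(word_a, word_b), 3), round(dice(word_a, word_b), 3))
    # Extra columns where both words are 0 leave both scores unchanged.
    print(round(jaccard(word_a + [0] * 5, word_b + [0] * 5), 3))
```

For words A and B, J = 3/7 (about 0.43) and Dice = 0.6; unlike matching-based scores or correlation, neither changes when columns with two 0s are added.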

Thank you for any advice!

Martin
