Hello,
I am looking for a suitable measure of semantic similarity between two words that is based on comparing two binary vectors. Each word has a vector of 0s and 1s, where a 1 means the word is associated with a semantic feature and a 0 means it is not.
For example (the 10 columns represent 10 semantic features):
Word A: 0-0-0-0-0-1-1-1-1-1
Word B: 0-1-1-0-0-0-0-1-1-1
The problem with common metrics (such as correlation) is that if both words are dissociated from a feature (e.g. first column = 0 for both), this is counted as a "match", so the similarity increases. In the extreme case where both vectors consist only of 0s, a match-counting measure gives perfect similarity even though there is no overlapping semantic feature (strictly, Pearson's r is undefined there, because constant vectors have zero variance).
So I need a measure of similarity that increases with matching features (both words have a 1 in a column), decreases with mismatching features (one word has a 1 and the other a 0 in a column), and is unaffected when neither word is associated with a feature (two 0s).
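To make the requirement concrete, here is a minimal Python sketch using the two example vectors above. The function names (count_pairs, simple_matching, shared_feature_ratio) are my own hypothetical helpers, not from any library. The ratio "1-1 matches / (1-1 matches + mismatches)" has the same form as the Jaccard index; I am not claiming it is the best choice, it is just the simplest expression I could write down that ignores 0-0 columns. Returning 0.0 when neither word has any feature is an assumed convention.

```python
def count_pairs(u, v):
    """Return (both_one, mismatch, both_zero) for two equal-length 0/1 lists."""
    if not u or len(u) != len(v):
        raise ValueError("vectors must be non-empty and of equal length")
    both_one = sum(1 for x, y in zip(u, v) if x == 1 and y == 1)
    both_zero = sum(1 for x, y in zip(u, v) if x == 0 and y == 0)
    mismatch = len(u) - both_one - both_zero
    return both_one, mismatch, both_zero


def simple_matching(u, v):
    """Counts 0-0 columns as matches (the behaviour I want to avoid)."""
    both_one, _, both_zero = count_pairs(u, v)
    return (both_one + both_zero) / len(u)


def shared_feature_ratio(u, v):
    """Ignores 0-0 columns: 1-1 matches over (1-1 matches + mismatches)."""
    both_one, mismatch, _ = count_pairs(u, v)
    denom = both_one + mismatch
    # Assumed convention: no features at all means no similarity.
    return both_one / denom if denom else 0.0


word_a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
word_b = [0, 1, 1, 0, 0, 0, 0, 1, 1, 1]
all_zero = [0] * 10

print(count_pairs(word_a, word_b))
print(simple_matching(word_a, word_b), shared_feature_ratio(word_a, word_b))
print(simple_matching(all_zero, all_zero), shared_feature_ratio(all_zero, all_zero))
```

For the example vectors there are 3 shared features, 4 mismatches and 3 columns where both words are 0, so the simple matching score is 6/10 while the ratio that ignores 0-0 columns is 3/7. For two all-zero vectors the simple matching score is 1, which is exactly the behaviour I would like to avoid.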
Thank you for any advice!
Martin