I have implemented a sentence similarity method using WS4J.

I have read articles on sentence similarity that build on the similarity of the individual words in the two sentences. However, I couldn't find a method that computes and returns a single value for the overall sentence similarity from those word similarities.
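A minimal sketch of one common way to collapse word similarities into a single sentence score is a symmetric best-match average. This is my choice for illustration, not the method of Li et al. For each word in one sentence, take its highest similarity to any word of the other sentence, average those values, then average the two directions. The class and method names are hypothetical, and the word-similarity function is passed in. With WS4J it could be `(a, b) -> wordSim(a, b, rc)`, using the `wordSim` helper from the code in this question, applied to stemmed, POS-filtered tokens. The toy similarity in `main` only stands in for WS4J.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

public class BestMatchSimilarity {

    // Average, over the words of 'from', of each word's best similarity to any word of 'to'.
    public static double directional(List<String> from, List<String> to,
                                     ToDoubleBiFunction<String, String> wordSim) {
        if (from.isEmpty() || to.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (String a : from) {
            double best = 0.0;
            for (String b : to) {
                best = Math.max(best, wordSim.applyAsDouble(a, b));
            }
            sum += best;
        }
        return sum / from.size();
    }

    // Symmetric score: mean of both directions, so neither sentence acts as the "base".
    public static double sentenceSim(List<String> s1, List<String> s2,
                                     ToDoubleBiFunction<String, String> wordSim) {
        return 0.5 * (directional(s1, s2, wordSim) + directional(s2, s1, wordSim));
    }

    public static void main(String[] args) {
        // Toy word similarity standing in for WS4J: identical words score 1.0,
        // the hypothetical pair car/automobile scores 0.5, everything else 0.0.
        ToDoubleBiFunction<String, String> toy = (a, b) -> {
            if (a.equals(b)) {
                return 1.0;
            }
            boolean synonyms = (a.equals("car") && b.equals("automobile"))
                    || (a.equals("automobile") && b.equals("car"));
            return synonyms ? 0.5 : 0.0;
        };
        List<String> s1 = List.of("the", "car", "is", "red");
        List<String> s2 = List.of("the", "automobile", "is", "blue");
        System.out.println(sentenceSim(s1, s2, toy)); // prints 0.625
    }
}
```

The score stays in [0, 1] only if the word measure does. WS4J measures such as Wu-Palmer, Lin or Path are bounded that way, while others such as Lesk or JCn would need normalizing first.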

A similar question was asked on Stack Overflow: "sentence-similarity-using-ws4j".

As you can see in the code below, I have used WS4J to check whether every word of sentence A (the base) finds a synset match in the other sentence with a similarity above 0.9; if so, the method returns a match message. But I suspect this is not a good approach, since it only gives a yes/no answer rather than a similarity value.

I found the article by Yuhua Li et al. [1] very useful, but I cannot figure out the method they use to compute the overall sentence similarity.

```java
// Assumes the enclosing class provides, as in the WS4J demo code: the 'nlp' field
// (OpenNLPSingleton), mapPOS(String), MorphaStemmer, the POS type, and a
// wordSim(String, String, RelatednessCalculator) helper.
public static String sentenceSim(String se1, String se2, RelatednessCalculator rc) {
    if (se1 == null || se2 == null) {
        return "null";
    }
    if (nlp == null) {
        nlp = OpenNLPSingleton.INSTANCE;
    }

    String[] words1 = nlp.tokenize(se1); // base
    String[] words2 = nlp.tokenize(se2); // sentence
    String[] postag1 = nlp.postag(words1);
    String[] postag2 = nlp.postag(words2);

    StringBuilder similarityMessage = new StringBuilder();
    // One flag per base word, so a base word that matches several sentence words
    // is counted only once (a plain match counter would over-count).
    boolean[] baseMatched = new boolean[words1.length];

    for (int j = 0; j < words2.length; j++) { // sentence
        String pt2 = postag2[j];
        String w2 = MorphaStemmer.stemToken(words2[j].toLowerCase(), pt2);
        POS p2 = mapPOS(pt2);

        for (int i = 0; i < words1.length; i++) { // base
            String pt1 = postag1[i];
            String w1 = MorphaStemmer.stemToken(words1[i].toLowerCase(), pt1);
            POS p1 = mapPOS(pt1);

            // Skip pairs where either POS tag has no WordNet equivalent
            if (p1 == null || p2 == null) {
                continue;
            }
            double r = wordSim(w1, w2, rc);
            if (r > 0.9) {
                baseMatched[i] = true;
                String line = "\t\tSimilarity found (base : sentence) ('Base word: " + words1[i] + "=" + w1
                        + " " + p1 + "', Sentence word: '" + words2[j] + "=" + w2 + " " + p2 + "') = " + r + "\n";
                similarityMessage.append(line);
                System.out.print(line); // print only the new match, not the whole accumulated message
            }
        }
    }

    // All words from the base have to match; if one doesn't, it is not a match.
    for (boolean matched : baseMatched) {
        if (!matched) {
            return "";
        }
    }
    String result = "\t\tFound all matches for base in sentence:\n" + similarityMessage;
    System.out.println("\t\tBase: " + se1);
    System.out.println(result);
    return result;
}
```
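The all-or-nothing test at the end of the method can be turned into a graded value by reporting the fraction of base words that found a match above the threshold, so that 1.0 corresponds to the "found all matches" case. A minimal sketch, with hypothetical class and method names and the word-similarity function passed in (with WS4J it could wrap the `wordSim(w1, w2, rc)` helper used in the method; the exact-match function in `main` is only a toy stand-in):

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

public class ThresholdCoverage {

    // Fraction of base words that have at least one sentence word scoring above 'threshold'.
    // An empty base scores 0.0 here by choice.
    public static double coverage(List<String> base, List<String> sentence,
                                  ToDoubleBiFunction<String, String> wordSim, double threshold) {
        if (base.isEmpty()) {
            return 0.0;
        }
        int matched = 0;
        for (String b : base) {
            for (String s : sentence) {
                if (wordSim.applyAsDouble(b, s) > threshold) {
                    matched++;
                    break; // count each base word at most once
                }
            }
        }
        return (double) matched / base.size();
    }

    public static void main(String[] args) {
        // Toy similarity standing in for WS4J: exact string match only
        ToDoubleBiFunction<String, String> exact = (a, b) -> a.equals(b) ? 1.0 : 0.0;
        System.out.println(coverage(List.of("red", "car"), List.of("a", "red", "bus"), exact, 0.9)); // prints 0.5
    }
}
```

Keeping the threshold means near misses (say 0.89) still count as zero; averaging the best scores instead of thresholding them avoids that cliff.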

I have written my code in Java, so I am looking for Java implementations.
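As a Java starting point, here is a minimal sketch of the overall sentence similarity as I understand it from Li et al. [1]. The distinct words of both sentences form a joint word set. Each sentence gets a semantic vector (1 for joint words it contains, otherwise the best word similarity above a threshold) and a word-order vector (word positions). The final score is S = delta * Ss + (1 - delta) * Sr, where Ss is the cosine of the two semantic vectors and Sr = 1 - ||r1 - r2|| / ||r1 + r2||. The sketch deliberately omits the paper's corpus-based information-content weighting. It also uses whatever word similarity you pass in (for example a WS4J measure through the `wordSim` helper) rather than the paper's own word measure. The thresholds 0.2 and 0.4 and delta = 0.85 are assumed values to verify against the paper, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

public class LiStyleSentenceSimilarity {
    // Assumed parameter values; verify them against the paper before relying on them.
    static final double SEMANTIC_THRESHOLD = 0.2;
    static final double ORDER_THRESHOLD = 0.4;
    static final double DELTA = 0.85; // weight of semantic vs. word-order similarity

    // Entry i: 1.0 if joint word i occurs in the sentence, otherwise its best similarity to any
    // sentence word (0.0 if not above SEMANTIC_THRESHOLD). No information-content weighting.
    static double[] semanticVector(List<String> joint, List<String> sentence,
                                   ToDoubleBiFunction<String, String> sim) {
        double[] v = new double[joint.size()];
        for (int i = 0; i < joint.size(); i++) {
            String w = joint.get(i);
            if (sentence.contains(w)) {
                v[i] = 1.0;
                continue;
            }
            double best = 0.0;
            for (String s : sentence) {
                best = Math.max(best, sim.applyAsDouble(w, s));
            }
            v[i] = best > SEMANTIC_THRESHOLD ? best : 0.0;
        }
        return v;
    }

    // Entry i: 1-based position of joint word i in the sentence; if absent, the position of the
    // most similar sentence word when that similarity exceeds ORDER_THRESHOLD; otherwise 0.
    static double[] orderVector(List<String> joint, List<String> sentence,
                                ToDoubleBiFunction<String, String> sim) {
        double[] r = new double[joint.size()];
        for (int i = 0; i < joint.size(); i++) {
            String w = joint.get(i);
            int pos = sentence.indexOf(w);
            if (pos >= 0) {
                r[i] = pos + 1;
                continue;
            }
            double best = 0.0;
            int bestPos = -1;
            for (int k = 0; k < sentence.size(); k++) {
                double s = sim.applyAsDouble(w, sentence.get(k));
                if (s > best) {
                    best = s;
                    bestPos = k;
                }
            }
            r[i] = (bestPos >= 0 && best > ORDER_THRESHOLD) ? bestPos + 1 : 0;
        }
        return r;
    }

    public static double similarity(List<String> s1, List<String> s2,
                                    ToDoubleBiFunction<String, String> sim) {
        // Joint word set: distinct words of both sentences, in order of first appearance
        LinkedHashSet<String> set = new LinkedHashSet<>(s1);
        set.addAll(s2);
        List<String> joint = new ArrayList<>(set);

        // Semantic similarity: cosine of the two semantic vectors
        double[] v1 = semanticVector(joint, s1, sim);
        double[] v2 = semanticVector(joint, s2, sim);
        double dot = 0, n1 = 0, n2 = 0;
        for (int i = 0; i < joint.size(); i++) {
            dot += v1[i] * v2[i];
            n1 += v1[i] * v1[i];
            n2 += v2[i] * v2[i];
        }
        double semantic = (n1 == 0 || n2 == 0) ? 0.0 : dot / Math.sqrt(n1 * n2);

        // Word-order similarity: 1 - ||r1 - r2|| / ||r1 + r2||
        double[] r1 = orderVector(joint, s1, sim);
        double[] r2 = orderVector(joint, s2, sim);
        double diff = 0, sum = 0;
        for (int i = 0; i < joint.size(); i++) {
            diff += (r1[i] - r2[i]) * (r1[i] - r2[i]);
            sum += (r1[i] + r2[i]) * (r1[i] + r2[i]);
        }
        double order = sum == 0 ? 0.0 : 1.0 - Math.sqrt(diff / sum);

        return DELTA * semantic + (1 - DELTA) * order;
    }

    public static void main(String[] args) {
        // Toy similarity standing in for WS4J: exact string match only
        ToDoubleBiFunction<String, String> exact = (a, b) -> a.equals(b) ? 1.0 : 0.0;
        // Same words, different order: semantic part is 1.0, word-order part is lower (total about 0.94)
        System.out.println(similarity(List.of("dog", "bites", "man"), List.of("man", "bites", "dog"), exact));
    }
}
```

Swapping "dog" and "man" leaves the semantic term unchanged and lowers only the word-order term, which is the distinction the word-order vector is there to capture.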

[1]: Li, Y., McLean, D., Bandar, Z. A., O'Shea, J. D., & Crockett, K. (2006). Sentence similarity based on semantic nets and corpus statistics. IEEE Transactions on Knowledge and Data Engineering, 18(8), 1138-1150.
