Hello everyone,

I'm looking for a good algorithm to find similarity between documents.

I've cleaned up the text, summarized it, and extracted the most important words using tf-idf and entity extraction (NER), also keeping the entities related to those words. The last step was stemming to get the final word forms. Which algorithm would you suggest for measuring similarity on this kind of input? I've investigated a little and found these:

1. Classic cosine similarity

2. Word Mover’s Distance

3. GloVe: Global Vectors for Word Representation

4. Siamese Manhattan LSTM (MaLSTM)
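
To make option 1 concrete, here is a minimal sketch of what I understand by classic cosine similarity over tf-idf weights, written in plain Python. The helper names `tfidf_vectors` and `cosine_similarity`, the smoothed IDF formula, and the toy stemmed token lists are assumptions for illustration only, not taken from any particular library.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) from lists of already-stemmed tokens."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1, so no weight is zero.
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Toy documents, already cleaned and stemmed (hypothetical data).
docs = [
    ["document", "similar", "algorithm"],
    ["document", "similar", "measur"],
    ["weather", "forecast", "rain"],
]
vecs = tfidf_vectors(docs)
sim_related = cosine_similarity(vecs[0], vecs[1])
sim_unrelated = cosine_similarity(vecs[0], vecs[2])
```

With this sketch, documents that share stemmed terms score above zero and documents with no terms in common score exactly zero, which is the main limitation compared with the embedding-based options 2 to 4.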

Thank you in advance :)
