Hi,

Given a large corpus that contains a lot of noisy and irrelevant data, how can we detect the relevant and important words in it?

For instance, a movie review corpus will contain many words or aspects that have nothing to do with the movie domain. What are the ways to filter out these irrelevant aspects/words?

By searching, I have found that word2vec builds a vocabulary from a domain-specific corpus and captures similarity between words. Another approach I came across uses Semantic Web techniques to extract relevant information. Please guide me toward a convenient way of filtering out irrelevant words.
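
To make the word2vec idea concrete, here is a rough sketch of the kind of filtering I have in mind: keep only the words whose vectors are close to a few hand-picked domain seed words. Everything here is an assumption for illustration. The 3-dimensional vectors are made up and stand in for embeddings that a trained word2vec model would provide, and the names `TOY_VECTORS`, `relevant_words`, and the threshold of 0.8 are hypothetical.

```python
import math

# Made-up 3-dimensional vectors standing in for trained word2vec embeddings.
# Real embeddings would come from a model trained on the review corpus.
TOY_VECTORS = {
    "actor":    [0.9, 0.8, 0.1],
    "plot":     [0.8, 0.9, 0.2],
    "director": [0.9, 0.7, 0.0],
    "parking":  [0.1, 0.0, 0.9],
    "popcorn":  [0.2, 0.1, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def relevant_words(vectors, seeds, threshold=0.8):
    """Keep words whose best similarity to any known seed word meets the threshold."""
    seed_vectors = [vectors[s] for s in seeds if s in vectors]
    if not seed_vectors:
        return []  # no usable seed words, so nothing can be judged relevant
    kept = []
    for word, vec in vectors.items():
        best = max(cosine(vec, seed_vec) for seed_vec in seed_vectors)
        if best >= threshold:
            kept.append(word)
    return kept


print(relevant_words(TOY_VECTORS, ["actor"]))
```

With these toy vectors, "parking" and "popcorn" fall below the threshold and are dropped, while "plot" and "director" are kept because they sit close to the seed word "actor". Whether this works on a real corpus depends on the quality of the embeddings and on the choice of seed words and threshold, which is part of what I am asking about.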
