01 January 2015

I have a large dataset (a repository) of text data. The repository covers many different domains, so I can search it for any domain or general topic.

When I search my repository for a general topic (e.g., nanotechnology), I would like to get all the subtopics related to that topic. Someone else may search the same repository for a different topic and should get its subtopics as well. What I want to do is develop a tool that can identify the topics in the data returned by a search query. Each search returns different data, so the input is dynamic.

I have tried clustering algorithms such as k-means and k-means with Canopy, as well as topic modeling (LDA). Unfortunately, neither flat clustering like k-means nor term-based clustering like LDA offers a specific way to automatically set the right K (the number of clusters or topics).
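
To make the K problem concrete, here is a minimal sketch of one common heuristic: run k-means for a range of K values on the documents returned by a query and keep the K with the highest mean silhouette score. This is only an illustration, not the setup described above. It is pure standard-library Python, the function names (`bow_vectors`, `farthest_first`, `kmeans`, `silhouette`, `choose_k`) and the toy documents are made up, and plain bag-of-words vectors stand in for whatever features a real system would use.

```python
import math
from collections import Counter


def bow_vectors(docs):
    """Turn raw strings into L2-normalised bag-of-words vectors."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        vec = [float(counts[term]) for term in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first(points, k):
    """Deterministic seeding: start at point 0, then repeatedly add the
    point that is farthest from every seed chosen so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(dist(p, s) for s in seeds)))
    return seeds


def kmeans(points, k, max_iter=50):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the rest of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(sum(dist(points[i], points[j]) for j in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(docs, max_k=6):
    """Sweep K from 2 to max_k and return (best K, score for every K)."""
    points = bow_vectors(docs)
    scores = {}
    for k in range(2, min(max_k, len(points) - 1) + 1):
        scores[k] = silhouette(points, kmeans(points, k))
    best_k = max(scores, key=scores.get)  # ties go to the smaller K
    return best_k, scores


# Hypothetical search result: nine short documents on three subtopics.
toy_docs = [
    "carbon nanotube synthesis catalyst",
    "carbon nanotube growth synthesis",
    "nanotube growth carbon catalyst",
    "solar cell efficiency silicon",
    "silicon solar cell coating",
    "solar efficiency silicon panel",
    "drug delivery nanoparticle dose",
    "nanoparticle drug delivery tumor",
    "drug dose tumor delivery",
]
best_k, scores = choose_k(toy_docs)
```

On this toy input the sweep selects three clusters, but the subtopics here share no vocabulary at all. On real search results the silhouette curve is usually much flatter, and the sweep has to be repeated for every query, which is exactly the cost a dynamic setting makes painful.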

Does anyone have an idea how to identify topics in dynamic text? In other words, how can I make clustering algorithms work in a real-world application?

Thanks in advance.
