Sakshi Sakshi
According to the plot, fitting a model with 10–20 topics may be a suitable choice. The perplexity is low compared with models that use a different number of topics. The elapsed time for fitting this many topics is likewise reasonable with this approach.
Topic modeling is a method for determining which sets of words (i.e. topics) best reflect the information in a collection of documents. It can also be viewed as a type of text mining: a way of finding recurring patterns of words in textual content.
More formally, topic modeling is a type of statistical modeling for discovering the abstract “topics” that occur in a collection of documents. Latent Dirichlet Allocation (LDA) is an example of a topic model and is used to assign the text in a document to a particular topic.
To decide on a suitable number of topics, you can compare the goodness-of-fit of LDA models fit with varying numbers of topics. You can evaluate the goodness-of-fit of an LDA model by calculating the perplexity of a held-out set of documents. The perplexity indicates how well the model describes a set of documents; a lower perplexity suggests a better fit.
Topic coherence is another common measure, but there is no single way to determine whether a coherence score is good or bad. The score depends on the data it is calculated from. For instance, a score of 0.5 might be good enough in one case but unacceptable in another. The only rule is that, for a given dataset, we want to maximize this score.
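As a rough illustration of the held-out perplexity described above, the following sketch computes it directly from its definition: the exponential of the negative average log-likelihood per word. It is a minimal example under stated assumptions, not how any particular library implements it. The function name `perplexity`, the toy documents, and both sets of distributions (`theta_*` for topic proportions per document, `phi_*` for word probabilities per topic) are hypothetical and chosen by hand; in practice a fitted LDA model would supply them, and the topic proportions of held-out documents would have to be inferred.

```python
import math


def perplexity(docs, doc_topic, topic_word):
    """Return exp(-(total log-likelihood) / (total number of tokens))."""
    log_likelihood = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for word in doc:
            # p(word | doc) = sum over topics of p(topic | doc) * p(word | topic)
            p = sum(
                doc_topic[d][k] * topic_word[k][word]
                for k in range(len(topic_word))
            )
            log_likelihood += math.log(p)
            n_tokens += 1
    return math.exp(-log_likelihood / n_tokens)


# Toy held-out set (assumed): each document is a list of word ids
# drawn from a 4-word vocabulary.
held_out = [[0, 0, 1], [2, 3, 3]]

# Hypothetical model A: two topics that match the two documents well.
theta_a = [[0.9, 0.1], [0.1, 0.9]]
phi_a = [[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]]

# Hypothetical model B: one topic that is uniform over the vocabulary.
theta_b = [[1.0], [1.0]]
phi_b = [[0.25, 0.25, 0.25, 0.25]]

perplexity_a = perplexity(held_out, theta_a, phi_a)
perplexity_b = perplexity(held_out, theta_b, phi_b)

print(f"model A: {perplexity_a:.2f}")  # prints "model A: 2.44"
print(f"model B: {perplexity_b:.2f}")  # prints "model B: 4.00"
```

Model A assigns higher probability to the held-out words, so its perplexity is lower. The same comparison, repeated for models fit with different numbers of topics, is what a perplexity-versus-topics plot summarizes.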