Hi, I am currently working on a project where I have to classify texts into different labels, but I am confused about which features should be used to train a neural network and what the output should be. Please help me.
To classify text, I would prefer the fast statistical classifiers (Bayes, Winnow etc.) that are used in spam detection. They treat words (or combinations of words) as tokens and assign probabilities to them. The combined probabilities of all tokens in a text determine its class. Training a statistical classifier is much faster than training a neural network.
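To make the token-probability idea concrete, here is a minimal sketch of a multinomial naive Bayes classifier using only the Python standard library. The whitespace tokenizer, the add-one smoothing, the function names, and the four toy training texts are all my own assumptions for illustration; a real spam filter would use a better tokenizer and far more data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; punctuation is not handled).
    return text.lower().split()


def train(examples):
    """Count documents per label and tokens per label from (text, label) pairs."""
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            token_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, token_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, token_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in label_counts.items():
        # Start from the log prior of the label.
        score = math.log(n_docs / total_docs)
        total_tokens = sum(token_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore tokens never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            p = (token_counts[label][tok] + 1) / (total_tokens + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented purely for this example.
examples = [
    ("win cash prize now", "spam"),
    ("cheap pills win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(examples)
print(predict(model, "win a cash prize"))
print(predict(model, "agenda for monday"))
```

Training here is a single counting pass over the data, which is why this kind of classifier is so much quicker to fit than a network trained by iterative optimisation.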
Could you perhaps clarify what type of labels you are going for?
Some time ago, I worked on a sentiment analysis project: determining whether a text had a positive or negative connotation. For that project, one of our main cues was the occurrence of certain keywords; in theory, you could feed the same features to a neural network classifier as well. However, other classifiers would seem more appropriate to me. If every distinct word becomes an input, the feature space grows very large, so unless you can find a small number of high-entropy features, your classifier will suffer from the curse of dimensionality.
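As a rough sketch of what the keyword approach could look like as network input and output: each text becomes a fixed-length vector with one entry per keyword, and each label becomes a one-hot target vector, so the network would have one input unit per keyword and one output unit per label. The keyword list, the label names, and both function names below are hypothetical, chosen only for illustration.

```python
# Hypothetical keyword list and label set, not taken from any real project.
KEYWORDS = ["good", "great", "love", "bad", "terrible", "hate"]
LABELS = ["positive", "negative"]


def keyword_features(text, keywords=KEYWORDS):
    """Binary input vector: 1.0 if the keyword occurs in the text, else 0.0."""
    tokens = set(text.lower().split())  # naive split; punctuation is not stripped
    return [1.0 if kw in tokens else 0.0 for kw in keywords]


def one_hot(label, labels=LABELS):
    """Target vector with one output unit per label."""
    return [1.0 if label == name else 0.0 for name in labels]


print(keyword_features("I love this great phone"))  # [0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
print(one_hot("negative"))                          # [0.0, 1.0]
```

Keeping the keyword list short is what keeps the input dimension small; using the whole vocabulary instead would give one input per distinct word.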
You need to label the text (or its individual sentences) properly, according to importance, and then train the neural network on it. To identify the important features in the text, I advise you to skim through the journal below
Perhaps you could define which labels you are looking for in the text. Is it their frequency that you are trying to pin down, or some other sort of classification? Have a look at this: