Howdy, it's late 2019 and I'm looking for recommendations for text analysis software. I'm particularly interested in sentiment analysis, hierarchical clustering and other frequency-based methods. Best,
I can recommend spaCy, Gensim and NLTK if you use Python.
spaCy is a Python library specifically designed for production use. It comes with many pre-trained models, for example for part-of-speech tagging and named entity recognition. https://spacy.io/
Gensim is a Python library that offers, for example, implementations of word embedding and topic modeling approaches. I would say it is aimed more at research than at applications. https://radimrehurek.com/gensim/
NLTK is actually a suite of Python libraries for text processing. It supports many corpora and trained models. I would say NLTK is more application-oriented than Gensim but less so than spaCy. http://nltk.org/
If your question is specifically about sentiment analysis, there is a small Python library, VADER, for exactly that. https://github.com/cjhutto/vaderSentiment
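VADER is a lexicon-and-rules approach: each word has a hand-assigned valence, rules adjust for negation and similar cues, and the sum is squashed into a compound score. The real package is used as `SentimentIntensityAnalyzer().polarity_scores(text)` (imported from `vaderSentiment.vaderSentiment`), which returns `neg`, `neu`, `pos` and `compound` scores. To show the idea without installing anything, here is a minimal lexicon-based sketch. It assumes a tiny hypothetical lexicon, a simplified three-word negation window and an assumed negation factor, so it is not VADER's lexicon or its actual rules.

```python
import math
import re

# Hypothetical toy lexicon: word -> valence (NOT VADER's real lexicon).
LEXICON = {
    "good": 1.9, "great": 3.1, "love": 3.2, "happy": 2.7,
    "bad": -2.5, "terrible": -3.4, "hate": -2.7, "sad": -2.1,
}
NEGATIONS = {"not", "no", "never", "isn't", "don't", "doesn't"}
NEGATION_FACTOR = -0.74  # assumed factor that flips and dampens a negated word


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def polarity(text: str, alpha: float = 15.0) -> float:
    """Sum lexicon valences, flipping a word if a negation occurs in the
    three preceding tokens, then squash the sum into (-1, 1)."""
    tokens = tokenize(text)
    total = 0.0
    for i, tok in enumerate(tokens):
        valence = LEXICON.get(tok)
        if valence is None:
            continue  # words outside the lexicon carry no sentiment here
        window = tokens[max(0, i - 3):i]
        if any(w in NEGATIONS for w in window):
            valence *= NEGATION_FACTOR
        total += valence
    # Squashing of the form x / sqrt(x^2 + alpha) keeps the score bounded.
    return total / math.sqrt(total * total + alpha)


if __name__ == "__main__":
    for sentence in ["I love this, it is great!",
                     "This is not good.",
                     "A terrible, sad day."]:
        print(f"{polarity(sentence):+.3f}  {sentence}")
```

A lexicon approach needs no training data, which is why it is handy for short, informal texts; the trade-off is that anything missing from the lexicon is simply ignored.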
The Eidos-X++ system differs from other artificial intelligence systems in the following respects:
- it was developed as a general-purpose tool, independent of any particular subject area, and can therefore be applied in many domains (http://lc.kubagro.ru/aidos/index.htm);
- it is freely and openly available (http://lc.kubagro.ru/aidos/_Aidos-X.htm), together with its source code (http://lc.kubagro.ru/__AIDOS-X.txt);
- it is one of the first domestic (Russian) personal-level AI systems, i.e. it does not require the user to have special training in AI technologies (a certificate of implementation of the "Eidos" system dates from 1987) (http://lc.kubagro.ru/aidos/aidos02/PR-4.htm);
- it robustly identifies, in a comparable form, the strength and direction of cause-effect relationships in incomplete, noisy, interdependent (nonlinear), very high-dimensional data of both numerical and non-numerical nature, measured on different types of scales (nominal, ordinal and numerical) and in different units (i.e. it does not impose strict requirements on the data that cannot be met in practice, but processes the data as they are);
- it includes a large number of local (shipped with the installation) and cloud-based educational and scientific applications (currently 31 and 196, respectively) (http://lc.kubagro.ru/aidos/Presentation_Aidos-online.pdf);
- its interface supports 44 languages; the language databases are included in the installation and can be updated automatically;
- it supports an online environment for accumulating knowledge and is widely used around the world (http://aidos.byethost5.com/map5.php);
- it performs the most computationally intensive operations, model synthesis and recognition, on a graphics processing unit (GPU), which for some tasks speeds up the solution by a factor of several thousand and makes genuinely intelligent processing of big data, big information and big knowledge possible;
- it transforms raw empirical data into information and then into knowledge, and uses this knowledge to solve classification problems, support decisions and study the subject area through its system-cognitive model; it generates a very large number of tabular and graphical output forms (an extension of cognitive graphics), many of which have no counterparts in other systems (for examples of these forms, see: http://lc.kubagro.ru/aidos/aidos18_LLS/aidos18_LLS.pdf);
- it closely imitates the human style of thinking: it produces analysis results that experts can understand on the basis of their experience, intuition and professional competence.
References to the work of Professor E. V. Lutsenko on ASK-analysis (automated system-cognitive analysis) of texts
ASK-analysis of texts makes it possible to:
- form generalized linguistic images of classes (semantic cores) from fragments or examples of related texts in any language;
- quantitatively compare the linguistic image of a particular person, or the description of an object or process, with the generalized linguistic images of groups (classes);
- compare the generalized linguistic images of classes with each other and build clusters and constructs from them;
- investigate the modeled subject area by studying its linguistic system-cognitive model;
- carry out intelligent attribution of texts, i.e. determine the probable authorship of anonymous and pseudonymous texts, as well as the date of writing, genre and semantic orientation of their content;
- do all of this for any natural or artificial language or coding system.
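The Eidos system's own ASK-analysis models are not spelled out in this thread, so the following is not that method. As a rough, generic stand-in for three of the operations listed above (building a frequency profile per class, comparing a single text with each class, and clustering the classes), here is a plain-Python sketch using relative word frequencies, cosine similarity and average-linkage agglomerative clustering. The class names echo the specialties in the first reference below, but the training texts are hypothetical toy data.

```python
import math
import re
from collections import Counter
from itertools import combinations


def profile(texts: list[str]) -> dict[str, float]:
    """Relative word frequencies pooled over all texts of one class."""
    counts = Counter(w for t in texts for w in re.findall(r"\w+", t.lower()))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(text: str, profiles: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank classes by the similarity of their profile to one text."""
    p = profile([text])
    return sorted(((c, cosine(p, q)) for c, q in profiles.items()),
                  key=lambda item: item[1], reverse=True)


def average_linkage(profiles: dict[str, dict[str, float]]) -> list[tuple[list[str], list[str], float]]:
    """Agglomerative clustering of classes with distance 1 - cosine.
    Returns the merge history as (members_a, members_b, distance)."""
    def dist(x: str, y: str) -> float:
        return 1.0 - cosine(profiles[x], profiles[y])

    def linkage(a: frozenset, b: frozenset) -> float:
        # Average linkage: mean pairwise distance between members of a and b.
        return sum(dist(x, y) for x in a for y in b) / (len(a) * len(b))

    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((sorted(a), sorted(b), linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical toy training texts per class.
CLASSES = {
    "agronomy": ["wheat yield depends on soil and fertilizer",
                 "crop rotation improves soil fertility"],
    "veterinary": ["vaccination protects cattle from infection",
                   "the animal clinic treats infection in cattle"],
    "mechanization": ["the tractor engine drives the harvester",
                      "harvester design affects engine fuel use"],
}

if __name__ == "__main__":
    profiles = {c: profile(ts) for c, ts in CLASSES.items()}
    print(classify("soil fertilizer for wheat", profiles))
    for merge in average_linkage(profiles):
        print(merge)
```

With texts this small the similarities hinge on very few shared words (here the first merge is driven only by "the"), so real use would need stopword removal and far more text per class.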
Lutsenko E. V. Synthesis of semantic cores of scientific specialties of the Higher Attestation Commission of the Russian Federation and automatic classification of articles by scientific specialty using ASK-analysis and the intelligent system "Eidos" (on the example of the Scientific Journal of KubGAU and its scientific specialties: mechanization, agronomy and veterinary medicine) / E. V. Lutsenko, N. V. Andrafanova, N. V. Potapova // Polythematic network electronic scientific journal of the Kuban State Agrarian University (Scientific Journal of KubGAU) [Electronic resource]. - Krasnodar: KubGAU, 2019. - No. 01(145). - pp. 31-102. - IDA [article ID]: 1451901033. - Access mode: http://ej.kubagro.ru/2019/01/pdf/33.pdf, 4.5 u. p. l.
Lutsenko E. V. Formation of the semantic core of veterinary medicine by automated system-cognitive analysis of the passports of scientific specialties of the HAC of the Russian Federation and automatic classification of texts by field of science / E. V. Lutsenko // Polythematic network electronic scientific journal of the Kuban State Agrarian University (Scientific Journal of KubGAU) [Electronic resource]. - Krasnodar: KubGAU, 2018. - No. 10(144). - pp. 44-102. - IDA [article ID]: 1441810033. - Access mode: http://ej.kubagro.ru/2018/10/pdf/33.pdf, 3.688 u. p. l.
Lutsenko E. V. Intelligent linking of incorrect references to literary sources in bibliographic databases using ASK-analysis and the "Eidos" system (on the example of the Russian Science Citation Index, RSCI) / E. V. Lutsenko, V. A. Glukhov // Polythematic network electronic scientific journal of the Kuban State Agrarian University (Scientific Journal of KubGAU) [Electronic resource]. - Krasnodar: KubGAU, 2017. - No. 01(125). - pp. 1-65. - IDA [article ID]: 1251701001. - Access mode: http://ej.kubagro.ru/2017/01/pdf/01.pdf, 4.062 u. p. l.
Lutsenko E. V. Application of ASK-analysis and the intelligent system "Eidos" to the general-form solution of the problem of identifying literary sources and authors from standard, non-standard and incorrect bibliographic descriptions / E. V. Lutsenko // Polythematic network electronic scientific journal of the Kuban State Agrarian University (Scientific Journal of KubGAU) [Electronic resource]. - Krasnodar: KubGAU, 2014. - No. 09(103). - pp. 498-544. - IDA [article ID]: 1031409032. - Access mode: http://ej.kubagro.ru/2014/09/pdf/32.pdf, 2.938 u. p. l.
Lutsenko E. V. ASK-analysis of the subject matter of articles in the Scientific Journal of KubGAU over time / E. V. Lutsenko, V. I. Loiko // Polythematic network electronic scientific journal of the Kuban State Agrarian University (Scientific Journal of KubGAU) [Electronic resource]. - Krasnodar: KubGAU, 2014. - No. 06(100). - pp. 109-145. - IDA [article ID]: 1001406007. - Access mode: http://ej.kubagro.ru/2014/06/pdf/07.pdf, 2.312 u. p. l.
Lutsenko E. V. Attribution of anonymous and pseudonymous texts in system-cognitive analysis / E. V. Lutsenko // Polythematic network electronic scientific journal of the Kuban State Agrarian University (Scientific Journal of KubGAU) [Electronic resource]. - Krasnodar: KubGAU, 2004. - No. 03(005). - pp. 44-64. - IDA [article ID]: 0050403003. - Access mode: http://ej.kubagro.ru/2004/03/pdf/03.pdf, 1.312 u. p. l.
Lutsenko E. V. Attribution of texts as a generalized problem of identification and forecasting / E. V. Lutsenko // Polythematic network electronic scientific journal of the Kuban State Agrarian University (Scientific Journal of KubGAU) [Electronic resource]. - Krasnodar: KubGAU, 2003. - No. 02(002). - pp. 146-164. - IDA [article ID]: 0020302013. - Access mode: http://ej.kubagro.ru/2003/02/pdf/13.pdf, 1.188 u. p. l.
I prefer MALLET (http://mallet.cs.umass.edu/) or the text analysis packages available in R (tidytext, dplyr, topicmodels, etc.).
If you are looking for off-the-shelf software rather than a set of libraries, Expert System's Cogito or Micro Focus IDOL (formerly Autonomy/HP) might be worth a look; in the past I worked a lot with the IDOL stack. Both have their limitations, but they save you a lot of time when it comes to operationalizing your analytics solution.
I use NLTK in Python because it has its own functions and also supports the Stanford NLP engines. I also tried GATE, unfortunately without success. If you need statistical features, you can use WEKA from either Python or Java.
In my view, Python packages are the best tools for text mining and text analysis. If you work on natural language processing, you can use "NLP", "sklearn" or "NLTK".
There are many websites on this subject, but I like https://towardsdatascience.com/ and https://medium.com/; I hope you find them useful. There is also the Orange software, which is very useful for data visualization, machine learning, text mining and data mining.
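A typical frequency-based starting point in sklearn is tf-idf weighting via `TfidfVectorizer`. To show what that weighting does, here is a minimal pure-Python sketch of the textbook form, raw term count times log(N / df), where N is the number of documents and df the number of documents containing the term. sklearn's defaults differ (with `smooth_idf=True` it uses idf = ln((1 + N) / (1 + df)) + 1, and `norm="l2"` normalizes each row), so the numbers will not match it exactly; the example documents are made up.

```python
import math
import re
from collections import Counter


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document, weighting each raw
    term count by log(N / df), where df counts documents containing the term."""
    tokenized = [re.findall(r"\w+", d.lower()) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


if __name__ == "__main__":
    docs = ["the cat sat on the mat",
            "the dog sat on the log",
            "cats and dogs"]
    for i, w in enumerate(tfidf(docs)):
        top = sorted(w.items(), key=lambda kv: kv[1], reverse=True)[:3]
        print(i, [(t, round(v, 3)) for t, v in top])
```

A term that occurs in every document gets weight 0, which is how tf-idf pushes down words that do not distinguish one text from another; the resulting vectors can feed clustering or classification.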