Can you suggest an existing NLP tool for binary classification of paragraphs as appropriate or inappropriate?

The Eidos-X++ system differs from other artificial intelligence systems in the following respects:

- it was developed in a universal formulation, independent of the subject area, so it can be applied in many subject areas (http://lc.kubagro.ru/aidos/index.htm);

- it is fully open and free to access (http://lc.kubagro.ru/aidos/_Aidos-X.htm), with all the relevant source code (http://lc.kubagro.ru/__AidosALL.txt);

- it is one of the first domestic personal-level artificial intelligence systems, i.e. it does not require the user to have special training in artificial intelligence technologies (there is an act of implementation of the "Eidos" system dated 1987) (http://lc.kubagro.ru/aidos/aidos02/PR-4.htm);

- it provides stable identification, in a comparable form, of the strength and direction of cause-effect relationships in incomplete, noisy, interdependent (nonlinear) data of very large dimension, numerical and non-numerical in nature, measured in different types of scales (nominal, ordinal and numerical) and in different units of measurement (i.e. it does not impose strict requirements on the data that cannot be met, but processes the data that is available);

- it contains a large number of local (supplied with the installation) and cloud educational and scientific applications (currently 31 and 290 (http://aidos.byethost5.com/Source_data_applications/WebAppls.htm), respectively) (http://lc.kubagro.ru/aidos/Presentation_Aidos-online.pdf);

- it supports an online environment for knowledge accumulation and is widely used all over the world (http://aidos.byethost5.com/map5.php);

- it provides multilingual interface support in 51 languages. The language databases are included in the installation and can be extended automatically;

- the most computationally intensive operations, model synthesis and recognition, are implemented on the graphics processing unit (GPU), which speeds up some tasks by up to several thousand times; this makes intelligent processing of big data, big information and big knowledge practical;

- it transforms the initial empirical data into information, and that information into knowledge, and uses this knowledge to solve problems of classification, decision support and research of the subject area by studying its system-cognitive model, generating a very large number of tabular and graphical output forms (development of cognitive graphics), many of which have no analogues in other systems (examples of forms can be found in: http://lc.kubagro.ru/aidos/aidos18_LLS/aidos18_LLS.pdf);

- it imitates the human style of thinking well: it gives analysis results that experts can understand in terms of their experience, intuition and professional competence;

- instead of making almost impossible demands on the source data (such as normality of distribution, absolute accuracy, complete repetition of all combinations of factor values, and their complete independence and additivity), automated system-cognitive analysis (ASC-analysis) processes this data without any preliminary processing and thereby transforms it into information, and then transforms this information into knowledge by applying it to achieve goals (i.e. for management) and to solve problems of classification, decision support, and meaningful empirical research of the domain being modeled.

What is the strength of the approach implemented in the Eidos system? Its effectiveness does not depend on what we think about the subject area, or on whether we think about it at all. It generates models directly from empirical data, rather than from our understanding of the mechanisms that produce the patterns in this data. This is why Eidos models are effective even if our understanding of the subject area is incorrect or totally absent.

This is also the weakness of the approach implemented in the Eidos system. Models of the Eidos system are phenomenological models, i.e. they do not reflect the mechanisms of determination, but only the fact and nature of determination.

References to the works of prof. E. V. Lutsenko on ASK-analysis of texts

ASK-analysis of texts allows you to:

- form generalized linguistic images of classes (semantic cores) based on fragments or examples of texts related to them in any language;

- quantitatively compare the linguistic image of a particular person, or the description of an object or process, with generalized linguistic images of groups (classes);

- compare generalized linguistic images of classes with each other and create their clusters and constructs;

- investigate the modeled subject area by studying its linguistic system-cognitive model;

- carry out intellectual attribution of texts, i.e. determine the probable authorship of anonymous and pseudonymous texts, and the dating, genre and semantic orientation of the content of texts.

All this can be done for any natural or artificial language or coding system.

Lutsenko E. V. Synthesis of semantic cores of scientific specialties of the Higher Attestation Commission of the Russian Federation and automatic classification of articles by scientific specialties using ASK-analysis and the intellectual system "Eidos" (on the example of the Scientific Journal of KubGAU and its scientific specialties: mechanization, agronomy and veterinary medicine) / E. V. Lutsenko, N. V. Andrafanova, N. V. Potapova // Polythematic Network Electronic Scientific Journal of the Kuban State Agrarian University (Scientific Journal of KubGAU) [Electronic resource]. - Krasnodar: KubGAU, 2019. - №01(145). pp. 31-102. - IDA [article ID]: 1451901033. - Access mode: http://ej.kubagro.ru/2019/01/pdf/33.pdf, 4,5 cu. p. l.

Lutsenko E. V. Formation of the semantic core of veterinary medicine by automated system-cognitive analysis of passports of scientific specialties of the Higher Attestation Commission of the Russian Federation and automatic classification of texts in the areas of science / E. V. Lutsenko // Polythematic Network Electronic Scientific Journal of the Kuban State Agrarian University (Scientific Journal of KubGAU) [Electronic resource]. - Krasnodar: KubGAU, 2018. - №10(144). pp. 44-102. - IDA [article ID]: 1441810033. - Access mode: http://ej.kubagro.ru/2018/10/pdf/33.pdf, 3,688 cu. p. l.

Lutsenko E. V. Intellectual linking of incorrect references to literary sources in bibliographic databases using ASK-analysis and the Eidos system (on the example of the Russian Science Citation Index, RSCI) / E. V. Lutsenko, V. A. Glukhov // Polythematic Network Electronic Scientific Journal of the Kuban State Agrarian University (Scientific Journal of KubGAU) [Electronic resource]. - Krasnodar: KubGAU, 2017. - №01(125). pp. 1-65. - IDA [article ID]: 1251701001. - Access mode: http://ej.kubagro.ru/2017/01/pdf/01.pdf, 4,062 cu. p. l.

Lutsenko E. V. Application of ASK-analysis and the intellectual system "Eidos" for solving in general the problem of identifying literary sources and authors according to standard, non-standard and incorrect bibliographic descriptions / E. V. Lutsenko // Polythematic Network Electronic Scientific Journal of the Kuban State Agrarian University (Scientific Journal of KubGAU) [Electronic resource]. - Krasnodar: KubGAU, 2014. - №09(103). pp. 498-544. - IDA [article ID]: 1031409032. - Access mode: http://ej.kubagro.ru/2014/09/pdf/32.pdf, 2,938 cu. p. l.

Lutsenko E. V. ASK-analysis of the problems of articles of the Scientific Journal of KubGAU in dynamics / E. V. Lutsenko, V. I. Loiko // Polythematic Network Electronic Scientific Journal of the Kuban State Agrarian University (Scientific Journal of KubGAU) [Electronic resource]. - Krasnodar: KubGAU, 2014. - №06(100). pp. 109-145. - IDA [article ID]: 1001406007. - Access mode: http://ej.kubagro.ru/2014/06/pdf/07.pdf, 2,312 cu. p. l.

Lutsenko E. V. Attribution of anonymous and pseudonymous texts in system-cognitive analysis / E. V. Lutsenko // Polythematic Network Electronic Scientific Journal of the Kuban State Agrarian University (Scientific Journal of KubGAU) [Electronic resource]. - Krasnodar: KubGAU, 2004. - №03(005). pp. 44-64. - IDA [article ID]: 0050403003. - Access mode: http://ej.kubagro.ru/2004/03/pdf/03.pdf, 1,312 cu. p. l.

Lutsenko E. V. Attribution of texts as a generalized problem of identification and forecasting / E. V. Lutsenko // Polythematic Network Electronic Scientific Journal of the Kuban State Agrarian University (Scientific Journal of KubGAU) [Electronic resource]. - Krasnodar: KubGAU, 2003. - №02(002). pp. 146-164. - IDA [article ID]: 0020302013. - Access mode: http://ej.kubagro.ru/2003/02/pdf/13.pdf, 1,188 cu. p. l.

Lutsenko D. S., Lutsenko E. V. Intellectual dating of the text, determination of authorship and genre on the example of Russian literature of the XIX and XX centuries, 2020 // Article in the open archive. 38 p. - DOI: 10.13140/WG.2.2.28824.01281, https://www.elibrary.ru/item.asp?id=43796415

Lutsenko D. S., Lutsenko E. V. Intellectual attribution of literary texts (finding the dates of the text, determining authorship and genre on the example of Russian literature of the XIX and XX centuries), 2020 // Article in the open archive. 9 p. - DOI: 10.13140/RG.2.2.15349.81122, https://www.elibrary.ru/item.asp?id=43794562

Weichuan Wang

Maybe you can try the Word Mover's Distance (WMD) method to address the problem. I hope it helps.
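To make the WMD suggestion concrete, here is a minimal sketch of a nearest-neighbour classifier. It uses the relaxed lower bound of WMD (each word moves all of its weight to the closest word in the other text) instead of solving the full transport problem, so it is an approximation and not WMD itself. The two-dimensional word vectors, the labeled examples and the function names are all made up for illustration; a real setup would load pretrained embeddings and a proper training set.

```python
import math

# Toy 2-D word vectors, invented for illustration only; a real setup would
# load pretrained embeddings (for example word2vec or GloVe).
EMBEDDINGS = {
    "kind": (1.0, 0.1), "friendly": (0.9, 0.2), "helpful": (0.8, 0.0),
    "rude": (-1.0, 0.1), "offensive": (-0.9, 0.2), "insult": (-0.8, 0.0),
    "reply": (0.0, 1.0), "comment": (0.1, 0.9),
}

# Hypothetical labeled paragraphs that act as the "training set".
LABELED = [
    ("kind helpful reply", "appropriate"),
    ("friendly comment", "appropriate"),
    ("rude insult", "inappropriate"),
    ("offensive comment", "inappropriate"),
]


def tokens(text):
    """Lowercase, split on whitespace, keep only words with a known vector."""
    return [w for w in text.lower().split() if w in EMBEDDINGS]


def one_sided_cost(src, dst):
    """Average distance from each source word to its nearest target word."""
    total = 0.0
    for s in src:
        total += min(math.dist(EMBEDDINGS[s], EMBEDDINGS[d]) for d in dst)
    return total / len(src)


def relaxed_wmd(a, b):
    """Relaxed WMD: the larger of the two one-sided nearest-word costs."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return math.inf  # no known words, so no meaningful distance
    return max(one_sided_cost(ta, tb), one_sided_cost(tb, ta))


def classify(paragraph):
    """Return the label of the closest labeled paragraph (1-nearest neighbour)."""
    return min(LABELED, key=lambda ex: relaxed_wmd(paragraph, ex[0]))[1]


if __name__ == "__main__":
    print(classify("a friendly and helpful reply"))
    print(classify("rude offensive comment"))
```

The relaxed bound is cheap to compute, so it can also serve as a first filter before running an exact WMD solver on the closest candidates.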

Mantas Lukauskas

Rajat Tandon, have you tried GPT2 or DistilGPT2 from Huggingface?

Artem Kramov

Hi.

Firstly, I'd suggest processing the input text with a classical LSTM-based neural network. This avoids the problem of fixed-length input. For instance, you can represent the words of a text as vectors using a semantic embedding model (e.g., ELMo). Then you can pass this sequence of vectors through the LSTM cell; its output is a vector that can be processed by a simple binary classifier (a few feedforward layers).

However, I'm not sure that analyzing the sequence of words will be effective for your task. Instead, I propose considering the text at the level of sentences. This may help to reveal "bad" text spans while analyzing the whole text. That's why I'd like to suggest the following algorithm:

1. Split the input text into a set of sentences.

2. Represent each sentence as a set of vectors using a semantic embedding model.

3. Pass each sentence through a "Sentence model" that consists of LSTM cells. This represents each sentence as a single vector.

4. Pass the obtained sentence vectors through an additional LSTM layer. The output vector is then processed by a binary classifier (dense layers) that provides the probability that the text is appropriate.
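The data flow of this algorithm can be sketched in plain Python as a forward pass only. Everything here is an assumption made for illustration: the weights are random and untrained, `embed` is a stand-in for a real embedding model such as ELMo, the sentence splitter is naive, and a single logistic output unit replaces the dense layers. In practice you would build and train the same structure in a deep learning framework.

```python
import math
import random
import re


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def embed(word, dim=4):
    """Stand-in for a real embedding model: a fixed pseudo-random vector per word."""
    rng = random.Random(word)  # string seed, so the same word always gets the same vector
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def split_sentences(text):
    """Naive splitter on '.', '!' and '?'; returns a list of word lists."""
    return [s.split() for s in re.split(r"[.!?]+", text) if s.strip()]


class LSTM:
    """A single LSTM layer with randomly initialised, untrained weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.weights = {g: [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                            for _ in range(hidden_size)] for g in "ifoc"}
        self.biases = {g: [0.0] * hidden_size for g in "ifoc"}

    def _gate(self, name, xh, activation):
        return [activation(sum(w * v for w, v in zip(row, xh)) + b)
                for row, b in zip(self.weights[name], self.biases[name])]

    def run(self, sequence):
        """Feed a sequence of vectors and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            xh = list(x) + h
            i = self._gate("i", xh, sigmoid)
            f = self._gate("f", xh, sigmoid)
            o = self._gate("o", xh, sigmoid)
            g = self._gate("c", xh, math.tanh)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


class HierarchicalClassifier:
    """Words -> sentence vectors -> document vector -> probability."""

    def __init__(self, embed_dim=4, sent_dim=5, doc_dim=3, seed=0):
        rng = random.Random(seed)
        self.embed_dim = embed_dim
        self.sentence_lstm = LSTM(embed_dim, sent_dim, rng)
        self.document_lstm = LSTM(sent_dim, doc_dim, rng)
        self.out_weights = [rng.uniform(-0.5, 0.5) for _ in range(doc_dim)]
        self.out_bias = 0.0

    def predict_proba(self, text):
        sentences = split_sentences(text)                       # step 1
        sentence_vectors = [
            self.sentence_lstm.run(                             # step 3
                [embed(w, self.embed_dim) for w in words])      # step 2
            for words in sentences
        ]
        doc_vector = self.document_lstm.run(sentence_vectors)   # step 4
        logit = sum(w * v for w, v in zip(self.out_weights, doc_vector))
        return sigmoid(logit + self.out_bias)


if __name__ == "__main__":
    model = HierarchicalClassifier()
    print(model.predict_proba("This is fine. Thank you for the help!"))
```

Because the weights are untrained, the printed probability carries no meaning; the sketch only shows how variable-length sentences and documents are reduced to fixed-size vectors before the final classifier.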

My research touches on similar problems (binary classification of a whole document) so maybe I will be able to provide you with some close solutions.

Best regards,

Rajat Tandon

Thanks a lot. All of these sound promising to me. Weichuan Wang Mantas Lukauskas Artem Kramov

Artem Kramov, yes, it would be great if you could give me pointers to those close solutions. My email is [email protected].

Mohamed Boufenara

I strongly suggest that you use the Word Mover's Distance (WMD) method in order to remedy this problem.
