Unsupervised text binary classification Deep Learning?

31 December 2019

Hello there,

So I am working on a project and I'm kind of confused about how to analyse the problem.

The task is as follows:

I have two text documents from two different periods of time. Both documents contain target words (4 specific words). My task is to see whether each of these 4 words has changed its meaning or use over time, that is, whether it has gained a new use or lost a meaning or use.

For example, in the old document the target word 'cell' had two meanings, either a biological cell or a cell as a chamber, and no other uses. In the more recent document, the word 'cell' has gained a new meaning or use, namely cellphone, so I would say that 'cell' has changed meaning over time. On the other hand, if the target word's use has remained the same over time, I would classify it as unchanged.

So all I have now are two text documents and 4 target words, and I need to use a deep neural network to binary-classify those target words as either changed (1) or not changed (0).

I am a total newbie to deep learning, and I think I can do it with regular Python code that would work like this:

1- spot the target word in the document

2- collect all the adjacent words to that word in an array or any other data structure

3- repeat that for the second document

4- compare the two arrays for each word and see if they are different and, based on that,

5- classify the word as changed or not changed
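
The five steps above can be sketched in plain Python. This is only a minimal sketch under stated assumptions, not a definitive method: the window size, the use of Jaccard distance to compare the two sets of neighbouring words, and the 0.5 threshold are all choices made for illustration, and the function names are hypothetical.

```python
import re
from collections import Counter


def context_words(text, target, window=2):
    """Steps 1-2: count the words within `window` positions of `target`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            # Words just before and just after the target occurrence.
            neighbours = tokens[lo:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


def jaccard_distance(a, b):
    """Step 4: 1 - |A & B| / |A | B| over the two sets of context words."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def classify(old_text, new_text, target, window=2, threshold=0.5):
    """Step 5: return 1 (changed) or 0 (not changed).

    The threshold is an assumed value and would need tuning on real data.
    """
    old_ctx = context_words(old_text, target, window)  # steps 1-2
    new_ctx = context_words(new_text, target, window)  # step 3
    distance = jaccard_distance(old_ctx, new_ctx)
    return 1 if distance > threshold else 0
```

Comparing raw neighbouring words like this is brittle: two documents can use different neighbours for the same sense of a word, which is one reason learned word representations are often used in place of exact word matches.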

So my question is: how would deep learning make a positive contribution here? Am I getting the whole idea wrong? Is it not that easy to classify the words based on changes in their adjacent words?

I would appreciate a light guiding me through this road.

Up to this point I have learned about tokenization and the embedding layer in Keras, and how they are important for transforming my text into numbers so the algorithms can work with it. But what is next? How do I do the classification?
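
To make "transforming text into numbers" concrete, here is a minimal sketch of the idea in plain Python. It is not Keras's own tokenizer, only an illustration of what such a step produces: each distinct word gets an integer index, and a sentence becomes a list of indices that an embedding layer could then look up. The function names `build_vocab` and `encode` are hypothetical, and reserving index 0 for unknown words is an assumption.

```python
def build_vocab(texts):
    """Assign each distinct word an integer index, starting at 1."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1  # 0 is kept for unknown words
    return vocab


def encode(text, vocab):
    """Turn a sentence into a list of word indices (0 = not in vocab)."""
    return [vocab.get(word, 0) for word in text.lower().split()]


vocab = build_vocab(["the cell divides", "the cell phone"])
encoded = encode("the phone rings", vocab)
```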

I would say I can tokenize the text into words, give each distinct word in the document an initial label of 0, then input the second document and update the labels based on the word pairings in the second document, but it feels like an immature idea.

What do you think?

Muhammad Ali

A good example related to your work is at: https://au.mathworks.com/help/deeplearning/examples/create-simple-deep-learning-network-for-classification.html;jsessionid=7018848c980ee0578db631fd32eb

Somaya Alshare

Thank you. I will check it out.

Samer Sarsam

Hi,

First of all, your task is a classification process, so you are applying a supervised learning technique (not an unsupervised one, as you stated in your question).

Second, here is the process, in general:

Each document in your data must have a nominal label: changed/not changed. After loading your data, you need to preprocess it using techniques like tokenization. When tokenizing, you could assign each extracted feature word a value/weight of 1 (word present) or 0 (word absent). Length normalization is another preprocessing technique that can be applied. (You can also apply stemming, stop-word removal, etc., if necessary for your task.) Finally, you can invoke the required machine learning algorithm, e.g., DL, and then evaluate its performance using the holdout method, for instance. Here metrics like accuracy and ROC can be used. The resulting model will tell whether the target words in future text are changed/not changed.
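
As a rough illustration of the preprocessing and evaluation steps described above, here is a minimal sketch in plain Python. It assumes labelled examples are available, as this reply requires; the helper names are hypothetical, L2 scaling is one assumed reading of "length normalization", and the 25% test fraction is an arbitrary choice. The classifier itself is left out.

```python
import math
import random


def binary_features(tokens, vocabulary):
    """1.0 if the vocabulary word appears in the tokens, else 0.0."""
    present = set(tokens)
    return [1.0 if word in present else 0.0 for word in vocabulary]


def length_normalize(vector):
    """Scale a vector to unit L2 length (all-zero vectors stay as they are)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else list(vector)


def holdout_split(examples, test_fraction=0.25, seed=0):
    """Shuffle the examples and hold out a fraction of them for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def accuracy(true_labels, predicted_labels):
    """Fraction of held-out examples whose predicted label is correct."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)
```

A model would be fitted on the training part only and scored with `accuracy` on the held-out part, so that the reported figure reflects examples the model has not seen.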

Regards,

Dr. Samer Sarsam
