
Dear everybody!

As a hobby project, I'm creating a character-level seq2seq LSTM.

In my task, I give a text as input (max 40 characters) and the LSTM generates an output that rhymes with it.
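For reference, the vectorization is roughly the usual one-hot character encoding (just a sketch with made-up example lines, not my real data):

import numpy as np

# Made-up example pairs; my real data is rhyming lines. '\t' marks start, '\n' marks end.
input_texts = ["twinkle twinkle little star", "the cat sat on the mat"]
target_texts = ["\thow i wonder what you are\n", "\tand wore a funny hat\n"]

chars = sorted(set("".join(input_texts + target_texts)))
char_index = {c: i for i, c in enumerate(chars)}
max_encoder_len = 40  # inputs are capped at 40 characters

# One-hot encode the inputs into a (num_samples, max_encoder_len, num_chars) tensor.
encoder_input_data = np.zeros((len(input_texts), max_encoder_len, len(chars)), dtype="float32")
for i, text in enumerate(input_texts):
    for t, ch in enumerate(text[:max_encoder_len]):
        encoder_input_data[i, t, char_index[ch]] = 1.0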

I have built very large databases of rhyming lines.

At the beginning I trained my model with the following parameters:

batch_size = 200
epochs = 250
latent_dim = 300
num_samples = 10000
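Concretely, the architecture follows the standard Keras character-level seq2seq layout (a rough sketch, not my exact code; num_encoder_tokens / num_decoder_tokens are placeholders for my character-set sizes):

from keras.models import Model
from keras.layers import Input, LSTM, Dense

latent_dim = 300
num_encoder_tokens = 60   # placeholder: size of the input character set
num_decoder_tokens = 60   # placeholder: size of the output character set

# Encoder: read the input line and keep only its final states.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

# Decoder: generate the rhyming line, conditioned on the encoder states.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True)
decoder_outputs = decoder_lstm(decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = Dense(num_decoder_tokens, activation="softmax")(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
#           batch_size=200, epochs=250, validation_split=0.2)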

With these parameters my model converged to 0.4 after about 75 epochs, but I waited for all 250 epochs and tested that model.

The result wasn't so bad, but I wanted more.

After that I tried very large batch sizes with more than 200k training samples (and almost all possible parameter combinations), and every result led to overfitting, meaning my model threw the same sentence back at every input. BUT(!) after the 250-epoch experiment, I used checkpoint saving and tested only the best model, once it didn't converge any further. It usually stops at around 0.29 accuracy.
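The checkpoint saving is just Keras's ModelCheckpoint with save_best_only (a sketch; the filename and monitored metric here are assumptions on my side):

from keras.callbacks import ModelCheckpoint

# Keep only the weights of the best epoch seen so far, judged by validation loss.
checkpoint = ModelCheckpoint("best_model.h5", monitor="val_loss",
                             save_best_only=True, verbose=1)
# model.fit(..., validation_split=0.2, callbacks=[checkpoint])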

I know the character-level LSTM has its own limitations in this task, but can 10k training samples really be enough?

Is it possible that convergence doesn't matter in this case and the model simply needs more epochs?

Is the database too big and full of stopwords, so that I need to do word-frequency-based filtering on the training data?
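By word-frequency-based filtering I mean something like dropping line pairs that consist mostly of the most frequent words (a sketch; the function name and thresholds are made up):

from collections import Counter

def filter_by_word_frequency(pairs, top_k=100, max_stopword_ratio=0.7):
    # pairs: list of (input_line, target_line) strings.
    # Count word frequencies over the whole corpus and treat the top_k
    # most frequent words as "stopwords".
    counts = Counter(w for inp, tgt in pairs for w in (inp + " " + tgt).split())
    stopwords = {w for w, _ in counts.most_common(top_k)}

    kept = []
    for inp, tgt in pairs:
        words = (inp + " " + tgt).split()
        ratio = sum(w in stopwords for w in words) / max(len(words), 1)
        # Drop pairs dominated by very frequent words.
        if ratio <= max_stopword_ratio:
            kept.append((inp, tgt))
    return kept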

I know that a word-level method could be more effective, but I'm afraid I've misunderstood something, and I don't want to waste more time waiting for training results while I don't know what I'm doing wrong.

What should I do?

Thank you all.
