
Dear everybody!

As a hobby project, I'm creating a character-level seq2seq LSTM.

In my task, I give a text as input (max 40 characters) and the LSTM generates an output that rhymes with it.
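For reference, the vectorization is roughly the usual one-hot character encoding (just a sketch with made-up example lines, not my real data):

import numpy as np

# Made-up example pairs; my real data is rhyming lines. '\t' marks start, '\n' marks end.
input_texts = ["twinkle twinkle little star", "the cat sat on the mat"]
target_texts = ["\thow i wonder what you are\n", "\tand wore a funny hat\n"]

chars = sorted(set("".join(input_texts + target_texts)))
char_index = {c: i for i, c in enumerate(chars)}
max_encoder_len = 40  # inputs are capped at 40 characters

# One-hot encode the inputs into a (num_samples, max_encoder_len, num_chars) tensor.
encoder_input_data = np.zeros((len(input_texts), max_encoder_len, len(chars)), dtype="float32")
for i, text in enumerate(input_texts):
    for t, ch in enumerate(text[:max_encoder_len]):
        encoder_input_data[i, t, char_index[ch]] = 1.0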

I have built very large databases of rhyming lines.

At the beginning I trained my model with the following parameters:

batch_size = 200
epochs = 250
latent_dim = 300
num_samples = 10000
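Concretely, the architecture follows the standard Keras character-level seq2seq layout (a rough sketch, not my exact code; num_encoder_tokens / num_decoder_tokens are placeholders for my character-set sizes):

from keras.models import Model
from keras.layers import Input, LSTM, Dense

latent_dim = 300
num_encoder_tokens = 60   # placeholder: size of the input character set
num_decoder_tokens = 60   # placeholder: size of the output character set

# Encoder: read the input line and keep only its final states.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

# Decoder: generate the rhyming line, conditioned on the encoder states.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True)
decoder_outputs = decoder_lstm(decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = Dense(num_decoder_tokens, activation="softmax")(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
#           batch_size=200, epochs=250, validation_split=0.2)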

With these parameters my model converged to 0.4 after about 75 epochs, but I waited for all 250 epochs and tested that model.

The result wasn't so bad, but I wanted more.

After that I tried very large batch sizes with more than 200k training samples (and almost all possible parameter combinations), and every result led to overfitting, meaning my model threw the same sentence back at every input. BUT(!) after the 250-epoch experiment, I used checkpoint saving and tested only the best model, once it didn't converge any further. It usually stops at around 0.29 accuracy.
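The checkpoint saving is just Keras's ModelCheckpoint with save_best_only (a sketch; the filename and monitored metric here are assumptions on my side):

from keras.callbacks import ModelCheckpoint

# Keep only the weights of the best epoch seen so far, judged by validation loss.
checkpoint = ModelCheckpoint("best_model.h5", monitor="val_loss",
                             save_best_only=True, verbose=1)
# model.fit(..., validation_split=0.2, callbacks=[checkpoint])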

I know the character-level LSTM has its own limitations in this task, but can 10k training samples really be enough?

Is it possible that convergence doesn't matter in this case and the model simply needs more epochs?

Is the database too big and full of stopwords, so that I need to do word-frequency-based filtering on the training data?
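By word-frequency-based filtering I mean something like dropping line pairs that consist mostly of the most frequent words (a sketch; the function name and thresholds are made up):

from collections import Counter

def filter_by_word_frequency(pairs, top_k=100, max_stopword_ratio=0.7):
    # pairs: list of (input_line, target_line) strings.
    # Count word frequencies over the whole corpus and treat the top_k
    # most frequent words as "stopwords".
    counts = Counter(w for inp, tgt in pairs for w in (inp + " " + tgt).split())
    stopwords = {w for w, _ in counts.most_common(top_k)}

    kept = []
    for inp, tgt in pairs:
        words = (inp + " " + tgt).split()
        ratio = sum(w in stopwords for w in words) / max(len(words), 1)
        # Drop pairs dominated by very frequent words.
        if ratio <= max_stopword_ratio:
            kept.append((inp, tgt))
    return kept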

I know that a word-level method could be more effective, but I'm afraid I've misunderstood something, and I don't want to waste more time waiting for training results while I don't know what I'm doing wrong.

What should I do?

Thank you all.
