My task is to generate keywords from sentences.
I pretrain a text-generation model by masking some of each sentence's tokens and having the model predict the full, unmasked sentence.
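To make the objective concrete, here is a minimal sketch in plain Python (not the actual pegasus code) of what I mean: some input tokens are replaced with a mask symbol, and the decoder target is the whole original sentence.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Randomly replace tokens with [MASK]; the decoder target is the
    full original sentence (a sketch of the objective described above,
    not the exact pegasus implementation)."""
    rng = random.Random(seed)
    inputs = [MASK if rng.random() < mask_prob else t for t in tokens]
    targets = list(tokens)  # predict the whole sentence, not just masked slots
    return inputs, targets

tokens = "deep learning models need lots of data".split()
inp, tgt = mask_tokens(tokens, mask_prob=0.3, seed=0)
print(inp)  # input with some tokens replaced by [MASK]
print(tgt)  # full original sentence as the target
```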
Pretraining uses batch_size = 8 and 1,000,000 steps.
I have not observed any improvement from pretraining: the BLEU score is 10.5 without pretraining and 9.5 with pretraining.
Code
I use the Python code from
https://github.com/google-research/pegasus/blob/master/pegasus/models/transformer.py#L38
hidden_size = 512
num_encoder_layers = 3
num_decoder_layers = 3
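The model is instantiated roughly like this (a sketch only: the class name follows the linked transformer.py, and every value other than the three above is a placeholder, not my exact setting):

```python
from pegasus.models import transformer

# Sketch: hidden_size and the layer counts are my settings;
# the remaining arguments are placeholder values.
model = transformer.TransformerEncoderDecoderModel(
    vocab_size=32000,      # placeholder
    hidden_size=512,
    filter_size=2048,      # placeholder
    num_heads=8,           # placeholder
    num_encoder_layers=3,
    num_decoder_layers=3,
    label_smoothing=0.0,   # placeholder
    dropout=0.1,           # placeholder
)
```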
Discussion
The task is to generate keywords from sentences, and a keyword may not appear anywhere in its source sentence. Taking masked sentences as input and predicting the whole sentence therefore seems unrelated to the keyword-generation task, so it would not benefit it; a toy illustration follows below. Am I right? Is that the reason pretraining does not improve the BLEU score?
Thank you very much.