25 February 2019

Inference with Transformer-XL is faster than with the standard Transformer.

Why?

If state reuse is the reason, is the comparison between two segments of seq_len 32 with state reuse and one segment of seq_len 64 without state reuse?
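
To make the two setups in the question concrete, here is a rough sketch that counts how many query-key pairs attention would score in each one. It rests on assumptions that are not part of the question: a single attention layer, causal masking, and a cached memory exactly as long as one segment. The helper names `causal_pairs` and `dense_pairs` are hypothetical, and the count ignores feed-forward layers, positional encodings, and any re-evaluation of overlapping windows.

```python
def causal_pairs(seg_len: int, mem_len: int = 0) -> int:
    """Count (query, key) pairs left unmasked for one segment.

    Each of the seg_len queries sees all mem_len cached positions, plus
    itself and the earlier positions of its own segment.
    """
    return seg_len * mem_len + seg_len * (seg_len + 1) // 2


def dense_pairs(seg_len: int, mem_len: int = 0) -> int:
    """Count entries of the full score matrix, before any masking."""
    return seg_len * (seg_len + mem_len)


# Setup A (assumed): two 32-token segments; the second one attends to the
# cached states of the first (state reuse, memory length 32).
reuse_causal = causal_pairs(32) + causal_pairs(32, mem_len=32)
reuse_dense = dense_pairs(32) + dense_pairs(32, mem_len=32)

# Setup B (assumed): one 64-token segment, no cached states.
full_causal = causal_pairs(64)
full_dense = dense_pairs(64)

print("unmasked pairs:", reuse_causal, "vs", full_causal)
print("score-matrix entries:", reuse_dense, "vs", full_dense)
```

Under these assumptions the two setups leave the same number of unmasked pairs, and differ only in how many score-matrix entries are materialised before masking. This count alone therefore does not settle where a reported speedup comes from; that also depends on how each model is run at evaluation time.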
