Hi there, I really appreciate you taking the time to read this question.

Nowadays, it is quite common to use masking or padding when dealing with sequential data containing missing values in an RNN (or perhaps image data in a CNN).

However, I have not been able to find a paper that proposes or establishes this exact method.

The reason I'm looking for it is that I would like to learn what exactly happens in an RNN or LSTM layer when an input is masked as 'skip the input'.

If you know of any, please let me know.

Thanks from Japan.

---A few additions---

When reading about masking techniques, we often see descriptions like "ignore the missing value" or "skip the input". On the other hand, there are few references in the literature that explain, in mathematical formulas, what skipping a missing input actually means.

So I wonder what exactly happens inside the RNN layer when an input is masked.

As you can see in the attached image of the formula, the RNN state z^t at timestep t can be expressed in terms of the input x^t, the recurrent input z^{t-1}, and the weights W.
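
For reference, in case the image does not display, the recurrence I have in mind is the standard simple-RNN update (my own notation, with activation $f$, input weights $W_x$, recurrent weights $W_z$, and bias $b$; the attached formula may differ slightly):

$$ z^t = f\left(W_x x^t + W_z z^{t-1} + b\right) $$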

And if the input is masked so that it is ignored, how is z^t calculated?

Is z^t calculated by imputing x^t (e.g., by feeding in the same values as in the previous timestep)?

Or is z^t not calculated at all and output as a NaN value?
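
To make the possibilities concrete, here is a minimal NumPy sketch (entirely hypothetical; the dimensions, weights, mask, and branch labels are my own illustration, not any library's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions and weights for a simple RNN cell.
n_in, n_hidden, T = 3, 4, 5
W_x = rng.normal(size=(n_hidden, n_in))   # input weights
W_z = rng.normal(size=(n_hidden, n_hidden))  # recurrent weights
b = np.zeros(n_hidden)

x = rng.normal(size=(T, n_in))   # input sequence
mask = np.array([1, 1, 0, 1, 0]) # 0 = this timestep is "masked"

z = np.zeros(n_hidden)           # initial state z^0
for t in range(T):
    if mask[t]:
        # Normal step: z^t = f(W_x x^t + W_z z^{t-1} + b)
        z = np.tanh(W_x @ x[t] + W_z @ z + b)
    else:
        # Possibility A ("skip"): the state is carried over
        # unchanged, i.e. z^t = z^{t-1}, and x^t never enters
        # the computation.
        z = z
        # Possibility B would be to impute x[t] (e.g. with x[t-1])
        # and compute the step anyway; possibility C would be to
        # output NaN. Which of these actually happens?
    print(t, z)
```

If I understand correctly, possibility A (carrying the previous state through unchanged) is what Keras does for masked timesteps, but I have not found a reference that states this formally, which is exactly why I am asking.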

To be clear, I'm not looking for an imputation method, but for the mechanism inside an RNN or LSTM when it is masked to ignore the input.

Again, thank you from Japan.
