Hi there, I really appreciate you taking the time to read this question.

Nowadays, it is quite common to use masking or padding when dealing with sequential data containing missing values in an RNN (or perhaps image data in a CNN).

However, I have not been able to find a paper that proposes or establishes this exact method.

The reason I'm looking for it is that I would like to learn what exactly happens in an RNN or LSTM layer when an input is masked as 'skip the input'.

If you know of any, please let me know.

Thanks from Japan.

---A few additions---

When reading about masking techniques, we often see descriptions like "ignore the missing value" or "skip the input". On the other hand, there are few references in the literature that explain, in mathematical formulas, what skipping a missing input actually means.

So I wonder what exactly happens inside the RNN layer when an input is masked.

As you can see in the attached image of the formula, the RNN state z^t at timestep t can be expressed in terms of the input x^t, the recurrent input z^{t-1}, and the weights W.
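
For reference, in case the image does not display, the recurrence I have in mind is the standard simple-RNN update (my own notation, with activation $f$, input weights $W_x$, recurrent weights $W_z$, and bias $b$; the attached formula may differ slightly):

$$ z^t = f\left(W_x x^t + W_z z^{t-1} + b\right) $$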

And if the input is masked so that it is ignored, how is z^t calculated?

Is z^t calculated by imputing x^t (e.g., by feeding in the same values as in the previous timestep)?

Or is z^t not calculated at all and output as a NaN value?
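
To make the possibilities concrete, here is a minimal NumPy sketch (entirely hypothetical; the dimensions, weights, mask, and branch labels are my own illustration, not any library's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions and weights for a simple RNN cell.
n_in, n_hidden, T = 3, 4, 5
W_x = rng.normal(size=(n_hidden, n_in))   # input weights
W_z = rng.normal(size=(n_hidden, n_hidden))  # recurrent weights
b = np.zeros(n_hidden)

x = rng.normal(size=(T, n_in))   # input sequence
mask = np.array([1, 1, 0, 1, 0]) # 0 = this timestep is "masked"

z = np.zeros(n_hidden)           # initial state z^0
for t in range(T):
    if mask[t]:
        # Normal step: z^t = f(W_x x^t + W_z z^{t-1} + b)
        z = np.tanh(W_x @ x[t] + W_z @ z + b)
    else:
        # Possibility A ("skip"): the state is carried over
        # unchanged, i.e. z^t = z^{t-1}, and x^t never enters
        # the computation.
        z = z
        # Possibility B would be to impute x[t] (e.g. with x[t-1])
        # and compute the step anyway; possibility C would be to
        # output NaN. Which of these actually happens?
    print(t, z)
```

If I understand correctly, possibility A (carrying the previous state through unchanged) is what Keras does for masked timesteps, but I have not found a reference that states this formally, which is exactly why I am asking.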

To be clear, I'm not looking for an imputation method, but for the mechanism inside an RNN or LSTM when it is masked to ignore the input.

Again, thank you from Japan.
