23 November 2021

I need to combine audio and video data into a single variable. I am using an LSTM for this, but my data is 5-D and an LSTM accepts only 3-D input. I am following this paper:

https://wlv.openrepository.com/bitstream/handle/2436/622981/IF2019.pdf?sequence=2

I have also tried ConvLSTM instead of LSTM, but Google Colab ran out of memory and crashed every time.

Audio features: (508, 10, 300, 353, 1)

Video features: (508, 10, 300, 353, 1)
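To make the shape problem concrete, here is a minimal NumPy sketch of one possible way to get from 5-D to 3-D. It assumes the axes are (samples, timesteps, height, width, channels), which is a guess based on the shapes above, and the helper names `to_lstm_input` and `fuse` are made up for illustration. It is not the method from the paper. Small stand-in arrays are used instead of the real data.

```python
import numpy as np


def to_lstm_input(x):
    """Flatten (samples, timesteps, H, W, C) to (samples, timesteps, H*W*C)."""
    if x.ndim != 5:
        raise ValueError(f"expected a 5-D array, got {x.ndim}-D")
    samples, timesteps = x.shape[:2]
    # Keep the first two axes and merge the remaining three into one feature axis.
    return x.reshape(samples, timesteps, -1)


def fuse(audio, video):
    """Concatenate the two modalities along the feature axis (simple early fusion)."""
    a = to_lstm_input(audio)
    v = to_lstm_input(video)
    if a.shape[:2] != v.shape[:2]:
        raise ValueError("audio and video must share samples and timesteps")
    return np.concatenate([a, v], axis=-1)


# Stand-in data: 4 samples, 10 timesteps, 6 x 7 x 1 features per timestep.
# The axis order is an assumption, not something confirmed by the data source.
audio = np.zeros((4, 10, 6, 7, 1), dtype=np.float32)
video = np.ones((4, 10, 6, 7, 1), dtype=np.float32)

fused = fuse(audio, video)
print(fused.shape)  # (4, 10, 84)
```

With the real shapes, each timestep would flatten to 300 × 353 × 1 = 105,900 features per modality, or 211,800 after concatenation. The fused array alone would hold 508 × 10 × 211,800 values, about 4.3 GB in float32, so some downsampling or batch-wise loading would probably be needed before an LSTM could be trained on it in Colab.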
