26 May 2021

I have a dataset of 12 videos. Each video consists of 179 frames. I applied ResNet-50 to these frames to extract features and obtained a feature array of shape (179, 7, 7, 2048). As far as I know,

179=Total number of frames

2048 = number of feature channels per spatial location

7*7 = spatial dimensions (height × width) of the final ResNet-50 feature map (not the kernel/filter size)

Now I have to train my model with a ConvLSTM, passing it the features extracted by ResNet-50. I know that the input shape for a ConvLSTM is

batch_shape + (channels, conv_dim1, conv_dim2, conv_dim3)

OR

batch_shape + (conv_dim1, conv_dim2, conv_dim3, channels)

So what should the input shape for the ConvLSTM be, and how can I feed the ResNet-50 output into the ConvLSTM?
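One way to answer this is to treat each video as one sample: stack the per-video ResNet-50 feature arrays of shape (179, 7, 7, 2048) into a single 5D array of shape (samples, time, rows, cols, channels) = (12, 179, 7, 7, 2048), which matches ConvLSTM's channels-last layout. A minimal NumPy sketch of that stacking step follows; the tiny stand-in sizes in the demo are hypothetical, used only to keep the example lightweight (the real sizes from the question are 12, 179, 7, 7, 2048):

```python
import numpy as np

def stack_video_features(per_video_features):
    """Stack per-video (frames, rows, cols, channels) arrays into the
    5D (samples, time, rows, cols, channels) layout a channels-last
    ConvLSTM2D layer expects."""
    return np.stack(per_video_features, axis=0)

# Demo with small stand-in arrays: 2 videos, 4 frames, 7x7 maps, 8 channels.
# With the real data this would be 12 videos of shape (179, 7, 7, 2048).
demo = [np.random.rand(4, 7, 7, 8).astype("float32") for _ in range(2)]
x = stack_video_features(demo)
print(x.shape)  # (2, 4, 7, 7, 8)
```

With the full-size array, a Keras `ConvLSTM2D` layer with `data_format="channels_last"` would take `input_shape=(179, 7, 7, 2048)` (time, rows, cols, channels). Note that 2048 input channels make the ConvLSTM very heavy; a common workaround is to reduce the channel count first, e.g. with a `TimeDistributed` 1x1 convolution or pooling, before the ConvLSTM layer.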

Regards
