I have a video dataset. I extracted all of its frames and applied ResNet-50 to extract features from every frame. ResNet-50 gives a feature map of shape (2534, 7, 7, 2048), where 2534 is the number of frames.
Now I need to train a ConvLSTM model on these features, but what should its input shape be?
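To make the shape question concrete, here is a minimal sketch of one possible way to group the per-frame features into fixed-length clips. It assumes the Keras `ConvLSTM2D` layer with the default channels-last layout, which takes 5D input of the form (samples, time, rows, cols, channels). The helper name `to_sequences` and the clip length of 10 frames are hypothetical choices for illustration, and the array uses 8 channels instead of 2048 only to keep the example light.

```python
import numpy as np

def to_sequences(features, seq_len):
    """Group per-frame feature maps into non-overlapping clips.

    features: array of shape (n_frames, rows, cols, channels)
    returns:  array of shape (n_clips, seq_len, rows, cols, channels)
    Frames left over after the last full clip are dropped.
    """
    n_frames = features.shape[0]
    n_clips = n_frames // seq_len
    usable = features[: n_clips * seq_len]
    return usable.reshape(n_clips, seq_len, *features.shape[1:])

# Stand-in for the ResNet-50 output; 8 channels instead of 2048 (assumption, to save memory).
features = np.zeros((2534, 7, 7, 8), dtype=np.float32)

# seq_len=10 is an arbitrary, hypothetical clip length.
clips = to_sequences(features, seq_len=10)
print(clips.shape)  # (253, 10, 7, 7, 8)
```

With the real features this grouping would give (253, 10, 7, 7, 2048), so under these assumptions the per-sample input shape would be (10, 7, 7, 2048). The last 4 frames are dropped because 2534 is not a multiple of 10.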
Regards