I'm working on deep-learning-based video feature extraction and have found that short-term correlation features are very important for downstream tasks. Which networks are good at extracting short-term correlation features from videos?