I think this repository can help you: https://github.com/vra/action-recognition-using-3d-resnet (it provides features extracted from the UCF101 and HMDB51 datasets).
As a final note, I didn't fully understand your question: do you need pre-extracted features, or a pipeline to extract features from videos yourself (using CNNs)?