I am looking for a video dataset in which each clip contains a single person making varied facial gestures, so that the trained model does not depend on a single pose. I want to retrain FaceNet and VGG-Face models on this dataset using triplet loss, as a form of transfer learning.
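For context, here is a minimal sketch of the triplet loss mentioned above, assuming the usual squared-Euclidean form with a margin. The function names, the toy 2-D embeddings, and the margin of 0.2 are illustrative assumptions, not taken from any particular FaceNet or VGG-Face codebase. In a setup like the one described, the anchor and positive would plausibly be two frames of the same person and the negative a frame of a different person.

```python
from typing import Sequence


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.2,  # assumed value, tune for your data
) -> float:
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    pos_dist = squared_distance(anchor, positive)
    neg_dist = squared_distance(anchor, negative)
    return max(0.0, pos_dist - neg_dist + margin)


# Toy embeddings (hypothetical): two frames of one person, one frame of another.
anchor = [1.0, 0.0]
positive = [1.0, 0.0]
negative = [0.0, 1.0]

# Well-separated triplet: the negative is already farther than the margin.
easy = triplet_loss(anchor, positive, negative)

# Swapping the roles gives a violated triplet with a positive loss.
hard = triplet_loss(anchor, negative, positive)
```

The loss is zero only when the negative is farther from the anchor than the positive by at least the margin, which is why clips with varied poses of the same person are useful: they supply positives that are not trivially close to the anchor.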
20BN-SOMETHING-SOMETHING. The 20BN-SOMETHING-SOMETHING dataset is a large collection of densely labeled video clips that show humans performing predefined basic actions with everyday objects. Human activities