Hi everyone,
I'm working on a project, "Multimodal Egocentric Action Recognition Based on Context Information," and I'm new to this research area.
My background is in Mechatronics and Control Engineering. I recently completed the Deep Learning Specialization courses, which gave me a basic grounding in deep learning concepts; however, I still find sequence models somewhat difficult.
My current goal is to develop a deep-learning algorithm for action recognition on the Assembly101 dataset (https://assembly-101.github.io). One of my colleagues has already developed an action recognition algorithm for sequence data using self-supervised learning (A Neocortex-Inspired Locally Recurrent Neural Network), and I aim to extend his model to video data, specifically Assembly101. However, I'm unsure where to start and don't feel very confident about it.
I'm looking for guidance and would be glad to connect with researchers who have experience building deep-learning models for video-based action recognition. In particular, I'd welcome suggestions on adapting my colleague's self-supervised model to video data from Assembly101.
Any help or advice would be greatly appreciated.
Thank you!