I have noticed that there is a lot of work on human activity recognition, but only a few papers focus on the human activity detection problem (also referred to in the literature as activity localization or action spotting). This severely limits the usefulness of activity recognition in real-life applications: most videos are unsegmented, so they cannot be annotated as single global entities that each contain just one action. Do you have any suggestions or ideas on how this problem might be solved?