There are many methods for this task, and it is hard to single out one best method. However, I think motion tracking by optical flow is a very good approach!
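As a rough illustration of the optical-flow idea, here is a minimal, self-contained sketch of the classic Lucas-Kanade least-squares step in pure numpy, estimating the motion of a single patch between two frames. Real trackers (e.g., in OpenCV) add image pyramids, iteration, and windowing; this is only the core equation, and the synthetic test frames are my own example, not from any paper.

```python
import numpy as np

def lucas_kanade_patch(prev, curr):
    """Estimate (dx, dy) motion of a patch between two grayscale
    frames using the Lucas-Kanade least-squares step.
    A simplified sketch; real trackers add pyramids and iteration."""
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)
    # Spatial gradients (central differences) and temporal gradient.
    Ix = (np.roll(prev, -1, axis=1) - np.roll(prev, 1, axis=1)) / 2.0
    Iy = (np.roll(prev, -1, axis=0) - np.roll(prev, 1, axis=0)) / 2.0
    It = curr - prev
    # Solve the 2x2 normal equations  A [dx, dy]^T = b.
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    dx, dy = np.linalg.solve(A, b)
    return dx, dy

# Synthetic example: a bright blob shifted one pixel to the right.
frame1 = np.zeros((32, 32))
frame1[10:20, 10:20] = 255.0
frame2 = np.roll(frame1, 1, axis=1)   # shift right by 1 px
dx, dy = lucas_kanade_patch(frame1, frame2)
print(round(dx, 2), round(dy, 2))     # expect dx ~ 1, dy ~ 0
```

For real videos you would run this per feature point over a small window around each point rather than over the whole frame.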
If you want to track an object (e.g., a human) in a video, first remove noise from the video frames, then segment the frames using frame-difference and binary-conversion techniques, and finally track the object with a bounding box placed where high-intensity values occur horizontally and vertically.
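The frame-difference/binary-conversion/bounding-box pipeline described above can be sketched in a few lines of numpy. The threshold value and the synthetic frames below are assumptions for illustration; in practice you would tune the threshold per video and add noise filtering first.

```python
import numpy as np

def track_by_frame_difference(prev, curr, thresh=30):
    """Segment motion by frame differencing, binarize, and fit a
    bounding box where the row/column projections of the binary
    mask are nonzero. 'thresh' is an assumed value to tune."""
    diff = np.abs(curr.astype(np.int32) - prev.astype(np.int32))
    mask = diff > thresh                      # binary conversion
    rows = np.where(mask.any(axis=1))[0]      # rows containing motion
    cols = np.where(mask.any(axis=0))[0]      # columns containing motion
    if rows.size == 0:
        return None                           # no motion detected
    # (top, left, bottom, right) bounding box
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])

# Synthetic example: an object appears in the second frame.
prev = np.zeros((40, 40), dtype=np.uint8)
curr = prev.copy()
curr[5:15, 20:30] = 200
print(track_by_frame_difference(prev, curr))  # (5, 20, 14, 29)
```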
Hi, take a look at these papers; the proposed methods are all easy to understand and implement, and quite helpful. Think about what you need and what best fits your purpose; based on that, you can select the best one and use it, or even improve on it.
You can extract motion vectors by segmentation. The method should be background-invariant so that it extracts only foreground motion; for that, use background models.
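One simple background model is a running average: the background adapts slowly, so fast-changing foreground pixels stand out when you subtract it. The learning rate and threshold below are assumed values for illustration; production systems typically use per-pixel statistical models (e.g., mixtures of Gaussians) instead.

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Running-average background model: blend the new frame into
    the background with a small learning rate (alpha is assumed)."""
    return (1.0 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=25):
    """Pixels that differ strongly from the background model are
    treated as foreground motion."""
    return np.abs(frame - bg) > thresh

# Static scene with one bright foreground pixel.
bg = np.full((10, 10), 50.0)
frame = bg.copy()
frame[4, 4] = 250.0                      # foreground object
mask = foreground_mask(bg, frame)
print(int(mask.sum()), bool(mask[4, 4]))  # 1 True
bg = update_background(bg, frame)         # background slowly adapts
```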
The first thing I would do is try to reproduce the authors' results. You can usually reach out to the author(s) and ask for the code used in the paper. This is a good starting point, since they may share their analysis framework. However, you may need to set up a more robust framework that lets you compare results between your methods and the one the authors used.
After implementing their method, there are some exercises to consider doing.
First, which videos failed classification? If you move these videos out of the dataset into a separate set, rejected_data, you should get 100% accuracy on the remaining data.
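Splitting the failures out is straightforward once you have per-video predictions. The video names and labels below are hypothetical placeholders, not from the paper:

```python
def split_by_correctness(samples, predictions, labels):
    """Split a dataset into correctly classified items and a
    'rejected_data' set of failures. By construction, the kept
    set is then classified with 100% accuracy."""
    kept, rejected_data = [], []
    for sample, pred, label in zip(samples, predictions, labels):
        (kept if pred == label else rejected_data).append(sample)
    return kept, rejected_data

# Hypothetical per-video predictions vs. ground truth.
videos = ["v1", "v2", "v3", "v4"]
preds  = ["walk", "run", "walk", "jump"]
truth  = ["walk", "walk", "walk", "jump"]
kept, rejected = split_by_correctness(videos, preds, truth)
print(kept, rejected)  # ['v1', 'v3', 'v4'] ['v2']
```

The interesting analysis then happens on rejected_data: what do its videos have in common?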
What qualitative observations can you make that characterize the quantitative performance differences in their approach? Some observations you can make (and turn into new features): day/night, male/female, number of people, number of actions, fast/slow, cluttered/sparse, verbal/non-verbal. You could add these extra 'labels' to each video's descriptor file and then use machine-learning tools to explore the correlation between your new features and accuracy.
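A simple way to start exploring such correlations is to group results by one hand-added label and compare accuracy per group. The field names and numbers below are purely illustrative:

```python
from collections import defaultdict

def accuracy_by_feature(records, feature="lighting"):
    """Group per-video results by a hand-added feature label
    (e.g. 'day'/'night') and report accuracy per group, to see
    which conditions correlate with failures."""
    totals = defaultdict(lambda: [0, 0])   # value -> [correct, count]
    for rec in records:
        stats = totals[rec[feature]]
        stats[0] += rec["correct"]
        stats[1] += 1
    return {k: c / n for k, (c, n) in totals.items()}

# Hypothetical per-video annotations and outcomes.
results = [
    {"lighting": "day",   "correct": 1},
    {"lighting": "day",   "correct": 1},
    {"lighting": "night", "correct": 0},
    {"lighting": "night", "correct": 1},
]
print(accuracy_by_feature(results))  # {'day': 1.0, 'night': 0.5}
```

A large accuracy gap between groups suggests that feature is worth investigating further (or feeding into a proper statistical test).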
Here are some examples of questions I would ask to better understand the reasons for performance differences in their results:
- Is background illumination a factor in performance?
- Does the number of people in the scene affect accuracy?
- How do non-moving objects affect performance?
- Does gender/skin tone/clothing/hair/etc. affect performance?
etc...
I can provide more motivating questions if necessary, but this should be a good starting point. Also, it may be a bit ambitious to try to define a general-purpose recognition strategy; many great classification systems are composed not of a single classifier model but of an ensemble.
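To make the ensemble point concrete, here is a minimal majority-vote combiner. The three toy "classifiers" are hypothetical stand-ins for real trained models:

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Combine several classifiers by majority vote -- the simplest
    form of the ensemble idea mentioned above."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Toy 'classifiers' (stand-ins for real models).
clf_a = lambda x: "walk" if x < 5 else "run"
clf_b = lambda x: "walk" if x < 7 else "run"
clf_c = lambda x: "run"
print(majority_vote([clf_a, clf_b, clf_c], 3))  # walk
```

Real ensembles go further (weighted voting, stacking, boosting), but even plain voting over diverse models often beats any single one.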