How to combine shallow with deep visual descriptors?

Hi Konstantinos,

Fischer et al. showed that CNN features outperform local descriptors based on orientation histograms, such as SIFT, HOG, and SURF. See:

Fischer et al.,"Descriptor Matching with Convolutional Neural Networks: a Comparison to SIFT", 2014 - https://www.researchgate.net/profile/Alexey_Dosovitskiy/publication/262568634_Descriptor_Matching_with_Convolutional_Neural_Networks_a_Comparison_to_SIFT/links/541fe8dc0cf2218008d41617.pdf

This was confirmed by the state-of-the-art GoogLeNet. See:

Szegedy et al., "Going Deeper with Convolutions", 2014 - https://arxiv.org/abs/1409.4842

Nevertheless, Benenson et al. highlighted "...although some of these features might be driven by learning, they are mainly hand-crafted via trial and error,..." and concluded about deep architectures: "...Despite the common narrative there is still no clear evidence that deep networks are good at learning features for pedestrian detection (when using pedestrian detection training data). Most successful methods use such architectures to model higher level aspects of parts, occlusions, and context. The obtained results are on par with DPM and decision forest approaches, making the advantage of using such involved architectures yet unclear...".

See:

Benenson et al., "Ten Years of Pedestrian Detection, What Have We Learned?", 2014 - https://www.researchgate.net/profile/Rodrigo_Benenson/publication/268452043_Ten_Years_of_Pedestrian_Detection_What_Have_We_Learned/links/54f4897a0cf2eed5d734d3ee.pdf

As a compromise, combining deep learning with local descriptors can improve computational performance. See:

Lipetski et al., "A combined HOG and deep convolution network cascade for pedestrian detection", 2017 - http://www.ingentaconnect.com/contentone/ist/ei/2017/00002017/00000004/art00003?crawler=true&mimetype=application/pdf
Lipetski et al., "Close to real-time robust pedestrian detection and tracking", 2015 - http://booksc.org/book/51048339/446fa6
Alom et al., "Robust Multi-view Pedestrian Tracking Using Neural Networks", 2017 - https://arxiv.org/abs/1704.06370
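
To make the cascade idea concrete, here is a rough sketch of how a cheap shallow stage can gate an expensive deep stage. It is only an illustration under stated assumptions, not the method from the papers above: the descriptor is a simplified single-histogram stand-in for HOG (no cells or blocks), and `shallow_weights`, both thresholds, and `deep_scorer` (any callable wrapping a trained CNN) are hypothetical names introduced for the example.

```python
import numpy as np

def orientation_histogram(patch, bins=9):
    """Simplified HOG-like descriptor: one magnitude-weighted histogram of
    unsigned gradient orientations over the whole patch (no cells or blocks)."""
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, np.pi), weights=magnitude)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def cascade_detect(patches, shallow_weights, shallow_threshold,
                   deep_scorer, deep_threshold):
    """Run the cheap stage on every patch and the expensive stage only on
    survivors. Returns (indices of accepted patches, number of deep calls)."""
    accepted = []
    deep_calls = 0
    for i, patch in enumerate(patches):
        # Stage 1: linear score on the shallow descriptor (assumed pre-trained weights).
        shallow_score = float(orientation_histogram(patch) @ shallow_weights)
        if shallow_score < shallow_threshold:
            continue  # rejected cheaply; the deep model never sees this patch
        # Stage 2: hypothetical deep scorer, e.g. a CNN forward pass.
        deep_calls += 1
        if deep_scorer(patch) >= deep_threshold:
            accepted.append(i)
    return accepted, deep_calls
```

The computational gain comes from `deep_calls` being much smaller than the number of candidate windows when the first stage rejects most of them. If accuracy matters more than speed, the alternative is to concatenate the shallow descriptor with CNN activations and train one classifier on the joint vector.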

Finally, Milan et al. very recently proposed a recurrent neural network to address online multi-target tracking. They cast "...the classical Bayesian state estimation, data association as well as track initiation and termination tasks as a recurrent neural net, allowing for full end-to-end learning of the model...".

See:

Milan et al., "Online Multi-Target Tracking Using Recurrent Neural Networks", 2017 - https://www.researchgate.net/profile/Anton_Milan/publication/301876852_Online_Multi-target_Tracking_using_Recurrent_Neural_Networks/links/573b367f08ae9f741b2d7ba2/Online-Multi-target-Tracking-using-Recurrent-Neural-Networks.pdf

Regards

