Researchers have proposed many algorithms for object identification, but their performance depends on the quality of the image or video. Because an image is a function of intensity, the luminance level during acquisition, the camera specification, and several other factors determine the quality of the image or video.
Thus algorithms such as SURF and SIFT, among many others, may or may not give the desired object identification.
In short, the quality of the image or video determines how well these algorithms perform.
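To make the luminance point concrete, here is a small toy sketch (not SIFT or SURF themselves). It assumes a detector that keeps only pixels whose intensity gradient exceeds a fixed threshold, which is a simplification I am introducing for illustration; the function name `count_strong_edges` and the threshold value are hypothetical.

```python
import numpy as np

def count_strong_edges(image, threshold=30.0):
    """Count pixels whose gradient magnitude exceeds a fixed threshold.

    This is a stand-in for the contrast test a keypoint detector applies;
    it is not the actual SIFT/SURF criterion.
    """
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    return int(np.count_nonzero(magnitude > threshold))

# Synthetic 32x32 image with one vertical step edge (dark left, bright right).
bright = np.full((32, 32), 50.0)
bright[:, 16:] = 200.0

# The same scene captured at 20% of the luminance.
dim = bright * 0.2

print(count_strong_edges(bright))  # edge pixels found in the well-lit image
print(count_strong_edges(dim))     # the same edge falls below the threshold
```

The scene is identical in both cases; only the lighting changed, yet the fixed-threshold test no longer responds to the edge in the dim image.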
I strongly recommend using deep features, since they are the current state of the art in the ImageNet challenge. With Python and the Keras framework, you can easily extract these features from VGG16, ResNet, and more recent architectures.
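A minimal sketch of what that could look like with VGG16, assuming the Keras bundled with TensorFlow (`tensorflow.keras`). The helper names `build_extractor` and `extract_features` are mine, and using global average pooling to get one vector per image is just one reasonable choice, not the only one.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

def build_extractor(weights="imagenet"):
    """Return VGG16 without its classifier head.

    include_top=False drops the fully connected layers, and pooling="avg"
    averages the last convolutional feature map into a single 512-d vector.
    weights="imagenet" downloads the pretrained weights on first use.
    """
    return VGG16(weights=weights, include_top=False, pooling="avg")

def extract_features(model, images):
    """Compute one feature vector per image.

    images: array-like of shape (n, height, width, 3), RGB values in 0..255.
    Returns a NumPy array of shape (n, 512).
    """
    batch = np.array(images, dtype="float32")  # copy, so the input is untouched
    batch = preprocess_input(batch)            # VGG16-specific channel handling
    return model.predict(batch, verbose=0)
```

Typical use would be `model = build_extractor()` followed by `extract_features(model, images)`, where `images` holds your frames resized to something like 224x224. The resulting vectors can then be fed to any classifier or compared with a distance measure. Swapping in ResNet should mostly be a matter of importing the corresponding model and its own `preprocess_input` from `tensorflow.keras.applications`.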