It all depends on your machine and on the speed/accuracy trade-off you want to achieve. Deep features are great, but you need a GPU if you want a real-time system. If you want very light and fast tracking, you can use color or contour features instead.
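To make the "light and fast" option concrete, here is a minimal sketch of a color feature: a quantized RGB histogram compared with histogram intersection. The function names, the 4-bins-per-channel setting, and the toy pixel lists are my own assumptions for illustration, not something from a particular tracker.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram with `bins` levels per channel.

    `pixels` is an iterable of (r, g, b) tuples with values in 0..255.
    """
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Toy data (hypothetical): a mostly red target and two candidate patches.
target = [(200, 30, 30)] * 3 + [(250, 250, 250)]
candidate_a = [(210, 20, 40)] * 3 + [(10, 10, 10)]   # mostly red
candidate_b = [(20, 30, 220)] * 4                    # blue

score_a = histogram_intersection(color_histogram(target), color_histogram(candidate_a))
score_b = histogram_intersection(color_histogram(target), color_histogram(candidate_b))
print(score_a, score_b)  # the red candidate scores higher than the blue one
```

In a tracker you would compute the histogram of the target once, then score candidate windows in each new frame and keep the best one. It is cheap, but it ignores spatial layout, so two objects with similar colors are easily confused.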
I agree with Celine's and Yasser's answers. For your consideration, deep features will give you good performance when you have a lot of training data. Color and grayscale features usually generalise less well. HOG features are more robust to illumination changes and small deformations, although HOG by itself is not invariant to large rotations or scale changes, so those are usually handled by searching over several scales. If you want to use deep features, I think you should use a pretrained model rather than train your own. It will make your work faster.
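For anyone unfamiliar with HOG, here is a minimal sketch of its core idea: a magnitude-weighted histogram of gradient orientations for a single cell. It is a simplification under my own assumptions (one cell, unsigned orientations in 9 bins, plain L2 normalization, toy images); the full descriptor also uses overlapping blocks of cells.

```python
import math


def cell_orientation_histogram(cell, bins=9):
    """L2-normalized histogram of unsigned gradient orientations (0-180 degrees).

    `cell` is a 2D list of gray values; border pixels are skipped.
    """
    hist = [0.0] * bins
    height, width = len(cell), len(cell[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue  # flat region: no orientation information
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // (180.0 / bins)) % bins] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm else hist


# Toy cells (hypothetical): a vertical edge, a brighter copy, and a rotated copy.
vertical_edge = [[0 if x < 3 else 100 for x in range(6)] for y in range(6)]
brighter = [[2 * v + 50 for v in row] for row in vertical_edge]
horizontal_edge = [list(col) for col in zip(*vertical_edge)]  # 90-degree change

h_edge = cell_orientation_histogram(vertical_edge)
h_bright = cell_orientation_histogram(brighter)
h_rot = cell_orientation_histogram(horizontal_edge)
```

In this simplified form, a change of brightness and contrast leaves the normalized histogram unchanged, while turning the edge by 90 degrees moves all the energy to a different bin. That is why large rotations typically have to be handled separately.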
Tracking is the ability to find good matches between corresponding parts of successive images extracted from a video sequence. For rigid objects, good matches are possible mainly because of the presence of contours; because of the aperture problem, however, a good local motion estimate can be obtained only in the direction perpendicular to the contour.
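The aperture problem can be illustrated with a small sketch. Under the usual brightness-constancy assumption, Ix*u + Iy*v + It = 0, a single pixel constrains only the component of the motion along the image gradient, that is, perpendicular to the contour (the "normal flow"). The helper names and the synthetic ramp frames below are assumptions of mine, chosen so that the finite differences are exact.

```python
def normal_flow(prev, curr, x, y):
    """Normal-flow vector at pixel (x, y) from two grayscale frames (2D lists).

    Returns None where the gradient is zero, since no motion is observable there.
    """
    ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0   # spatial gradient in x
    iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0   # spatial gradient in y
    it = curr[y][x] - prev[y][x]                   # temporal difference
    mag2 = ix * ix + iy * iy
    if mag2 == 0:
        return None
    scale = -it / mag2
    return (scale * ix, scale * iy)


def ramp_frame(width, height, a, b, dx, dy):
    """Synthetic frame I(x, y) = a*(x - dx) + b*(y - dy), i.e. a ramp shifted by (dx, dy)."""
    return [[a * (x - dx) + b * (y - dy) for x in range(width)] for y in range(height)]


# Intensity varies only along x (vertical contours); the true motion is (2, 3).
prev = ramp_frame(5, 5, 1.0, 0.0, 0, 0)
curr = ramp_frame(5, 5, 1.0, 0.0, 2, 3)
flow = normal_flow(prev, curr, 2, 2)
print(flow)  # only the x component (perpendicular to the contour) is recovered
```

Here the vertical part of the true motion is invisible locally, because sliding along the contour does not change the pixel values. Practical trackers get around this by combining constraints from a neighborhood that contains gradients in more than one direction, for example at corners or on textured patches.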
Patterns (texture) on an object allow good matching, under constant illumination, even on surfaces that would otherwise be homogeneous. When the illumination varies, object tracking can be a difficult task.
There is a huge literature on motion estimation.