Probably the best way to approach the problem is to study the features of the object you intend to track. For example, if it is a person in a crowd, the head and shoulders could be useful features for distinguishing the person from surrounding objects (e.g., trees or vehicles if the video was acquired outdoors). The shape (morphology) of the object is useful if you intend to detect unusual behavior (e.g., people tend to keep their arms down while walking). Once you have identified descriptive, good-quality features, you can use any classifier, whether classical machine-learning based or deep-learning based. Good features make the classifier's job much easier.
If you use MATLAB, a good example to start with for object tracking can be found at the following link: https://ch.mathworks.com/help/vision/examples/motion-based-multiple-object-tracking.html
You can use color and motion, as well as depth/disparity information from the camera(s).
There are both parametric approaches (Kalman filter, particle filter, feature tracking using SfM and SIFT/SURF) and non-parametric approaches (mean shift/CamShift), with OpenCV and MATLAB code available.
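To make the Kalman filter option a bit more concrete, here is a minimal sketch of a constant-velocity tracker for a single object's centroid, written with NumPy only. The function names (`make_cv_model`, `kalman_step`, `track`) and the noise values are assumptions chosen for illustration, not taken from any particular toolbox; in practice you would more likely use a ready-made implementation such as OpenCV's `cv2.KalmanFilter`.

```python
import numpy as np

def make_cv_model(dt=1.0, process_var=1e-2, meas_var=1.0):
    """Constant-velocity model. State is [x, y, vx, vy], measurement is [x, y].
    The noise variances are illustrative guesses, not tuned values."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)   # state transition
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)   # we only observe position
    Q = process_var * np.eye(4)                 # process noise covariance
    R = meas_var * np.eye(2)                    # measurement noise covariance
    return F, H, Q, R

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle for measurement z."""
    # Predict the next state and its uncertainty.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Correct the prediction with the measured position.
    innovation = z - H @ x_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)         # Kalman gain
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

def track(measurements, dt=1.0):
    """Filter a sequence of (x, y) detections; returns one state per frame."""
    F, H, Q, R = make_cv_model(dt)
    x = np.array([measurements[0][0], measurements[0][1], 0.0, 0.0])
    P = np.eye(4) * 10.0                        # large initial uncertainty
    states = []
    for z in measurements:
        x, P = kalman_step(x, P, np.asarray(z, dtype=float), F, H, Q, R)
        states.append(x.copy())
    return np.array(states)

if __name__ == "__main__":
    # Synthetic detections of an object moving 1 px right and 0.5 px down per frame.
    detections = [(float(t), 0.5 * t) for t in range(60)]
    print(track(detections)[-1])
```

The detections would normally come from your detector (e.g. blob centroids after background subtraction); the filter then smooths them and gives you a velocity estimate that can be used to predict where to look in the next frame.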
First you need to extract your object with SfM or optical flow ...
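As a rough illustration of the optical flow option, the sketch below estimates a single (u, v) displacement for one image window using the basic Lucas-Kanade least-squares formulation. It assumes small motion and a textured window, and the function name `lucas_kanade_window` is made up for this example; for real video a pyramidal implementation such as OpenCV's `cv2.calcOpticalFlowPyrLK` is the more usual choice.

```python
import numpy as np

def lucas_kanade_window(prev, curr):
    """Estimate one (u, v) displacement in pixels between two grayscale
    windows of the same size, assuming brightness constancy and small motion."""
    prev = prev.astype(float)
    curr = curr.astype(float)
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    Iy, Ix = np.gradient(prev)
    It = curr - prev                            # temporal derivative
    # Brightness constancy: Ix*u + Iy*v + It = 0 at every pixel.
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow[0], flow[1]                     # u (along x), v (along y)

if __name__ == "__main__":
    # Synthetic test: a Gaussian blob shifted by a sub-pixel amount.
    yy, xx = np.mgrid[0:41, 0:41].astype(float)
    def blob(cx, cy, sigma=5.0):
        return np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * sigma ** 2))
    print(lucas_kanade_window(blob(20, 20), blob(20.5, 19.7)))
```

Pixels whose flow differs clearly from the background motion can then be grouped to segment the moving object.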
Then, if you want a scale-invariant descriptor, you can use SIFT/SURF or HOG. You can also have a look at bag of features or even deep learning (via transfer learning). For the classification I would recommend SVM or AdaBoost ... But the right answer will depend on your training dataset...
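To give an idea of what a gradient-based descriptor looks like, here is a heavily simplified sketch in the spirit of HOG: one magnitude-weighted histogram of unsigned gradient orientations for a whole patch. It is not the full HOG descriptor (there are no cells, blocks, or block normalization), and the function name `orientation_histogram` and the 9-bin choice are assumptions for illustration. Real projects would normally use an existing implementation such as `skimage.feature.hog` or `cv2.HOGDescriptor`, and feed the resulting vectors to the SVM or AdaBoost classifier.

```python
import numpy as np

def orientation_histogram(patch, n_bins=9):
    """Simplified HOG-like descriptor for one grayscale patch: an L2-normalized
    histogram of unsigned gradient orientations, weighted by gradient magnitude."""
    patch = patch.astype(float)
    gy, gx = np.gradient(patch)                 # row (y) derivative first, then column (x)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees.
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bin_width = 180.0 / n_bins
    # Clip guards against an angle landing exactly on 180 due to rounding.
    bins = np.minimum((angle / bin_width).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

if __name__ == "__main__":
    # Intensity ramp along x: all gradients point horizontally (first bin).
    ramp_x = np.tile(np.arange(16.0), (16, 1))
    print(orientation_histogram(ramp_x))
    # Intensity ramp along y: all gradients point vertically (middle bin).
    print(orientation_histogram(ramp_x.T))
```

Each training patch (object vs. background) would be turned into such a vector, and the classifier is then trained on those vectors rather than on raw pixels.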