I am working on global hand gesture recognition. I would like to normalize my data against the effects of scaling, translation and rotation. Can anyone suggest some techniques for doing so?
You can use a frequency-based or statistical descriptor to achieve invariance. You can refer to the following papers, which cover all the methods at a glance.
Zhang Dengsheng, Lu Guojun, “Review of Shape Representation and Description Techniques,” Pattern Recognition, Elsevier, vol. 37, no. 1, pp. 1-19, 2004.
Yang Mingqiang, K. Kidiyo, R. Joseph, “A Survey of Shape Feature Extraction Techniques,” Pattern Recognition, Peng-Yeng Yin (Ed.), pp. 43-90, 2008.
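As one illustration of a frequency descriptor, here is a minimal sketch of Fourier descriptors for a closed contour. It is not taken from the papers above. It assumes the contour is an (N, 2) array of points sampled uniformly in counter-clockwise order, and the function name `fourier_descriptor` and the parameter `n_coeffs` are made up for this example. Dropping the DC term removes translation, taking magnitudes removes rotation and the choice of starting point, and dividing by the first harmonic removes scale.

```python
import numpy as np

def fourier_descriptor(contour, n_coeffs=8):
    """Translation-, rotation- and scale-invariant descriptor of a closed contour.

    Assumes `contour` is an (N, 2) array sampled counter-clockwise with
    N > n_coeffs, so that the first harmonic is non-zero.
    """
    pts = np.asarray(contour, dtype=float)
    # Treat each (x, y) point as the complex number x + iy.
    z = pts[:, 0] + 1j * pts[:, 1]
    spectrum = np.fft.fft(z)
    # Skip spectrum[0] (the DC term holds the position of the shape).
    # Magnitudes discard the phase, which carries rotation and start point.
    magnitudes = np.abs(spectrum[1:n_coeffs + 1])
    # Divide by the first harmonic so that uniform scaling cancels out.
    return magnitudes / magnitudes[0]
```

Two contours of the same shape should then give nearly identical descriptors, which can be compared with a simple Euclidean distance.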
For the recognition process you can use neural networks. But as I understand it, you are talking about an earlier step, when the data is collected. It will depend on what data you have.
Gestures can take various forms. Here I assume that you are talking about either trajectories (a sequence of coordinates obtained from the moving hand centroid, the dynamic gesture as you said) or a posture (e.g. a fingerspelling sign). Whenever you want rotation invariance, you need a reference vector from which to start your description or feature extraction.
For a trajectory, this reference vector is readily available because every digit, letter or symbol is drawn with a mostly uniform stroke order. Thus, the vector that joins the centroid to the starting point of the trajectory is the intended reference vector.
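For a trajectory, the idea can be sketched roughly as follows. This is only an illustration, assuming the trajectory is an (N, 2) array of 2-D points in drawing order; the function name `normalize_trajectory_rotation` is made up for this example. The trajectory is centred and then rotated so that the centroid-to-start vector points along the positive x-axis.

```python
import numpy as np

def normalize_trajectory_rotation(points):
    """Rotate a 2-D trajectory so the centroid-to-start vector lies on +x."""
    pts = np.asarray(points, dtype=float)
    # Move the centroid to the origin.
    centered = pts - pts.mean(axis=0)
    # Reference vector: from the centroid (now the origin) to the first point.
    ref = centered[0]
    angle = np.arctan2(ref[1], ref[0])
    # Rotation by -angle brings the reference vector onto the +x axis.
    c, s = np.cos(-angle), np.sin(-angle)
    rot = np.array([[c, -s], [s, c]])
    return centered @ rot.T
```

If the starting point happens to coincide with the centroid, the reference vector has zero length and the rotation is not meaningful, so that case needs separate handling.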
However, when we work with postures, we don't have that common reference vector. In this case, some standard reference vectors can be tried, e.g., the principal axis (eigenvectors computed from the point cloud or contour), the axis of least inertia, or the maximum radius (the line that connects the centroid to the farthest point). I have found the best results with the principal axis.
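A minimal sketch of the principal-axis option, assuming the posture is given as an (N, 2) array of silhouette or contour points (the function names here are made up for the example, and this is not the exact procedure of any particular paper). The principal axis is the eigenvector of the covariance matrix with the largest eigenvalue, and the shape is rotated so that this axis lies along x.

```python
import numpy as np

def principal_axis_angle(points):
    """Angle (radians) of the major principal axis of a 2-D point set."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so the last column is the direction of largest variance.
    eigvals, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, -1]
    return np.arctan2(major[1], major[0])

def align_to_principal_axis(points):
    """Centre the points and rotate them so the principal axis lies along x."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    angle = principal_axis_angle(pts)
    c, s = np.cos(-angle), np.sin(-angle)
    rot = np.array([[c, -s], [s, c]])
    return centered @ rot.T
```

Note that an eigenvector is only defined up to sign, so the result has a 180 degree ambiguity. Some extra rule (for example, which side the farthest point falls on) is needed to resolve it.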
Now, once you have established a reference vector, you can scale the shape to fit inside a unit-radius circle (dividing by the length of the maximum radius line), which gives you scale invariance. And when you subtract the mean from the point cloud, the structure is shifted to the origin, which gives translation invariance.
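The translation and scale steps can be sketched in a few lines. This assumes an (N, 2) point array; the name `normalize_translation_scale` is only illustrative.

```python
import numpy as np

def normalize_translation_scale(points):
    """Shift the centroid to the origin and scale the maximum radius to 1."""
    pts = np.asarray(points, dtype=float)
    # Translation invariance: subtract the mean of the point cloud.
    centered = pts - pts.mean(axis=0)
    # Scale invariance: divide by the distance to the farthest point,
    # so the whole shape fits inside the unit-radius circle.
    max_radius = np.linalg.norm(centered, axis=1).max()
    if max_radius == 0:
        return centered  # degenerate case: all points coincide
    return centered / max_radius
```

Because it depends on a single farthest point, the maximum radius is sensitive to outliers in a noisy silhouette, so it may be worth cleaning the point cloud first.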
Please see the article for clarification:
Kane, Lalit, and Pritee Khanna. "A framework for live and cross platform fingerspelling recognition using modified shape matrix variants on depth silhouettes." Computer Vision and Image Understanding 141 (2015): 138-151.
Note that complex features like Zernike or pseudo-Zernike moments also normalize the shape using a unit-radius circle, but they take a lot of time to compute a rotation-invariant description. The same holds for spectral descriptors such as Fourier descriptors and their variants. I think the mechanism described above is well suited to real-time implementations. The article suggested by Archana is a good one:
Zhang, D., & Lu, G. (2004). Review of shape representation and description techniques. Pattern recognition, 37(1), 1-19.