What are the main metrics used to evaluate gesture-based interactions and interfaces (on-device or in-air)? Any papers, examples, books, or other references on this topic would be much appreciated.
I can say a bit about mid-air gestures. They are generally evaluated in terms of performance (completion time, error rate) on the particular task. Another aspect that is commonly evaluated is the fatigue they induce (the "gorilla arm" effect). Researchers usually assess fatigue subjectively, and the NASA-TLX questionnaire is the most common tool. However, the Borg scales (RPE or CR10) might be more appropriate for measuring fatigue alone, as they do not include the other (possibly confounding) factors that the TLX covers, such as mental load or frustration. Other researchers have started to look at more objective measurements of fatigue. A Work-in-Progress paper at CHI this year starts to look into this aspect (http://dl.acm.org/citation.cfm?id=2468356.2468406). I am personally working on the topic from a different perspective and will be submitting to CHI 2014.
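To make the performance and TLX measures above concrete, here is a minimal Python sketch of how they are typically computed from study data. The trial logging format (Trial, completion_time_s, error) and the example numbers are hypothetical. The scoring follows the usual NASA-TLX conventions: six subscales rated 0-100, a "raw" TLX that is their unweighted mean, and a weighted TLX whose weights come from the 15 pairwise comparisons and therefore sum to 15. Note how the single TLX score mixes physical demand with mental demand and frustration, which is the confound mentioned above.

```python
from dataclasses import dataclass
from statistics import mean

# The six NASA-TLX subscales; each is rated on a 0-100 scale.
TLX_SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


@dataclass
class Trial:
    """One gesture trial (hypothetical logging format, for illustration only)."""
    completion_time_s: float
    error: bool


def performance_summary(trials):
    """Return (mean completion time in seconds, error rate as a fraction of trials)."""
    times = [t.completion_time_s for t in trials]
    errors = sum(1 for t in trials if t.error)
    return mean(times), errors / len(trials)


def raw_tlx(ratings):
    """Raw TLX: unweighted mean of the six subscale ratings."""
    return mean(ratings[s] for s in TLX_SUBSCALES)


def weighted_tlx(ratings, weights):
    """Weighted TLX: weights are tallies from the 15 pairwise comparisons (sum to 15)."""
    if sum(weights[s] for s in TLX_SUBSCALES) != 15:
        raise ValueError("pairwise-comparison weights must sum to 15")
    return sum(ratings[s] * weights[s] for s in TLX_SUBSCALES) / 15


# Hypothetical data for one participant in one mid-air gesture condition.
trials = [Trial(1.8, False), Trial(2.4, True), Trial(2.1, False), Trial(1.7, False)]
ratings = {"mental": 30, "physical": 70, "temporal": 40,
           "performance": 25, "effort": 60, "frustration": 20}
weights = {"mental": 1, "physical": 5, "temporal": 2,
           "performance": 1, "effort": 4, "frustration": 2}

mean_time, error_rate = performance_summary(trials)
print(f"mean time: {mean_time:.2f} s, error rate: {error_rate:.0%}")
print(f"raw TLX: {raw_tlx(ratings):.1f}, weighted TLX: {weighted_tlx(ratings, weights):.1f}")
```

If fatigue is the only thing of interest, one would instead collect a Borg RPE (6-20) or CR10 (0-10) rating directly, rather than trying to isolate it from the composite TLX score.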
Our paper on metrics for arm fatigue was accepted for publication at CHI '14. You can find more details here: http://hci.cs.umanitoba.ca/projects-and-research/details/ce