Can somebody explain the process of obtaining ground-truth values from a motion capture system, so that they can be compared against our proposed system (e.g., a Kinect)? Also, do we have to record with both the 3D motion capture system (ground truth) and our proposed system (Kinect) running at the same time?