I want to recognize human activities in a multi-camera environment. For my experiments I am using two camera views, and I want to fuse the information extracted from both views into a single, more precise feature vector. However, I am facing the following confusion:
Since I am using supervised learning, I have to label the activity of each person in each frame. In the first camera view at some time t, two persons appear very close to each other, so I label them as interacting. But in the second camera view at the same time t, those persons do not appear close to each other (i.e., the distance between them is large), so I labeled them as non-interacting.
How can I fuse two feature vectors (one from each view) that have two different labels at the same time?
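
To make the situation concrete, here is a minimal sketch of the conflict. It assumes simple concatenation as the fusion step, which is only one possible choice and not necessarily the right one. The feature values and the names `ViewObservation`, `fuse_by_concatenation`, and `fused_label` are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ViewObservation:
    """One camera view at a single time t (hypothetical structure)."""
    features: List[float]  # per-view feature vector (illustrative values)
    label: str             # per-view annotation for the same pair of persons


def fuse_by_concatenation(a: ViewObservation, b: ViewObservation) -> List[float]:
    """Feature-level fusion: append the second view's features to the first's."""
    return a.features + b.features


def fused_label(a: ViewObservation, b: ViewObservation) -> Optional[str]:
    """Return the shared label, or None when the two views disagree."""
    return a.label if a.label == b.label else None


# Same time t, same two persons, but the views were annotated differently.
cam1 = ViewObservation(features=[0.4, 1.2], label="interacting")
cam2 = ViewObservation(features=[3.5, 0.9], label="non-interacting")

fused = fuse_by_concatenation(cam1, cam2)
label = fused_label(cam1, cam2)
print(fused, label)
```

The fused vector is well defined, but a supervised classifier needs exactly one target per fused sample, and here the per-view annotations give none.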