I was reading your 2016 conference paper "Human Interaction Prediction Using Deep Temporal Features" and was trying to implement it in Keras and TensorFlow. I tried to follow the paper closely but got stuck on the temporal convolution part. I extracted feature vectors for the color-coded frames of UT_Segment_SET2 using a pre-trained CNN_M_2048 model and saved them in a JSON file. Since each video has a different number of frames, I padded the feature vectors using the Keras preprocessing toolkit. I then passed these vectors to a 1D CNN followed by two FC layers. I have followed the paper as closely as I can but am unable to obtain similar results. Can anybody please help me or connect with me on this? I think I am not understanding the architecture correctly.
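For reference, here is a minimal NumPy sketch of the pipeline as described (zero-padding variable-length per-frame features, a 1D convolution over the time axis, then two FC layers). It is meant only to make the tensor shapes explicit. The 2048-d feature size is assumed from the CNN_M_2048 model name, the six output classes assume the six UT-Interaction action classes, and the max length, kernel size, filter count, hidden size, and the choice to flatten (rather than pool) after the convolution are all hypothetical values, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only (not the paper's settings).
FEAT_DIM, MAX_LEN, KERNEL, FILTERS, HIDDEN, N_CLASSES = 2048, 40, 3, 64, 128, 6

def pad_features(seqs, max_len, feat_dim):
    """Zero-pad (or truncate) a list of (T_i, D) arrays to (N, max_len, D), padding at the end."""
    out = np.zeros((len(seqs), max_len, feat_dim), dtype=np.float32)
    for i, s in enumerate(seqs):
        t = min(len(s), max_len)
        out[i, :t] = s[:t]
    return out

def temporal_conv1d(x, w, b):
    """'Valid' 1D convolution over time with ReLU: x (T, D), w (k, D, F) -> (T - k + 1, F)."""
    k = w.shape[0]
    steps = x.shape[0] - k + 1
    out = np.stack([np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1])) for t in range(steps)]) + b
    return np.maximum(out, 0.0)

def dense(x, w, b, relu=True):
    """Fully connected layer, optionally followed by ReLU."""
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(video, params):
    """One padded video (MAX_LEN, FEAT_DIM) -> class probabilities (N_CLASSES,)."""
    h = temporal_conv1d(video, params["conv_w"], params["conv_b"])  # (MAX_LEN - KERNEL + 1, FILTERS)
    h = h.reshape(-1)                                                # flatten time x filters
    h = dense(h, params["fc1_w"], params["fc1_b"])                   # FC layer 1
    return softmax(dense(h, params["fc2_w"], params["fc2_b"], relu=False))  # FC layer 2

# Random stand-ins for per-frame CNN features of three videos with different lengths.
videos = [rng.standard_normal((n, FEAT_DIM)).astype(np.float32) for n in (25, 40, 52)]
X = pad_features(videos, MAX_LEN, FEAT_DIM)

params = {
    "conv_w": rng.standard_normal((KERNEL, FEAT_DIM, FILTERS)) * 0.01,
    "conv_b": np.zeros(FILTERS),
    "fc1_w": rng.standard_normal(((MAX_LEN - KERNEL + 1) * FILTERS, HIDDEN)) * 0.01,
    "fc1_b": np.zeros(HIDDEN),
    "fc2_w": rng.standard_normal((HIDDEN, N_CLASSES)) * 0.01,
    "fc2_b": np.zeros(N_CLASSES),
}
probs = np.stack([forward(x, params) for x in X])
print(X.shape, probs.shape)
```

One thing worth checking when doing the padding with Keras: `pad_sequences` defaults to `dtype='int32'` and `padding='pre'`, so float CNN features passed without `dtype='float32'` would be cast to integers, and zeros would be prepended rather than appended. Either could plausibly hurt results, though it is only one possible cause.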