How do I concatenate a visual feature vector and a textual feature vector into a single final feature vector?

Suppose I have a dataset of different objects, where each sample consists of an image and its label.

1. I read the two inputs (image and label) separately.

2. I extracted visual and textual features using CNNs.

3. Now I want to concatenate the two feature vectors into a single vector for classification. How should I do this?
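A minimal sketch of step 3, assuming the features have already been extracted into NumPy arrays with one row per sample (the same row order in both arrays). The helper name `fuse_features` and the feature sizes are hypothetical. The optional L2 normalization is one common choice for keeping one modality from dominating because of scale; it is not required for concatenation itself.

```python
import numpy as np

def fuse_features(visual, textual, normalize=True, eps=1e-12):
    """Concatenate per-sample visual and textual feature vectors.

    visual:  array of shape (n_samples, d_v), or a single 1-D vector
    textual: array of shape (n_samples, d_t), or a single 1-D vector
    returns: array of shape (n_samples, d_v + d_t)
    """
    # Treat a single 1-D vector as a batch of one sample
    visual = np.atleast_2d(np.asarray(visual, dtype=np.float64))
    textual = np.atleast_2d(np.asarray(textual, dtype=np.float64))
    if visual.shape[0] != textual.shape[0]:
        raise ValueError("visual and textual features must have the same number of samples")
    if normalize:
        # L2-normalize each modality row by row so their scales are comparable
        visual = visual / (np.linalg.norm(visual, axis=1, keepdims=True) + eps)
        textual = textual / (np.linalg.norm(textual, axis=1, keepdims=True) + eps)
    # Join along the feature axis (axis=1); row i still describes sample i
    return np.concatenate([visual, textual], axis=1)

# Random stand-in features (hypothetical sizes: 512-d visual, 300-d textual)
rng = np.random.default_rng(0)
v = rng.normal(size=(4, 512))
t = rng.normal(size=(4, 300))
fused = fuse_features(v, t)
print(fused.shape)  # (4, 812)
```

The fused rows can then be passed to any classifier that accepts fixed-length vectors. If both CNNs live in the same deep learning model, the same concatenation can instead be done inside the network so that both branches are trained jointly with the classifier.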
