A simple approach could be to build the 64 dimensions from the three HSV channel histograms, e.g. 16 from the hue channel, 16 from the saturation channel, and 32 from the value channel. For each channel, calculate a normalized histogram at the corresponding resolution (i.e. 16 bins for hue, 16 for saturation, and 32 for value). Concatenated, the 64 histogram values form the visual vector, which can serve as a fingerprint of the image, e.g. for classification purposes.
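The histogram approach can be sketched roughly as follows. This is only a minimal illustration, using the standard-library `colorsys` module rather than an image library. The function name `hsv_fingerprint`, the input format (an iterable of `(r, g, b)` tuples with components in 0..255), and the choice to normalize each histogram so that it sums to 1 are all assumptions made for the example.

```python
import colorsys


def hsv_fingerprint(pixels, bins=(16, 16, 32)):
    """Return the concatenated, normalized H, S and V histograms.

    `pixels` is an iterable of (r, g, b) tuples with components in 0..255
    (an assumed input format). With the default bins the result has
    16 + 16 + 32 = 64 values.
    """
    hists = [[0] * n for n in bins]
    count = 0
    for r, g, b in pixels:
        # colorsys works on floats in [0, 1] and returns (h, s, v) in [0, 1].
        hsv = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        for hist, value, n in zip(hists, hsv, bins):
            # Map the channel value to a bin; a value of 1.0 goes to the last bin.
            hist[min(int(value * n), n - 1)] += 1
        count += 1
    if count == 0:
        raise ValueError("no pixels given")
    # Normalize each histogram so it sums to 1, then concatenate them.
    return [c / count for hist in hists for c in hist]


# Tiny example "image": red, green, blue and black pixels.
fingerprint = hsv_fingerprint([(255, 0, 0), (0, 255, 0), (0, 0, 255), (0, 0, 0)])
print(len(fingerprint))
```

For a real image you would pass in all of its pixels, for instance loaded with whatever image library you already use. Because each histogram is normalized separately, the fingerprint does not depend on the image size.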
Alternatively, you can detect key points in the image that are invariant to variations (for example, changes in scale or rotation) and then describe the visual properties of those points. This method might work better for classification, since it gives you a mix of features: keypoints and their visual properties.