I regret to disagree with you. It is common to treat normalized pictures as vectors using pixel values as coordinates for classification. Have a look at the concept of Eigenfaces (see link attached). You will see that the principal components of the problem are directly interpretable as pictures precisely because pixel values are used in classification.
Of course, this does not prevent you from using any dimensionality reduction technique that you see fit: histograms, as Hossein suggests, autoencoders, PCA, whatever.
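The Eigenfaces idea mentioned above can be sketched in a few lines. This is a minimal numpy-only illustration (the data is random and stands in for real face images; the 32x32 size and the choice of k=10 components are arbitrary assumptions): the principal components of pixel vectors have one coefficient per pixel, which is exactly why they can be displayed as pictures.

```python
import numpy as np

# Hypothetical data: 50 "face images" of 32x32 pixels, flattened to vectors.
# Random values stand in for real normalized pictures.
rng = np.random.default_rng(0)
images = rng.random((50, 32 * 32))

# Center the data and compute principal components via SVD (the Eigenfaces idea).
mean_face = images.mean(axis=0)
centered = images - mean_face
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Each principal component has one coefficient per pixel, so it can be
# reshaped back into a 32x32 picture -- an "eigenface".
eigenface = Vt[0].reshape(32, 32)
print(eigenface.shape)  # (32, 32)

# Projecting each image onto the first k components gives a compact feature
# vector that can then be fed to a classifier such as an SVM.
k = 10
features = centered @ Vt[:k].T
print(features.shape)  # (50, 10)
```

The point is that the dimensionality reduction operates directly on pixel values as coordinates, so the reduced representation stays interpretable in image space.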
It depends directly on your application (segmentation, recognition, ...). In general, it is not usual to use raw pixel values to train or test an SVM.
Pixel values and their coordinates should be considered together, or you can use the pixel values to compute the histogram of a template or image and then use the resulting histogram as the feature vector for that image when training or testing the SVM.
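The histogram feature described above can be sketched as follows. This is a minimal numpy-only example (the image is random and the 64x64 size and 16-bin count are arbitrary assumptions): the histogram is normalized so that images or templates of different sizes produce comparable feature vectors.

```python
import numpy as np

# Hypothetical 8-bit grayscale image (e.g. a 64x64 template);
# random values stand in for real image data.
rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(64, 64))

# A 16-bin intensity histogram, normalized to sum to 1 so that images
# of different sizes yield comparable feature vectors.
hist, _ = np.histogram(image, bins=16, range=(0, 256))
feature = hist / hist.sum()

print(feature.shape)  # (16,)
```

Each image's `feature` vector would then be one row of the design matrix passed to SVM training or testing.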
To my mind this question heavily depends on the overall classification approach and goal (as Hossein Soleimani already pointed out), and on what exactly we are looking for. It basically comes down to what an 'instance' is with regard to the result of the classifier and the set of individual classes. Is it possible to describe the different classes at the pixel level, or is more complex structural or syntactic information needed?
For example, if you want to apply segmentation to an image, we can view this as a pixel-wise two-class classification: either a pixel belongs to the background or it belongs to an object. In this case it is most likely feasible to use pixel values to train and test the classifier.
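The pixel-wise two-class setup for segmentation amounts to building one training example per pixel. A minimal numpy sketch (the image, the mask, and the 20x20 size are hypothetical stand-ins for real data):

```python
import numpy as np

# Hypothetical grayscale image and a ground-truth mask
# (1 = object, 0 = background); random data stands in for a real pair.
rng = np.random.default_rng(4)
image = rng.random((20, 20))
mask = (image > 0.5).astype(int)

# Flatten to a design matrix with one row per pixel: a binary
# pixel-wise classification problem over raw intensities.
X = image.reshape(-1, 1)   # features: pixel values
y = mask.reshape(-1)       # labels: background vs. object

print(X.shape, y.shape)  # (400, 1) (400,)
```

A classifier (e.g. an SVM) trained on `(X, y)` would then predict a background/object label for every pixel of a new image.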
Another example might be block-wise evaluation of image objects, where you use the individual pixel results for majority voting over the whole block. Let's say you have blocks of 10x10 pixels and you do a pixel-wise classification. This yields 100 decisions for each block. If, for example, one applies a simple majority vote over these 100 results, it leads to a single fused decision for the complete block.
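The block-wise majority vote can be sketched like this. This is a numpy-only illustration (the 30x30 label map is random stand-in data; the 10x10 block size comes from the example above):

```python
import numpy as np

# Hypothetical pixel-wise 2-class results (0 = background, 1 = object)
# for a 30x30 image, to be fused in 10x10 blocks.
rng = np.random.default_rng(2)
pixel_labels = rng.integers(0, 2, size=(30, 30))

block = 10
h, w = pixel_labels.shape
# Reshape into (block rows, block height, block cols, block width),
# then count "object" votes within each 10x10 block.
blocks = pixel_labels.reshape(h // block, block, w // block, block)
votes = blocks.sum(axis=(1, 3))                   # 0..100 votes per block
block_decision = (votes > (block * block) / 2).astype(int)

print(block_decision.shape)  # (3, 3) -- one fused decision per 10x10 block
```

Each entry of `block_decision` is the majority vote over the 100 pixel-level decisions inside that block.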
On the other hand, it might not be feasible if you have to consider more complex structural or syntactic information, for example if the goal is to find particular shapes or areas with certain directional information, and so forth.
Reiterating what has been said: why not? Pixel values are a spatio-frequency representation of the object(s), and they do carry enough information. Rather than taking only RGB, you can transform to a YCbCr or HSV representation, or use neighborhood information (the old-fashioned way) or segmentation, and so on.
You look at classification because you want to extract specific information (searching) and probably want to optimize while doing so; hence the different ways of performing the grouping. PCA is one such technique, with eigenvectors that are independent and orthogonal.
Yes, it can be, because your dataset (here, your pixel values) will be divided, as you know, into training and testing subsets, and these will swap roles in each iteration: the training subset becomes the testing subset and vice versa.
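The fold-swapping described above is k-fold cross-validation, and it can be sketched without any library beyond numpy (the 12-sample dataset, 5 features, and k=3 are arbitrary assumptions for illustration):

```python
import numpy as np

# Hypothetical dataset: 12 pixel-feature vectors with binary labels.
rng = np.random.default_rng(3)
X = rng.random((12, 5))
y = rng.integers(0, 2, size=12)

# Split the indices into k folds; each fold serves as the test set
# exactly once while the remaining folds form the training set.
k = 3
folds = np.array_split(np.arange(len(X)), k)

for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    X_train, X_test = X[train_idx], X[test_idx]
    # A classifier (e.g. an SVM) would be fit on X_train and
    # evaluated on X_test in each iteration.
    print(f"fold {i}: train={len(train_idx)}, test={len(test_idx)}")
```

Over the k iterations every sample is used for testing exactly once, which is what "swapping roles" amounts to.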