If I understand your question correctly, a deep learning network is not the only way to do this. You could instead use hand-crafted color features. For example, you could compute a color correlogram for each image. A color correlogram encodes the spatial correlation of colors in an image: for each pair of colors and each distance, it records how likely it is that a pixel at that distance from a pixel of the first color has the second color. Alternatively, you could extract MPEG-7 color layout features from the images. Once you have the feature vectors, train whichever classification algorithm you prefer on them to accomplish your task.
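
To make the correlogram idea concrete, here is a minimal NumPy sketch of a color auto-correlogram, the common reduced form that only looks at pairs of pixels with the same color. It is a simplification rather than a reference implementation: the function names, the 4 bins per channel, the chosen distances, and the use of only four offsets per distance (instead of every pixel at that distance) are all my own assumptions for illustration.

```python
import numpy as np

def quantize(image, bins_per_channel=4):
    """Map an HxWx3 uint8 RGB image to an HxW array of color indices."""
    q = (image.astype(np.int64) * bins_per_channel) // 256
    return (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]

def auto_correlogram(image, distances=(1, 3, 5), bins_per_channel=4):
    """Return a feature vector of length len(distances) * bins_per_channel**3.

    For each distance d and each quantized color c, the entry estimates the
    probability that a pixel at offset d from a pixel of color c also has
    color c. Only four offsets per distance are sampled (an assumption made
    to keep the sketch short).
    """
    idx = quantize(image, bins_per_channel)
    n_colors = bins_per_channel ** 3
    h, w = idx.shape
    features = []
    for d in distances:
        same = np.zeros(n_colors)
        total = np.zeros(n_colors)
        for dy, dx in ((0, d), (d, 0), (d, d), (d, -d)):
            # Bounds of the region whose neighbor at (dy, dx) is inside the image
            y0, y1 = max(0, -dy), min(h, h - dy)
            x0, x1 = max(0, -dx), min(w, w - dx)
            if y1 <= y0 or x1 <= x0:
                continue  # distance larger than the image; nothing to compare
            a = idx[y0:y1, x0:x1]                      # reference pixels
            b = idx[y0 + dy:y1 + dy, x0 + dx:x1 + dx]  # their neighbors
            total += np.bincount(a.ravel(), minlength=n_colors)
            same += np.bincount(a[a == b], minlength=n_colors)
        # Colors that never occur get probability 0 instead of a division error
        features.append(np.divide(same, total, out=np.zeros(n_colors), where=total > 0))
    return np.concatenate(features)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
    print(auto_correlogram(demo).shape)
```

You would compute one such vector per image, stack the vectors into a matrix, and pass that matrix together with your labels to the classifier of your choice (an SVM or k-nearest neighbors, for instance).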