in my viewpoint a 2 layers model should achieve a good results, for instance, use RNN for object classification and Fuzzy C-means for improving accuracy.
My pragmatic suggestion would be to find the most recent "State Of The Art" article review on your particular topic (image classification) and you will find there what type of algorithms performs best on your problem!