Before CNNs started to dominate, Support Vector Machines (SVMs) were the state of the art, so it seems sensible to expect that an SVM can still be a stronger classifier than a two-layer fully connected neural network.
Consider an AlexNet- or VGG-style architecture in which multiple convolution layers are followed by multiple fully connected layers. One line of thinking is that the convolution layers extract features, which the fully connected layers then use to solve an image classification task. Hence, the output of the final convolution layer is a representation of the original input image. You can certainly use this representation as input to an SVM for a classification problem, although a few fully connected layers will likely perform better.

Suppose you obtain pretrained weights for a deep CNN. For any input image, you can generate a representation by running the forward pass up to the final convolution layer, then use that representation as input to your SVM. This would be fairly quick and, assuming you're not straying too far from the original image classification task, would likely perform decently.
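Concretely, the pipeline might look like the following sketch. It assumes PyTorch/torchvision with a pretrained VGG16 and scikit-learn's `LinearSVC`; the `train_images`, `train_labels`, and `test_images` tensors are placeholders for your own data.

```python
# Minimal sketch: frozen pretrained conv features -> linear SVM.
import torch
import torchvision.models as models
from sklearn.svm import LinearSVC

# Load pretrained weights and keep only the convolutional feature extractor.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
feature_extractor = vgg.features.eval()  # drop the fully connected head

@torch.no_grad()
def extract_features(images):
    """images: (N, 3, 224, 224) float tensor, normalized as VGG expects."""
    fmap = feature_extractor(images)          # (N, 512, 7, 7)
    return fmap.flatten(start_dim=1).numpy()  # one vector per image

# Train a linear SVM on the frozen representations.
X_train = extract_features(train_images)      # hypothetical data tensors
svm = LinearSVC(C=1.0)
svm.fit(X_train, train_labels)
preds = svm.predict(extract_features(test_images))
```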
The CNN performs feature extraction, and the input to any fully connected layer could just as well be used as the input to an SVM classifier.
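As a sketch of that idea, you can grab the input to any fully connected layer with a forward hook; the choice of `vgg.classifier[3]` (the second FC layer of VGG16) is just an illustrative assumption, and `images` is a placeholder batch.

```python
# Sketch: capture the activations fed into a chosen FC layer via a hook.
import torch
import torchvision.models as models

vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
captured = {}

def save_fc_input(module, inputs, output):
    # inputs is a tuple; inputs[0] is the (N, 4096) tensor fed to this layer
    captured["fc_input"] = inputs[0].detach()

hook = vgg.classifier[3].register_forward_hook(save_fc_input)
with torch.no_grad():
    vgg(images)  # hypothetical (N, 3, 224, 224) batch; fills captured["fc_input"]
hook.remove()

X = captured["fc_input"].numpy()  # usable as SVM input exactly as before
```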
Another option to consider is end-to-end training of a CNN with a hinge-loss objective and squared-L2 regularization (à la SVM). You lose the convexity guarantees of the standard SVM problem, but the approximate primal solution should perform just as well. The hinge loss is non-differentiable only for a sample lying exactly on the margin, and with SGD it is practically impossible to hit that point.
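A minimal sketch of that setup in PyTorch: `nn.MultiMarginLoss` is the multi-class hinge loss, and SGD's `weight_decay` supplies the squared-L2 penalty on the weights. The model choice, `num_classes`, learning rate, and `train_loader` are placeholder assumptions.

```python
# Sketch: end-to-end CNN training with hinge loss + squared-L2 regularization.
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.models as models

model = models.vgg16(weights=None, num_classes=10)   # train from scratch
hinge = nn.MultiMarginLoss(margin=1.0)               # multi-class hinge loss
# weight_decay adds a squared-L2 penalty on the weights, the SVM regularizer
optimizer = optim.SGD(model.parameters(), lr=0.01,
                      momentum=0.9, weight_decay=5e-4)

for images, labels in train_loader:  # hypothetical DataLoader
    optimizer.zero_grad()
    loss = hinge(model(images), labels)
    loss.backward()   # autograd uses a valid subgradient at the hinge point
    optimizer.step()
```

Using `weight_decay` rather than an explicit penalty term in the loss keeps the training loop unchanged while still optimizing the SVM-style regularized objective.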