If it works, great! Just make sure to check whether it generalizes, that is, test your network on new data it has never seen during training.
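A minimal sketch of that check, using only the standard library. The toy data and the `predict` function are hypothetical stand-ins for your own dataset and trained network; the point is only that the test samples are set aside before training and scored separately.

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle and split so the test part is never used for training."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predict, samples):
    """Fraction of (input, label) pairs that predict() gets right."""
    correct = sum(1 for x, y in samples if predict(x) == y)
    return correct / len(samples)

# Toy stand-in data: the label is 1 when the input is >= 0.5.
data = [(i / 100, int(i >= 50)) for i in range(100)]
train, test = train_test_split(data)

# Hypothetical stand-in for a trained network.
def predict(x):
    return int(x >= 0.5)

train_acc = accuracy(predict, train)
test_acc = accuracy(predict, test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

A large gap between the two numbers (high on train, much lower on test) is the usual sign that the network has not generalized.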
Also, check the execution time of your algorithms for each image. That is another metric you can use to judge how well your network is doing.
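One way to measure this is to time each image separately, roughly as sketched here. The `process` function and the dummy images are hypothetical placeholders; swap in your own pipeline (feature extraction plus prediction) and real images.

```python
import statistics
import time

def time_per_image(process, images, warmup=1):
    """Return the wall-clock time in seconds that process() takes on each image."""
    # Run a few untimed calls first so one-off setup costs do not skew the result.
    for img in images[:warmup]:
        process(img)
    timings = []
    for img in images:
        start = time.perf_counter()
        process(img)
        timings.append(time.perf_counter() - start)
    return timings

# Hypothetical stand-in for the full pipeline applied to one image.
def process(image):
    return sum(sum(row) for row in image)

# Ten dummy 32x32 "images" stored as nested lists.
images = [[[1] * 32 for _ in range(32)] for _ in range(10)]

timings = time_per_image(process, images)
print(f"mean:   {statistics.mean(timings) * 1e3:.3f} ms")
print(f"median: {statistics.median(timings) * 1e3:.3f} ms")
```

Reporting the median alongside the mean helps when a few images are much slower than the rest.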
If the HOG feature extraction keeps only the most significant features of your dataset, it will probably reduce the time the CNN needs to produce its final results. But again, you need to measure the overall execution time, including the HOG extraction step itself. If the total is lower and the accuracy is better, this approach is worth using.
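To give a feel for how much HOG shrinks the input, here is a deliberately simplified sketch of its core step: a magnitude-weighted histogram of gradient orientations for a single cell. It is not full HOG (no bin interpolation and no block normalization), and the function name is made up for this example.

```python
import math

def cell_orientation_histogram(image, bins=9):
    """Histogram of unsigned gradient orientations (0 to 180 degrees),
    weighted by gradient magnitude, over the interior pixels of one cell."""
    height, width = len(image), len(image[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences in x and y.
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# 8x8 cell with a vertical edge: left half dark, right half bright.
image = [[0] * 4 + [255] * 4 for _ in range(8)]
hist = cell_orientation_histogram(image)
print(hist)
```

Here 64 pixel values are reduced to 9 numbers, and all of the weight lands in the first bin because the only gradients are horizontal. Whether that reduction actually saves time overall still depends on how long the extraction takes.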
So you can train your CNN on the image dataset and then compare it with SVM+HOG. You can also use HOG features as the input to a CNN, but this is not common, and in most cases it underperforms a pure CNN. Note that a CNN is not the same thing as R-CNN.