From what I have studied, it depends on your dataset. With a small dataset, two to three layers are usually enough; with a larger dataset, you can check out VGG8, VGG16, and VGG19. I hope this helps.
Faster R-CNN generates a set of proposals with the Region Proposal Network (RPN), then passes the image and the proposals together to Fast R-CNN.
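To make the VGG names concrete, here is a small sketch that counts the weight layers (convolutional plus fully connected) in VGG16 and VGG19. The channel lists follow the published VGG configurations; the names `VGG_CONFIGS` and `count_weight_layers` are only illustrative.

```python
# Published VGG configurations: numbers are conv output channels, "M" is max pooling.
VGG_CONFIGS = {
    "VGG16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

# Both networks end with three fully connected layers.
NUM_FC_LAYERS = 3


def count_weight_layers(config):
    """Count conv layers (pooling has no weights) and add the FC layers."""
    conv_layers = sum(1 for item in config if item != "M")
    return conv_layers + NUM_FC_LAYERS


for name, config in VGG_CONFIGS.items():
    print(name, count_weight_layers(config))
```

The number in each name is just this count, so going from VGG16 to VGG19 means three extra convolutional layers.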
The model consists of two major modules:
1) Region Proposal Network
The RPN is a fully convolutional network. It consists of convolutional layers, an intermediate layer, a classification layer, and a regression layer. The convolutional layers are shared with Fast R-CNN.
2) Fast R-CNN
The proposals output by the RPN are used as the input to Fast R-CNN. The RoI pooling layer uses each proposal window to extract a fixed-size feature from the feature map and sends it to the subsequent fully connected layers and softmax for classification.
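As a rough illustration of the RoI pooling step, the sketch below max-pools one proposal window into a fixed-size grid. It is simplified to a single-channel feature map stored as a list of lists, with the proposal already given in feature-map coordinates; the function name `roi_max_pool` and the 2x2 output size are assumptions for the example (the VGG16-based Fast R-CNN uses a 7x7 grid).

```python
def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool the window roi = (x1, y1, x2, y2), inclusive and in
    feature-map coordinates, into a grid of shape output_size."""
    x1, y1, x2, y2 = roi
    out_h, out_w = output_size
    roi_h = y2 - y1 + 1
    roi_w = x2 - x1 + 1

    pooled = []
    for i in range(out_h):
        # Bin edges: floor for the start, ceil for the end, so no bin is empty.
        r_start = y1 + (i * roi_h) // out_h
        r_end = y1 + ((i + 1) * roi_h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            c_start = x1 + (j * roi_w) // out_w
            c_end = x1 + ((j + 1) * roi_w + out_w - 1) // out_w
            row.append(max(feature_map[r][c]
                           for r in range(r_start, r_end)
                           for c in range(c_start, c_end)))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(roi_max_pool(feature_map, (0, 0, 3, 3)))  # whole map -> [[6, 8], [14, 16]]
```

Whatever the size of the proposal, the output grid has the same shape, which is what lets proposals of different sizes feed the same fully connected layers.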
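Faster R-CNN uses the same detection head as Fast R-CNN, so it has the same number of fully connected layers. The RPN adds no fully connected layers of its own; it only generates proposals and is not a feature extractor. With a VGG16 backbone, the Fast R-CNN head keeps two of VGG16's three fully connected layers (fc6 and fc7) and replaces the third with two sibling output layers, one for classification and one for box regression.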
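To show what the RPN's classification and regression layers operate on, here is a minimal sketch of anchor generation. The 3 scales and 3 aspect ratios (k = 9 anchors per location) and the stride of 16 follow the Faster R-CNN paper's VGG16 setup; the function name `make_anchors` and the choice to center anchors in the middle of each stride cell are assumptions for this example.

```python
import math


def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) in image coordinates,
    one set of len(scales) * len(ratios) boxes per feature-map location."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Assumed convention: anchor center at the middle of the stride cell.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = height / width; the area stays scale ** 2.
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(feat_h=2, feat_w=3)
k = 9  # anchors per location
print(len(anchors))  # 2 * 3 * 9 = 54
print("scores per location:", 2 * k, "box offsets per location:", 4 * k)
```

For each location the classification layer outputs 2k object/background scores and the regression layer outputs 4k box offsets, one set per anchor; the highest-scoring refined anchors become the proposals passed to Fast R-CNN.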