This could be a good question for a research project. The two architectures are structured differently, and you need to understand both very well before attempting to combine them.
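Essentially, you are trying to keep accuracy high while reducing complexity. Faster R-CNN is a two-stage object detector: a region proposal network (RPN) proposes candidate regions, and a second stage classifies and refines them, both on top of a shared backbone (VGG in the original version). YOLO is a single-shot detector with a Darknet backbone that predicts boxes directly, using anchor boxes from YOLOv2 onward. One possible combination is to use Darknet as the backbone in Faster R-CNN in place of VGG, and conversely VGG in YOLO. Another option is to remove the RPN from Faster R-CNN and predict detections directly from anchor boxes, as YOLO does. Note that the RPN already relies on anchors itself, so this second option effectively turns Faster R-CNN into a single-stage detector.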
But again, it depends on your problem and on the kind of dataset you want to use. For example, YOLO results can be very poor on self-driving car datasets.
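To make the anchor-box idea mentioned above a bit more concrete, here is a minimal pure-Python sketch of how anchors can be tiled over a feature-map grid. This is only an illustration, not the actual code of either detector: the function name `make_anchors` and all of its parameters are hypothetical, and the sizes and ratios are made-up values. In practice, Faster R-CNN uses a fixed set of scales and aspect ratios, while YOLOv2 and later pick anchor shapes by clustering the boxes in the training set.

```python
from itertools import product


def make_anchors(feat_h, feat_w, stride, sizes, ratios):
    """Tile anchor boxes (x1, y1, x2, y2) over a feature-map grid.

    Hypothetical helper for illustration only; not taken from any library.
    """
    anchors = []
    for row, col in product(range(feat_h), range(feat_w)):
        # Centre of this grid cell, in input-image pixels.
        cx = (col + 0.5) * stride
        cy = (row + 0.5) * stride
        for size, ratio in product(sizes, ratios):
            # ratio = height / width; the box area stays size ** 2.
            w = size / ratio ** 0.5
            h = size * ratio ** 0.5
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# Assumed toy values: a 2x3 feature map with stride 16.
anchors = make_anchors(feat_h=2, feat_w=3, stride=16,
                       sizes=(32, 64), ratios=(0.5, 1.0, 2.0))
print(len(anchors))  # 6 cells * 6 anchors per cell
```

Whichever head you keep (RPN plus second stage, or a direct YOLO-style prediction), it would score and regress offsets for each of these boxes, so the number of anchors per cell is one of the knobs that trades accuracy against complexity.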