I'm trying to design an algorithm that computes the depth map from a pair of stereo images in real time. I intend to do a software implementation on a CPU (ODROID-XU4) and a hardware implementation on a ZedBoard. I already have a parallelized hardware architecture for the FPGA implementation. My question: is it a good idea to base the software implementation on the hardware one in order to reach the highest level of parallelism? In simple words, does it make sense to convert the blocks of the hardware architecture into processes in the software implementation?
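To make the idea concrete, here is a minimal sketch of what I mean by turning hardware blocks into software stages. It is only an illustration under assumptions: the stage names (`rectify`, `match_cost`, `refine`) and their bodies are hypothetical placeholders, not my actual blocks, and the frames are tiny lists rather than images. Each block becomes a worker connected to its neighbours by bounded queues, which play the role of the FIFOs between blocks on the FPGA. I use Python threads here only to show the structure; in CPython the GIL prevents CPU-bound threads from running in parallel, so a real version would use separate processes or native threads.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker passed from stage to stage


def stage(work, inbox, outbox):
    """Run one pipeline stage: apply `work` to each item until the sentinel arrives."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            return
        outbox.put(work(item))


# Hypothetical placeholders standing in for the hardware blocks.
def rectify(pair):
    left, right = pair
    return left, right  # placeholder: pass the pair through unchanged


def match_cost(pair):
    left, right = pair
    # placeholder: per-pixel absolute difference, not a real stereo matcher
    return [abs(l - r) for l, r in zip(left, right)]


def refine(values):
    return [min(v, 64) for v in values]  # placeholder: clamp to a maximum value


def run_pipeline(frames, works, depth=2):
    """Chain `works` into a pipeline with bounded queues (like FIFOs between blocks)."""
    queues = [queue.Queue(maxsize=depth) for _ in range(len(works) + 1)]
    workers = [
        threading.Thread(target=stage, args=(work, queues[i], queues[i + 1]))
        for i, work in enumerate(works)
    ]

    def feed():
        for frame in frames:
            queues[0].put(frame)
        queues[0].put(SENTINEL)

    # Feed from a separate thread so the bounded queues cannot deadlock
    # while this thread drains the output.
    feeder = threading.Thread(target=feed)
    for t in workers + [feeder]:
        t.start()

    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)

    for t in workers + [feeder]:
        t.join()
    return results


if __name__ == "__main__":
    frames = [([1, 2, 3], [1, 1, 1]), ([100, 0], [0, 0])]
    print(run_pipeline(frames, [rectify, match_cost, refine]))
```

In this layout, different frames can sit in different stages at the same time, which is the software counterpart of the pipelining between blocks in the hardware design. Whether this mapping actually gives the best parallelism on the CPU is exactly what I am unsure about.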