09 September 2014

Because there is a latency gap between the host (CPU), the device (GPU), and main memory (RAM), the performance of GPGPU computing is sometimes affected.

How can this data input/output throughput problem in GPGPU computing be solved, other than by optimising the data transfers to and from the GPU (as shown in the link below)?

Can anyone point me to good academic papers (conference or journal) on this problem and its solutions?

http://devblogs.nvidia.com/parallelforall/how-optimize-data-transfers-cuda-fortran/
