We are accelerating some MATLAB code using a mix of gpuArrays and CUDA. Does anyone have a best-practices document for this?

Does gpuArray provide any way to control the number of blocks per grid, the number of threads per block, shared memory, pinned memory, etc.?
