We are accelerating some MATLAB code using a mix of gpuArrays and CUDA. Does anyone have a best-practices document for this?
Does gpuArray provide any way to control the number of blocks per grid, the number of threads per block, shared memory, pinned memory, etc.?