We can mix MATLAB gpuArray code with CUDA through a kernel object. Is there any benefit or cost to putting a C-MEX layer in between?
I think name mangling and converting MATLAB's column-major matrices to row-major matrices in C-MEX will be costly.
What is the best combination of MATLAB GPU code, CUDA, and C-MEX?
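
To make the layout concern concrete, here is a minimal C sketch of the two options as I understand them: indexing a column-major buffer in place, or copying it into row-major order. The function names `colmajor_get` and `colmajor_to_rowmajor` are hypothetical helpers written for illustration, not part of the MEX or CUDA APIs, and the sketch assumes a dense matrix of doubles.

```c
#include <assert.h>
#include <stddef.h>

/* Read element (r, c) of a rows-by-cols matrix stored column-major,
 * which is how MATLAB lays out its arrays. No copy is involved. */
double colmajor_get(const double *a, size_t rows, size_t cols,
                    size_t r, size_t c)
{
    assert(r < rows && c < cols);
    return a[r + c * rows];
}

/* Explicit copy into row-major order. This O(rows * cols) pass is only
 * needed if the downstream C code assumes row-major storage. */
void colmajor_to_rowmajor(const double *src, double *dst,
                          size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < cols; ++c) {
            dst[r * cols + c] = src[r + c * rows];
        }
    }
}
```

If the C or CUDA side is written with the `r + c * rows` indexing, the copy in the second function would not be needed, so the conversion cost seems to depend on how the layer in between is written rather than being unavoidable.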