I have parallelised my problem so that I can launch multiple kernels concurrently and accumulate their results afterwards. How can I do that? Currently, I launch them one after another:

Kernel1<<<...>>>()

Kernel2<<<...>>>()

Kerneln<<<...>>>()

However, the profiler does not show the kernels running in parallel.
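For context, kernels launched into the same stream (including the default stream) run in issue order, so a common approach is to give each kernel its own non-default stream. Below is a minimal sketch of that idea, not a drop-in solution: `partialKernel` and `runConcurrent` are hypothetical names, and the kernel body and launch configuration are assumptions for illustration only.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel (assumption): each launch fills its own slice of the output.
__global__ void partialKernel(float *out, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Launches numKernels kernels, one per stream, then accumulates all results on the host.
// Returns the accumulated sum, or -1.0 if a CUDA call fails.
double runConcurrent(int numKernels, int n)
{
    const size_t total = static_cast<size_t>(numKernels) * n;
    float *dOut = nullptr;
    if (cudaMalloc(&dOut, total * sizeof(float)) != cudaSuccess) return -1.0;

    // One non-default stream per kernel, so the launches are allowed to overlap.
    std::vector<cudaStream_t> streams(numKernels);
    for (int k = 0; k < numKernels; ++k) cudaStreamCreate(&streams[k]);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    for (int k = 0; k < numKernels; ++k) {
        // The fourth launch parameter selects the stream.
        partialKernel<<<blocks, threads, 0, streams[k]>>>(
            dOut + static_cast<size_t>(k) * n, n, static_cast<float>(k + 1));
    }

    // Wait for every stream to finish before reading the results back.
    cudaError_t err = cudaDeviceSynchronize();

    std::vector<float> hOut(total);
    if (err == cudaSuccess)
        err = cudaMemcpy(hOut.data(), dOut, total * sizeof(float),
                         cudaMemcpyDeviceToHost);

    for (cudaStream_t s : streams) cudaStreamDestroy(s);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1.0;

    // Accumulate the per-kernel results.
    double sum = 0.0;
    for (float v : hOut) sum += v;
    return sum;
}
```

Calling `runConcurrent(4, 1024)` issues four launches in four streams and sums their outputs. Separate streams only permit overlap; the kernels actually run side by side only if the device supports concurrent kernel execution and each kernel leaves enough resources free for the others, so a kernel that already fills the GPU will still appear serialised in the profiler.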
