I have parallelised my problem so that I can launch multiple kernels together and accumulate the results afterwards. How do I launch them so that they actually run concurrently? Currently, I do:
Kernel1<<<...>>>()
Kernel2<<<...>>>()
Kerneln<<<...>>>()
However, the profiler does not show the kernels running in parallel.
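For context, kernels launched back to back into the same stream (including the default stream) execute in issue order, so they cannot overlap. One common approach is to launch each independent kernel into its own non-default stream. The following is only a minimal sketch under assumptions: `partialKernel`, `numKernels`, the buffer size, and the launch configuration are hypothetical placeholders, not the kernels from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for Kernel1..Kerneln: each launch fills its own buffer.
__global__ void partialKernel(float *out, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = value;
    }
}

int main()
{
    const int numKernels = 4;   // assumed number of independent kernels
    const int n = 1 << 16;      // assumed number of elements per kernel
    float *d_out[numKernels];
    cudaStream_t streams[numKernels];

    // One output buffer and one stream per kernel.
    for (int k = 0; k < numKernels; ++k) {
        cudaMalloc(&d_out[k], n * sizeof(float));
        cudaStreamCreate(&streams[k]);
    }

    const int block = 256;
    const int grid = (n + block - 1) / block;

    // The fourth launch parameter selects the stream; kernels in different
    // streams are allowed (not guaranteed) to run concurrently.
    for (int k = 0; k < numKernels; ++k) {
        partialKernel<<<grid, block, 0, streams[k]>>>(d_out[k], n, float(k + 1));
    }

    // Wait for all streams to finish, then check for launch or runtime errors.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Accumulate on the host: sum the first element of every buffer.
    float total = 0.0f;
    for (int k = 0; k < numKernels; ++k) {
        float h = 0.0f;
        cudaMemcpy(&h, d_out[k], sizeof(float), cudaMemcpyDeviceToHost);
        total += h;
    }
    printf("accumulated value: %f\n", total);

    for (int k = 0; k < numKernels; ++k) {
        cudaStreamDestroy(streams[k]);
        cudaFree(d_out[k]);
    }
    return 0;
}
```

Even with separate streams, overlap only shows up in the profiler if the device supports concurrent kernel execution and each kernel leaves enough resources (SMs, registers, shared memory) free for the others. A kernel whose grid already fills the GPU will still appear to run alone.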