Hello all, could you please tell me when to use cudaDeviceSynchronize in our code? I have read in some resources that calling cudaDeviceSynchronize slows the program down. Previously I called cudaDeviceSynchronize after launching the kernel, before transferring the data from the device to the host. On the other hand, I have also tried using only cudaMemcpy with cudaMemcpyDeviceToHost to copy the data back from the device to the host, but I still see a difference in performance between the two. So which one is better? Thank you very much.
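
To make the comparison concrete, here is a minimal sketch of the two patterns I mean. It is not my actual code: the kernel `scale`, the helper `run_pattern`, and the buffer size are all made up for illustration, and everything runs on the default stream.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: doubles each element.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Launches the kernel and copies the result back to the host.
// explicit_sync == true  : kernel, cudaDeviceSynchronize(), then cudaMemcpy.
// explicit_sync == false : kernel, then cudaMemcpy only.
// Returns true if every CUDA call succeeded and the data came back doubled.
bool run_pattern(bool explicit_sync) {
    const int n = 1 << 20;                 // assumed size, arbitrary
    const size_t bytes = n * sizeof(float);
    std::vector<float> host(n, 1.0f);

    float *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(dev, host.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        scale<<<(n + 255) / 256, 256>>>(dev, n);
        ok = cudaGetLastError() == cudaSuccess;   // catches launch errors
    }

    // Pattern A only: wait for all device work to finish before copying.
    if (ok && explicit_sync) ok = cudaDeviceSynchronize() == cudaSuccess;

    // Both patterns: cudaMemcpy on the default stream is ordered after the
    // kernel launched above and blocks the host until the copy is complete.
    if (ok) ok = cudaMemcpy(host.data(), dev, bytes,
                            cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(dev);
    if (!ok) return false;

    for (float v : host) {
        if (v != 2.0f) return false;
    }
    return true;
}
```

Calling `run_pattern(true)` and `run_pattern(false)` from `main` and timing each call would be one way to reproduce the comparison I am asking about.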