Hello all,
Could you please tell me how synchronization works between the host and the device, and between kernels running on the device? Specifically, if I do not specify a stream parameter when launching my kernels, what happens? And in the host part of my program (main), are the instructions executed in order, or must I call cudaDeviceSynchronize() to be sure that all device work has completed?
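To make the question concrete, here is a minimal sketch of the kind of program I mean. The kernel names fillKernel and doubleKernel, the array size, and the launch configuration are hypothetical and chosen only for illustration; error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: writes each element's own index into the array.
__global__ void fillKernel(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

// Hypothetical kernel: doubles each element, so it depends on fillKernel's result.
__global__ void doubleKernel(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

int main() {
    const int n = 1024;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    int h_data[n];
    int *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));

    // No stream argument is given, so both launches use the default stream.
    fillKernel<<<blocks, threads>>>(d_data, n);
    doubleKernel<<<blocks, threads>>>(d_data, n);

    // This is the call I am asking about: is it required at this point?
    cudaDeviceSynchronize();

    // Copy the result back to the host and print one element.
    cudaMemcpy(h_data, d_data, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("h_data[10] = %d\n", h_data[10]);

    cudaFree(d_data);
    return 0;
}
```

My question is whether doubleKernel is guaranteed to start only after fillKernel has finished, and whether the cudaDeviceSynchronize() call before the cudaMemcpy is necessary.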
Thank you very much