Hello, everyone.
I have two NVIDIA graphics cards of the same model.
When writing CUDA C++ code for CNN deep learning,
with a batch size of 4 I can process the batch in parallel:
I select each card with cudaSetDevice
so that each one handles two of the four samples.
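To make the setup concrete, here is a minimal sketch of the kind of split I mean. It is not my actual code: forward_kernel and forward_split are hypothetical names, the kernel is only a stand-in for a real layer, and I assume the batch size is divisible by the number of GPUs. Error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for one layer's forward pass: doubles every value.
__global__ void forward_kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

// Splits `batch` samples of `sample_size` floats evenly across all visible GPUs.
// Assumption: batch is divisible by the device count (e.g. 4 samples, 2 GPUs).
// Returns false if no GPU is available or the batch cannot be split evenly.
bool forward_split(const float* h_in, float* h_out, int batch, int sample_size) {
    int num_devices = 0;
    if (cudaGetDeviceCount(&num_devices) != cudaSuccess || num_devices == 0) return false;
    if (batch % num_devices != 0) return false;

    int n = (batch / num_devices) * sample_size;  // floats handled by each GPU
    size_t bytes = n * sizeof(float);
    std::vector<float*> d_in(num_devices), d_out(num_devices);

    // Give each GPU its share. The kernel launch is asynchronous, so the host
    // moves on to the next GPU while the previous one is still computing.
    for (int dev = 0; dev < num_devices; ++dev) {
        cudaSetDevice(dev);
        cudaMalloc(&d_in[dev], bytes);
        cudaMalloc(&d_out[dev], bytes);
        cudaMemcpy(d_in[dev], h_in + dev * n, bytes, cudaMemcpyHostToDevice);
        forward_kernel<<<(n + 255) / 256, 256>>>(d_in[dev], d_out[dev], n);
    }

    // Collect the results. cudaMemcpy on the default stream waits for the
    // kernel launched above on that device to finish.
    for (int dev = 0; dev < num_devices; ++dev) {
        cudaSetDevice(dev);
        cudaMemcpy(h_out + dev * n, d_out[dev], bytes, cudaMemcpyDeviceToHost);
        cudaFree(d_in[dev]);
        cudaFree(d_out[dev]);
    }
    return true;
}
```

With two cards, forward_split(h_in, h_out, 4, sample_size) sends samples 0 and 1 to device 0 and samples 2 and 3 to device 1. This works as long as every sample is processed independently of the others.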
But for fully connected layers,
some of the code needs to sum over the samples within a batch (intra-batch summation).
In this case, how should I write the code so that it still runs in parallel on both cards?
Thank you in advance, and have a nice day.