Hello, everyone

I have two NVIDIA graphics cards (GPUs) of the same model.

I am writing a CNN for deep learning in CUDA C++.

With a batch size of 4, I can process the batch in parallel:

I use cudaSetDevice so that each GPU handles two samples of the batch.
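
To make the setup concrete, the split described above could look roughly like the following sketch. The kernel `forwardKernel` and the helper `forwardTwoGpus` are hypothetical stand-ins for real layer code. The sketch assumes two GPUs are present and the batch size is even, and it omits error checking.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for a layer's forward pass: doubles every element.
__global__ void forwardKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

// Hypothetical helper: runs forwardKernel on a batch (batchSize x sampleSize,
// row-major), giving each of two GPUs half of the samples.
// Assumes two devices exist and batchSize is even; return codes are not checked.
std::vector<float> forwardTwoGpus(const std::vector<float>& in,
                                  int batchSize, int sampleSize) {
    const int numDevices = 2;
    const int n = (batchSize / numDevices) * sampleSize;  // elements per GPU
    const size_t bytes = n * sizeof(float);
    float* dIn[numDevices];
    float* dOut[numDevices];
    std::vector<float> out(in.size());

    // Kernel launches are asynchronous, so the host moves on to the second
    // GPU while the first one is still computing.
    for (int d = 0; d < numDevices; ++d) {
        cudaSetDevice(d);
        cudaMalloc(&dIn[d], bytes);
        cudaMalloc(&dOut[d], bytes);
        cudaMemcpy(dIn[d], in.data() + d * n, bytes, cudaMemcpyHostToDevice);
        forwardKernel<<<(n + 255) / 256, 256>>>(dIn[d], dOut[d], n);
    }

    // The blocking copy waits for the kernel on that device to finish.
    for (int d = 0; d < numDevices; ++d) {
        cudaSetDevice(d);
        cudaMemcpy(out.data() + d * n, dOut[d], bytes, cudaMemcpyDeviceToHost);
        cudaFree(dIn[d]);
        cudaFree(dOut[d]);
    }
    return out;
}
```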

For fully connected layers, however,

some of the code needs to sum over the samples within a batch (intra-batch summation).

In this case, how should I write the code so that it still runs in parallel on both GPUs?
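
One pattern that might apply here, given only as a sketch under assumptions: each GPU first sums over the samples it holds, and the two partial results are then added on the host. The names `partialBatchSum` and `batchSumTwoGpus` are hypothetical. The sketch assumes two GPUs, an even batch size, and row-major data of shape batchSize x features, and it omits error checking.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per feature j: sums that feature over the samples on this GPU.
__global__ void partialBatchSum(const float* x, float* partial,
                                int localBatch, int features) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= features) return;
    float s = 0.0f;
    for (int b = 0; b < localBatch; ++b) s += x[b * features + j];
    partial[j] = s;
}

// Hypothetical helper: sums x over the batch dimension using two GPUs.
// Assumes two devices exist and batchSize is even; return codes are not checked.
std::vector<float> batchSumTwoGpus(const std::vector<float>& x,
                                   int batchSize, int features) {
    const int numDevices = 2;
    const int localBatch = batchSize / numDevices;  // samples per GPU
    const int localCount = localBatch * features;   // elements per GPU
    float* dX[numDevices];
    float* dPartial[numDevices];
    std::vector<std::vector<float>> hPartial(numDevices,
                                             std::vector<float>(features));

    // Step 1: each GPU reduces its own half of the batch.
    for (int d = 0; d < numDevices; ++d) {
        cudaSetDevice(d);
        cudaMalloc(&dX[d], localCount * sizeof(float));
        cudaMalloc(&dPartial[d], features * sizeof(float));
        cudaMemcpy(dX[d], x.data() + d * localCount,
                   localCount * sizeof(float), cudaMemcpyHostToDevice);
        partialBatchSum<<<(features + 255) / 256, 256>>>(
            dX[d], dPartial[d], localBatch, features);
    }

    // Step 2: fetch the partial sums (the blocking copy waits for the kernel).
    for (int d = 0; d < numDevices; ++d) {
        cudaSetDevice(d);
        cudaMemcpy(hPartial[d].data(), dPartial[d],
                   features * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dX[d]);
        cudaFree(dPartial[d]);
    }

    // Step 3: combine the per-GPU partial sums on the host.
    std::vector<float> total(features, 0.0f);
    for (int d = 0; d < numDevices; ++d)
        for (int j = 0; j < features; ++j) total[j] += hPartial[d][j];
    return total;
}
```

Combining on the host keeps the sketch short. If the summed vector is large, the partial sums could instead be combined on the GPUs themselves, for example with a peer-to-peer copy or NCCL's `ncclAllReduce`, which this sketch does not use.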

Thank you in advance and have a nice day
