For instance, if I have several desktops, each with an NVIDIA GPU, how can I divide a task among the nodes and then, within each node, among its CUDA cores?
Are there any useful techniques for doing this? Any suggestions of papers would be appreciated.