
For background, I am trying to run two independent and separate .conf files, each on a different GPU. These are NAMD molecular dynamics simulations.

I am running on a remote HPC cluster. I log in correctly and request an interactive session with: qrsh -l gpu,A100,cuda=2,h_rt=12:00:00
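
As a first sanity check (a minimal sketch, assuming nvidia-smi is available on the compute node), I confirm inside the session that two distinct A100s were actually allocated and see what the scheduler exported:

# List the GPUs visible to this session; there should be two distinct A100s.
nvidia-smi --query-gpu=index,name,uuid --format=csv

# Some clusters pre-set CUDA_VISIBLE_DEVICES for the job; check before overriding it.
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"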

When I go to run my files, I launch one of:

CUDA_VISIBLE_DEVICES=0 ~/bin/namd3 +p1 +setcpuaffinity +devices 0 config.conf > log.log

CUDA_VISIBLE_DEVICES=1 ~/bin/namd3 +p1 +setcpuaffinity +devices 0 config.conf > log.log

(+p1 is used because this is the GPU-resident mode of NAMD; that is not important to the issue at hand. With CUDA_VISIBLE_DEVICES=1, the single visible GPU is renumbered to 0, so +devices 0 is correct in both cases.)
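
One suspicion: with +p1 and +setcpuaffinity, both processes may pin their single worker thread to the same CPU core, which would serialize them even though the GPUs are separate. Below is a sketch of an alternative launch I am considering, which uses Charm++'s +pemap option to give each run its own explicit core (the core indices and file names here are placeholders, not my actual files):

# Run both sims concurrently, each on its own GPU and its own CPU core.
# +setcpuaffinity +pemap N pins the worker thread to core N, so the two
# processes cannot land on the same core.
CUDA_VISIBLE_DEVICES=0 ~/bin/namd3 +p1 +setcpuaffinity +pemap 0 +devices 0 sim1.conf > sim1.log 2>&1 &
CUDA_VISIBLE_DEVICES=1 ~/bin/namd3 +p1 +setcpuaffinity +pemap 1 +devices 0 sim2.conf > sim2.log 2>&1 &
wait   # block until both background runs finish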

However, running them at the same time slows both simulations down dramatically. Normally, a .conf file takes 1 hour to run on a single A100 GPU card. But when I run both simultaneously, even though they are on different A100s, each one is estimated to take 4.5 hours to finish.

I am hoping to find a solution, because the two runs should finish in about 1 hour total: each sim is on a different GPU, so they should not be interfering with one another. Setting CUDA_VISIBLE_DEVICES did not fix it. Below is my GPU usage during the two sims running on separate GPUs.
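
For anyone trying to help diagnose this, a sketch of how I would narrow down the contention (assuming nvidia-smi and standard Linux tools are available on the node): if the GPUs show low utilization while the two namd3 processes report the same core, the bottleneck is CPU affinity rather than the GPUs.

# Per-GPU utilization sampled once per second (the "sm" column is compute %).
nvidia-smi dmon -s u

# Which CPU core each namd3 process is currently running on (PSR column).
ps -o pid,psr,pcpu,comm -C namd3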
