I am benchmarking my coarse-grained MD simulations with 30k beads. I noticed that the memory usage of MPI-GPU based mdruns stays roughly constant (around 1.2-1.3 GB) as I increase the number of nodes, whereas MPI-OpenMP based mdruns show memory usage growing roughly linearly with node count:
| Nodes | MPI-OpenMP (GB) | MPI-GPU (GB) |
|------:|----------------:|-------------:|
| 1     | 0.694           | 1.19         |
| 2     | 1.45            | 1.27         |
| 4     | 2.34            | 1.27         |
| 5     | 3.15            | 1.27         |
| 6     | 3.86            | 1.28         |
| 8     | 4.86            | 1.25         |
| 9     | 5.5             | 1.25         |
| 10    | 5.92            | 1.24         |
For all the benchmarks I used 3 CPUs (OpenMP threads) per MPI task and 8 tasks per node. For the MPI-GPU simulations, 4 GPUs were used. I am trying to understand why the MPI-GPU based mdruns are so memory efficient, and why their footprint does not grow with the number of nodes. A sketch of my launch commands is below.
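For reference, the runs were launched roughly like this (a sketch only: the binary name `gmx_mpi`, the input file `topol.tpr`, and the SLURM flags are illustrative placeholders, not my exact job script):

```bash
# MPI-OpenMP run: 8 MPI ranks per node, 3 OpenMP threads per rank
srun --ntasks-per-node=8 --cpus-per-task=3 \
    gmx_mpi mdrun -ntomp 3 -s topol.tpr

# MPI-GPU run: same rank/thread layout, with nonbonded work
# offloaded to the GPUs (device IDs 0-3 made available to mdrun)
srun --ntasks-per-node=8 --cpus-per-task=3 \
    gmx_mpi mdrun -ntomp 3 -nb gpu -gpu_id 0123 -s topol.tpr
```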
Thank you in advance.