I'm working on simulation software that can be parallelized with either MPI or OpenMP. The goal is a hybrid implementation that uses MPI for communication between the physical compute nodes of our cluster, but OpenMP within a single machine. When I explicitly set the correct number of machines and parameters, this already works with our software.

The problem is the following: I can request nodes on our cluster with a specific number of processors per node. This gives me multiple entries per node in the nodefile/machinefile. For certain reasons I sometimes want to start several MPI processes on the same node and use fewer OpenMP threads per MPI process. Hence, I cannot simply filter the nodefile so that it contains each node only once.

Currently, I set the number of OpenMP threads first and then start one MPI process for every entry in the nodefile. Only some of the MPI processes continue with the computation; the others are put to sleep. I am not entirely happy with this solution, though.
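
To make the scheme above concrete, here is a minimal sketch in C of how each nodefile entry could decide whether it takes part in the computation and how many OpenMP threads it gets. It assumes the nodefile lists one host name per requested processor. The function names and the `ranks_per_node` parameter are hypothetical and only for illustration; they are not part of our software or of any MPI library.

```c
#include <assert.h>
#include <string.h>

/* Position of entry idx among the entries that name the same host,
 * i.e. the number of earlier entries with an identical host name. */
int local_index(const char *hosts[], int idx)
{
    int local = 0;
    for (int i = 0; i < idx; ++i)
        if (strcmp(hosts[i], hosts[idx]) == 0)
            ++local;
    return local;
}

/* Number of nodefile entries that name the same host as entry idx. */
int slots_on_host(const char *hosts[], int n, int idx)
{
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (strcmp(hosts[i], hosts[idx]) == 0)
            ++count;
    return count;
}

/* Returns the number of OpenMP threads for the process started from
 * entry idx, or 0 if that process should stay idle.
 * ranks_per_node (hypothetical parameter) is the desired number of
 * computing MPI processes per node. */
int threads_for_entry(const char *hosts[], int n, int idx, int ranks_per_node)
{
    assert(idx >= 0 && idx < n && ranks_per_node > 0);

    /* Only the first ranks_per_node entries of each host compute. */
    if (local_index(hosts, idx) >= ranks_per_node)
        return 0;

    int slots = slots_on_host(hosts, n, idx);
    int active = ranks_per_node < slots ? ranks_per_node : slots;

    /* Split the node's processors evenly among the active processes. */
    return slots / active;
}
```

With two nodes of four entries each and `ranks_per_node` set to 2, the first two entries of each node would compute with two threads each, and the remaining entries would be the idle processes described above.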

I am using Intel's MPI library (version 3.2). Terminating the MPI processes that are not needed for the computation kills all MPI processes because communication stops working. Using an MPI barrier is not an option since it is a busy wait, so the processor resources are not freed for other OpenMP threads. In my current solution, the MPI processes that do not take part in the computation sleep for one minute and then check for an MPI message telling them whether to terminate. Does anyone have a better idea? Is there a way to put processes to (real) sleep and wake them up based on an MPI message?
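
For reference, the sleep-and-check loop I describe looks roughly like the following sketch in C. To keep it self-contained, the check for a pending message is passed in as a callback; in the real program that callback would wrap a non-blocking probe such as `MPI_Iprobe`. The names `poll_fn`, `wait_for_message` and `countdown_poll` are hypothetical, and the interval is a parameter rather than the fixed one minute I use at the moment.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns nonzero once a message is pending.
 * In the MPI program this would wrap a non-blocking probe. */
typedef int (*poll_fn)(void *ctx);

/* Check for a message, then sleep for interval_ms before checking again,
 * so the core stays available for OpenMP threads in the meantime.
 * Returns the number of checks made until a message was seen,
 * or -1 if max_polls checks passed without one. */
int wait_for_message(poll_fn poll, void *ctx, long interval_ms, int max_polls)
{
    assert(poll != NULL && interval_ms >= 0 && max_polls > 0);

    struct timespec pause;
    pause.tv_sec = interval_ms / 1000;
    pause.tv_nsec = (interval_ms % 1000) * 1000000L;

    for (int polls = 1; polls <= max_polls; ++polls) {
        if (poll(ctx))
            return polls;
        nanosleep(&pause, NULL); /* real sleep, not a busy wait */
    }
    return -1;
}

/* Stand-in for the MPI probe, for demonstration only: reports a message
 * once the counter behind ctx has counted down to zero. */
int countdown_poll(void *ctx)
{
    int *remaining = ctx;
    if (*remaining == 0)
        return 1;
    --*remaining;
    return 0;
}
```

The trade-off is the one that bothers me: a long interval keeps the idle processes cheap but delays their reaction to the terminate message, while a short interval reacts faster at the cost of more wake-ups.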
