MPI Commands | gridmpirun (1MPI) |
gridmpirun - start MPI processes across clusters
gridmpirun [-np nprocs] [-allcpu] [-machinefile configuration-file] [-rsh remote-shell] [-ckptdir directory] [-restart] [-nameport ip-addr:port] [-nounix] [-h] [program...]
gridmpirun is a simple frontend that eases starting multiple jobs across clusters. It reads a configuration file, first starts impi-server, and then invokes mpirun jobs at the specified clusters. impi-server is the contact point for the multiple MPI jobs. The configuration file is specified by (1) the -machinefile option, (2) the environment variable IMPI_CONF, or (3) the default $MPIROOT/etc/config, in that order of preference.
The program may also be specified in the configuration file, so it is optional on the command line. The program and its arguments are placed last.
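The precedence of the three configuration-file sources can be sketched as follows. This is only an illustration of the lookup order, not code from gridmpirun; the function name resolve_config_path and its parameters are hypothetical, and the behaviour when MPIROOT is unset is an assumption.

```python
import os


def resolve_config_path(machinefile=None, environ=None):
    """Pick the configuration file using the documented order of preference.

    machinefile: value of the -machinefile option, or None if not given.
    environ: mapping of environment variables (defaults to os.environ).
    """
    env = os.environ if environ is None else environ

    # (1) An explicit -machinefile option wins.
    if machinefile:
        return machinefile

    # (2) Otherwise use the IMPI_CONF environment variable, if set.
    if env.get("IMPI_CONF"):
        return env["IMPI_CONF"]

    # (3) Otherwise fall back to $MPIROOT/etc/config.
    # Assumption: an unset MPIROOT is treated as an empty prefix here.
    return os.path.join(env.get("MPIROOT", ""), "etc", "config")
```

For example, with MPIROOT set to /opt/gridmpi (an illustrative path) and neither -machinefile nor IMPI_CONF given, the sketch returns /opt/gridmpi/etc/config.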
The following options are supported:
gridmpirun starts one MPI job for each line in the configuration file. Each line of the configuration file has the following form:
-np nprocs [-rsh remote-shell] [-rh host-name] [-wd directory] [-lmpirun mpirun-command] [-c configuration-file] [-ckptdir directory] [program...]
The other elements on a line are passed verbatim as command-line arguments. Note that the program cannot take arguments that are the same as the options above, because those are consumed by gridmpirun.
gridmpirun always returns 0, except when an error occurs inside gridmpirun itself.
The following example shows how to start five processes in two clusters, two at SiteA and three at SiteB. The SiteA cluster consists of two hosts, host_a0 and host_a1, and the SiteB cluster consists of three hosts, host_b0, host_b1, and host_b2.
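How a configuration line separates into consumed options and verbatim arguments can be sketched as below. This is a hypothetical illustration, not gridmpirun's actual parser: the name parse_config_line is invented, and it assumes that every listed option takes exactly one value and that options are recognised anywhere on the line.

```python
import shlex

# Options from the configuration-line synopsis; each is assumed to take one value.
ONE_ARG_OPTIONS = {"-np", "-rsh", "-rh", "-wd", "-lmpirun", "-c", "-ckptdir"}


def parse_config_line(line):
    """Split one configuration line into (options, remaining tokens)."""
    tokens = shlex.split(line)  # shell-like tokenising; no variable expansion
    options = {}
    rest = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ONE_ARG_OPTIONS:
            if i + 1 >= len(tokens):
                raise ValueError(f"option {tok} needs a value")
            options[tok] = tokens[i + 1]  # consumed, never reaches the program
            i += 2
        else:
            rest.append(tok)  # kept verbatim: the program and its arguments
            i += 1
    return options, rest
```

Under these assumptions, a program argument such as -c or -wd would be taken as an option rather than reaching the program, which is the restriction noted above.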
Assume the configuration file $MPIROOT/etc/config contains:
-rh host_a0 -c $MPIROOT/etc/mpi_conf1 -np 2 /home/edamoto/a.out -k
-rh host_b0 -c $MPIROOT/etc/mpi_conf2 -np 3 /home/edamoto/b.out -n
Also assume at SiteA the configuration file $MPIROOT/etc/mpi_confA contains:
host_a0
host_a1
Also assume at SiteB the configuration file $MPIROOT/etc/mpi_confB contains:
host_b0
host_b1
host_b2
Invoking gridmpirun -np 5 first starts an IMPI server process at the invoking site, and then starts five MPI processes across the two sites: two on SiteA and three on SiteB. The IP address and port number of the IMPI server process are passed to both MPI jobs started at the two sites.
Each line in the configuration file is treated simply as a command line in which -rh is replaced by rsh or ssh, but with its -np option adjusted so that the total matches the requested number of processes. Process counts are taken from the top: the next line is used only when the lines above it do not supply the requested number.
Specifying a larger number of processes than the sum of the -np options in the configuration file is an error.
The number of processes that can be invoked in this example is summarised as:
-np:    1  2  3  4  5  ≥6
SiteA:  1  2  2  2  2  error
SiteB:  0  0  1  2  3  error
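The top-down allocation rule can be sketched as follows. This is a minimal illustration of the rule as described here, not gridmpirun's implementation; the function name allocate is hypothetical, and the sketch does not say whether a site given zero processes is contacted at all.

```python
def allocate(requested, per_line_np):
    """Distribute `requested` processes over configuration lines, top first.

    per_line_np: the -np value of each configuration line, in file order.
    Returns the number of processes assigned to each line.
    """
    if requested < 1:
        raise ValueError("at least one process must be requested")
    if requested > sum(per_line_np):
        # More processes than the configuration file provides is an error.
        raise ValueError("requested more processes than the configuration allows")

    counts = []
    remaining = requested
    for limit in per_line_np:
        take = min(limit, remaining)  # fill this line before moving to the next
        counts.append(take)
        remaining -= take
    return counts
```

With the example configuration (-np 2 for SiteA, -np 3 for SiteB), allocate(3, [2, 3]) gives [2, 1], and allocate(6, [2, 3]) raises an error, matching the summary above.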
Masahiko Edamoto