MPI Commands gridmpirun (1MPI)

NAME

gridmpirun - start MPI processes across clusters

SYNOPSIS

gridmpirun [-np nprocs] [-allcpu] [-machinefile configuration-file] [-rsh remote-shell] [-ckptdir directory] [-restart] [-nameport ip-addr:port] [-nounix] [-h] [program...]

DESCRIPTION

gridmpirun is a simple frontend that eases starting multiple MPI jobs across clusters. It reads a configuration file, first starts impi-server, and then invokes mpirun jobs on the specified clusters. impi-server acts as the contact point for the multiple MPI jobs. The configuration file is taken from (1) the -machinefile option, (2) the environment variable IMPI_CONF, or (3) the default $MPIROOT/etc/config, in that order of preference.
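The order of preference above can be sketched in Python as follows. This is a minimal illustration of the documented lookup, not code taken from gridmpirun. It assumes the environment is passed in as a plain mapping. The function name resolve_config_path is hypothetical, and returning None when MPIROOT is unset is also an assumption.

```python
from typing import Mapping, Optional


def resolve_config_path(machinefile: Optional[str],
                        env: Mapping[str, str]) -> Optional[str]:
    """Hypothetical sketch of the configuration-file lookup order."""
    # 1. An explicit -machinefile argument wins.
    if machinefile:
        return machinefile
    # 2. Otherwise use the IMPI_CONF environment variable, if set.
    impi_conf = env.get("IMPI_CONF")
    if impi_conf:
        return impi_conf
    # 3. Fall back to the default $MPIROOT/etc/config.
    mpiroot = env.get("MPIROOT")
    if mpiroot:
        return f"{mpiroot}/etc/config"
    # Assumption: with MPIROOT unset there is no default to fall back on.
    return None
```

In a real script the mapping would typically be os.environ, for example resolve_config_path(None, os.environ).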

The program is optional on the command line, since it may also be specified in the configuration file. If given, the program and its arguments must come last.

OPTIONS

The following options are supported:

-np nprocs
Specifies the number of MPI processes to start. The sum of the -np values in the configuration file must be greater than or equal to nprocs.

-allcpu
Uses all processes specified in the configuration file. The -np option is ignored.

-machinefile configuration-file
Specifies a configuration file. The format of its lines is described below. If neither this option nor IMPI_CONF is given, the configuration file defaults to $MPIROOT/etc/config.

-rsh remote-shell
Specifies a remote shell command used to invoke mpirun remotely. In most cases this is rsh or ssh.

-ckptdir directory
Specifies a directory in which to place a checkpoint save file.

-restart
Restarts the job from a saved checkpoint.

-nameport ip-addr:port
Specifies the IP address and port of a name server for MPI-2 DPM (Dynamic Process Management).

-nounix
Uses TCP for communication between processes allocated on the same host.

-h
Prints a usage message.

CONFIGURATION FILE SYNOPSIS

gridmpirun starts one MPI job for each line in the configuration file. Each line of the configuration file has the following form:

-np nprocs [-rsh remote-shell] [-rh host-name] [-wd directory] [-lmpirun mpirun-command] [-c configuration-file] [-ckptdir directory] [program...]

CONFIGURATION FILE OPTIONS

-np nprocs
Specifies the maximum number of MPI processes for a cluster. gridmpirun takes processes from the lines in order, starting at the top, until their sum reaches the nprocs value given by the command line -np option. This option is mandatory.

-rsh remote-shell
Specifies a remote shell passed to the mpirun command.

-rh host-name
Specifies the host on which to start MPI processes.

-wd directory
Specifies the current working directory on the host given by -rh.

-lmpirun mpirun-command
Specifies the command used to start MPI processes. Each line of the configuration file invokes one mpirun command on the specified host.

-c configuration-file
Specifies a configuration file passed to the mpirun command.

-ckptdir directory
Specifies a directory in which to place a checkpoint save file.

program...
Specifies a program and its arguments. This is optional.

All other elements on a line are passed verbatim as command line arguments. Note that the program cannot take arguments identical to the options above, because gridmpirun consumes them.
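The splitting rule above, including the restriction on option-like program arguments, can be illustrated with a small Python sketch. It assumes every recognised option takes exactly one value and that a line is tokenised like a shell command with shlex.split, which leaves $MPIROOT unexpanded. The names parse_config_line and CONFIG_OPTIONS are hypothetical, and gridmpirun's real parser may behave differently.

```python
import shlex
from typing import Dict, List, Tuple

# Hypothetical: options gridmpirun consumes on a configuration line,
# each assumed to take exactly one value.
CONFIG_OPTIONS = ("-np", "-rsh", "-rh", "-wd", "-lmpirun", "-c", "-ckptdir")


def parse_config_line(line: str) -> Tuple[Dict[str, str], List[str]]:
    """Split one line into consumed options and pass-through elements."""
    options: Dict[str, str] = {}
    rest: List[str] = []
    tokens = shlex.split(line)  # shell-like tokenising, no $VAR expansion
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in CONFIG_OPTIONS:
            if i + 1 >= len(tokens):
                raise ValueError(f"option {tok} requires a value")
            options[tok] = tokens[i + 1]  # consumed, wherever it appears
            i += 2
        else:
            rest.append(tok)  # program and other elements, kept verbatim
            i += 1
    if "-np" not in options:
        raise ValueError("-np is mandatory on every configuration line")
    return options, rest
```

For the line -np 2 prog -c foo, the pair -c foo is consumed as a configuration option, so prog receives no arguments; this is the restriction described above.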

ENVIRONMENT VARIABLES

MPIROOT
Specifies the GridMPI installation directory. It is used to search for the default configuration file.

IMPI_CONF
Specifies a default configuration file. It corresponds to the command line option -machinefile.

IMPI_INTER_RSH
Specifies a remote shell command to start mpirun remotely. It corresponds to the command line option -rsh.

EXIT STATUS

gridmpirun always returns 0, except when an error occurs inside gridmpirun itself.

EXAMPLE

The following example shows how to start five processes on two clusters: two on SiteA and three on SiteB. The SiteA cluster consists of two hosts, host_a0 and host_a1; the SiteB cluster consists of three hosts, host_b0, host_b1, and host_b2.

Assume the configuration file $MPIROOT/etc/config contains:

-rh host_a0 -c $MPIROOT/etc/mpi_conf1 -np 2 /home/edamoto/a.out -k
-rh host_b0 -c $MPIROOT/etc/mpi_conf2 -np 3 /home/edamoto/b.out -n

Also assume that at SiteA the configuration file $MPIROOT/etc/mpi_conf1 contains:

host_a0
host_a1

Also assume that at SiteB the configuration file $MPIROOT/etc/mpi_conf2 contains:

host_b0
host_b1
host_b2

Invoking gridmpirun -np 5 first starts an IMPI server process at the invoking site, and then starts five MPI processes across the two sites: two on SiteA and three on SiteB. The IP address and port number of the IMPI server process are passed to the MPI jobs started at both sites.

Each line in the configuration file is treated as an mpirun command line in which -rh host-name is replaced by an rsh or ssh invocation on that host, except that its -np value is adjusted so that the total matches the requested number of processes. Processes are taken from the top of the file: the next line is used only while the running sum is still below the requested number.
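A hypothetical Python sketch of the rewriting described above, for a single line that has already been split into its options and pass-through elements. It assumes the local command defaults to mpirun when -lmpirun is absent and that the per-site configuration file is forwarded to mpirun as -c. Handling of -wd, -rsh, -ckptdir, and of the IMPI server address is deliberately omitted. The function name build_remote_command is illustrative only.

```python
from typing import Dict, List


def build_remote_command(options: Dict[str, str], rest: List[str],
                         nprocs: int, remote_shell: str = "rsh") -> List[str]:
    """Hypothetical: turn one parsed configuration line into an argv."""
    argv: List[str] = []
    if "-rh" in options:
        # "-rh host-name" becomes "<rsh|ssh> host-name".
        argv += [remote_shell, options["-rh"]]
    # Assumption: plain "mpirun" when -lmpirun is not given.
    argv.append(options.get("-lmpirun", "mpirun"))
    # The line's own -np limit is replaced by the count allocated to it.
    argv += ["-np", str(nprocs)]
    if "-c" in options:
        # Assumption: the per-site configuration file is passed as -c.
        argv += ["-c", options["-c"]]
    # The program and any other elements are passed through verbatim.
    argv += rest
    return argv
```

For the SiteB line of the example with three processes allocated and ssh as the remote shell, this yields ssh host_b0 mpirun -np 3 -c $MPIROOT/etc/mpi_conf2 /home/edamoto/b.out -n.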

Specifying a larger number of processes than the sum of the -np values in the configuration file is an error.

The number of processes started at each site for each value of -np in this example is summarised below:

-np      1   2   3   4   5   ≥6
SiteA    1   2   2   2   2   error
SiteB    0   0   1   2   3   error
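The allocation rule (take processes from the top, cap each line at its -np value, and report an error if the request exceeds the total) can be sketched in Python. The function allocate is hypothetical, and modelling -allcpu as nprocs=None is an assumption. Applied to the example's limits [2, 3], the sketch reproduces the table above.

```python
from typing import List, Optional


def allocate(limits: List[int], nprocs: Optional[int] = None) -> List[int]:
    """Hypothetical: distribute nprocs over configuration lines, top first.

    limits holds each line's -np value; nprocs=None models -allcpu.
    """
    if nprocs is None:
        # -allcpu: every line contributes its full -np value.
        return list(limits)
    total = sum(limits)
    if nprocs > total:
        raise ValueError(f"requested {nprocs} processes, "
                         f"configuration provides only {total}")
    counts: List[int] = []
    remaining = nprocs
    for limit in limits:
        take = min(limit, remaining)  # never exceed the line's -np value
        counts.append(take)
        remaining -= take
    return counts
```

For example, allocate([2, 3], 4) gives [2, 2], matching the -np 4 column.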

SEE ALSO

mpirun

AUTHOR

Masahiko Edamoto


($Date: 2006/06/07 15:56:42 $)