Quick Usage

Contents

1. GridMPI Overview
2. Using GridMPI on PC Clusters
3. Using GridMPI on IBM AIX (with LoadLeveler Interactive)
4. Notes on Using GridMPI on Hitachi SR11000 (IMPORTANT)
5. Using GridMPI on Fujitsu Solaris/SPARC64V
6. Run with the gridmpirun Script

1. GridMPI Overview

GridMPI

GridMPI extends YAMPII, a cluster MPI implementation, with communication for the Grid environment. GridMPI includes YAMPII, and thus can be used in clusters as well as in the Grid. The YAMPII protocol is used for intra-cluster communication, and the protocols for the Grid are used for inter-cluster communication. Currently, IMPI (Interoperable MPI) is supported for inter-cluster communication. As an intra-cluster transport, GridMPI can use TCP/IP, a platform-supplied Vendor MPI, or the PM protocol on some cluster systems. GridMPI uses only TCP/IP as an inter-cluster transport.
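
This layering is transparent to applications: the same MPI calls are used whether the peer process runs in the same cluster or in a remote one, and GridMPI selects the transport underneath. The sketch below is plain MPI C code (nothing GridMPI-specific is assumed); it sends an integer from rank 0 to the highest rank, which may be located in another cluster.

/* Illustrative only: ordinary MPI point-to-point code.  GridMPI picks the
 * transport (YAMPII inside a cluster, IMPI over TCP/IP between clusters). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nprocs, value = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (nprocs > 1) {
        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, nprocs - 1, 0, MPI_COMM_WORLD);
        } else if (rank == nprocs - 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("rank %d received %d from rank 0\n", rank, value);
        }
    }
    MPI_Finalize();
    return 0;
}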

IMPI Protocol

GridMPI follows the IMPI (Interoperable MPI) standard for inter-cluster communication. In the following explanation, some terminology from the IMPI standard is used.

An MPI application using IMPI consists of multiple clients and one IMPI server.

A client is one MPI job, which consists of some number of MPI processes, normally started by mpirun. A client typically corresponds to a cluster. The clients are numbered sequentially from 0 to one less than the number of clients. The IMPI standard limits the maximum number of clients to 32.

An IMPI server is a process that accepts contacts from clients. The server listens on a TCP/IP port and waits for connections from the clients. It acts as an information exchange for the clients, which need to know the IP address/port pairs of the other clients. The clients connect to each other after obtaining this information from the server.

An IMPI server does nothing after the information exchange but waits until all clients join at MPI_Finalize. One server is needed for each run of an MPI application.

Invoking an IMPI server

An IMPI server is invoked by specifying the number of clients (M) to use.

$ impi-server -server M

After being invoked, an IMPI server prints to stdout the IP address/port pair on which it is listening. The clients must specify this address/port pair at startup.

An IMPI server normally exits when the MPI application finishes. Thus, an IMPI server needs to be invoked again each time before an application is started.

Invoking IMPI clients

A client is started with a client number and the IMPI server's address/port pair.

$ mpirun -client K addr:port -np N ./a.out

K ranges from 0 to one less than the number of clients. Lower process ranks are assigned to the clients invoked with lower client numbers K. addr:port is the address/port pair that the IMPI server has printed out.

The total number of processes (NPROCS) in an MPI application is the sum of N over all clients. For example, two clients each started with -np 2 give NPROCS = 4: client 0 holds ranks 0 and 1, and client 1 holds ranks 2 and 3 (see the figure below). The number of processes in a client is specified with the -np N argument to mpirun and the configuration file, as in normal invocations. The name of the GridMPI/YAMPII configuration file is mpi_conf by default.
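
Each process can check where it landed using standard MPI calls; the sketch below is plain MPI, with nothing GridMPI-specific assumed.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* global rank across all clients */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs); /* NPROCS: the sum of -np over all clients */
    printf("rank %d of %d\n", rank, nprocs);
    MPI_Finalize();
    return 0;
}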

Process Structure of MPI Jobs

The figure below depicts the process structure of GridMPI when two processes are started in each of two clusters.

$ export IMPI_AUTH_NONE=0
$ impi-server -server 2 &
$ mpirun -client 0 addr:port -np 2 ./a.out &
$ mpirun -client 1 addr:port -np 2 ./a.out

mpirun -client K ... starts a single client. Each of the clients is started by its own mpirun.

Fig. Process Structure of a GridMPI Job
                        IMPI Protocol
          +---------+===================+---------+
          |         |                   |         |
    +-----|---------|-----+       +-----|---------|-----+ 	+--------+
    | +-------+ +-------+ |       | +-------+ +-------+ |	| IMPI	 |
    | | rank0 | | rank1 | |       | | rank2 | | rank3 | |	| Server |
    | +-------+ +-------+ |       | +-------+ +-------+ |	+--------+
    |     |         |     |       |     |         |     |
    |     +=========+     |       |     +=========+     |
    |   YAMPII Protocol   |       |   YAMPII Protocol   |
    +---------------------+       +---------------------+
       mpirun -client 0		     mpirun -client 1

Glossary

YAMPII:
is a cluster MPI implementation on which GridMPI is based. GridMPI extends YAMPII for the Grid environment with inter-cluster communication via TCP/IP. YAMPII is an independent software product developed at the Yutaka Ishikawa laboratory of the University of Tokyo, and is distributed under the LGPL license.
IMPI (Interoperable MPI):
is a standard for connecting multiple MPI implementations, defined by NIST (National Institute of Standards and Technology) in January 2000. Refer to the IMPI standard for details.
IMPI Server:
is a server process defined in the IMPI specification that exchanges information among invocations of MPI.
IMPI Client:
is an invocation of MPI that connects to the IMPI server; the IMPI standard calls it a client. A client normally corresponds to a cluster.
Vendor MPI:
is a platform-supplied MPI from a computer vendor. Through a Vendor MPI, GridMPI can utilize a fast communication library provided for special hardware as an underlying communication layer. In that case, the point-to-point communication layer of GridMPI uses the Vendor MPI's MPI_Send/MPI_Recv for sending and receiving bytes.

2. Using GridMPI on PC Clusters

Follow the steps below:

(1) Set the environment variables.

$MPIROOT needs to be set to the installation directory. Commands, include files, and libraries of GridMPI are installed under the directories $MPIROOT/bin, $MPIROOT/include, and $MPIROOT/lib.

Add the $MPIROOT setting to .profile, .cshrc, etc., and add $MPIROOT/bin to the PATH. The examples below assume /opt/gridmpi as MPIROOT.

(For sh/bash)
$ MPIROOT=/opt/gridmpi; export MPIROOT
$ PATH="$MPIROOT/bin:$PATH"; export PATH

(For csh/tcsh)
% setenv MPIROOT /opt/gridmpi
% set path=($MPIROOT/bin $path)

When the cluster environment does not support rsh (remote shell), startup fails, because MPI processes in a cluster are started with rsh by default. Set the environment variable _YAMPI_RSH to use ssh instead. When using ssh, the key must have no passphrase, or ssh-agent should be used. See [FAQ].

(For sh/bash)
$ _YAMPI_RSH="ssh -x"; export _YAMPI_RSH

(For csh/tcsh)
% setenv _YAMPI_RSH "ssh -x"

(2) Check the installation.

Check the contents of the directory.

$MPIROOT/bin:		mpirun, mpicc, mpif77, mpif90, ...
$MPIROOT/include:	mpi.h, mpif.h, mpi-1.h, mpi-2.h, mpic++.h
$MPIROOT/lib:		libmpi.a

Check the command paths.

$ which mpicc
$ which mpirun

(3) Compile the application.

$ mpicc mpiprog.c

The default compilers are the ones found at configuration time. They can be changed with the environment variables _YAMPI_CC, _YAMPI_CXX, _YAMPI_F77, and _YAMPI_F90.
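
Any correct MPI program can be used for a first test. The following mpiprog.c is illustrative only (it is not part of the GridMPI distribution); each rank reports which host it runs on, which is handy for checking the host configuration created in the next step.

/* mpiprog.c: illustrative test program; each rank reports its host. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nprocs, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Get_processor_name(host, &len);
    printf("rank %d of %d runs on %s\n", rank, nprocs, host);
    MPI_Finalize();
    return 0;
}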

(4) Create a configuration file.

mpirun reads a configuration from a file (mpi_conf in the current directory by default). mpi_conf holds a list of host names, one host per line. It is an error if the number of hosts is less than the number of processes specified by the -np argument to mpirun.

Contents of mpi_conf:

localhost
localhost
localhost
localhost

mpirun understands some of the extensions of the MPICH configuration file format, including a command argument for non-SPMD (Single Program, Multiple Data) execution.

(5) Start a program in a single cluster.

$ mpirun -np 4 ./a.out

(6) Start a program in multiple clusters.

$ export IMPI_AUTH_NONE=0		...(*1)
$ impi-server -server 2 &		...(*2)
$ mpirun -client 0 addr:port -np 2 ./a.out &	...(*3)
$ mpirun -client 1 addr:port -np 2 ./a.out	...(*4)

(*1) Setting IMPI_AUTH_NONE specifies that no authentication is used. Both impi-server and mpirun need the same setting.

(*2) Start the IMPI server. The server prints an IP address/port pair to stdout. Pass it to mpirun in the following steps.

(*3, *4) Start MPI processes. Normally, two mpirun invocations are on different clusters.


3. Using GridMPI on IBM AIX (with LoadLeveler Interactive)

Follow the steps below:

(1) Set the environment variables.

$MPIROOT needs to be set to the installation directory. Commands, include files, and libraries of GridMPI are installed under the directories $MPIROOT/bin, $MPIROOT/include, and $MPIROOT/lib.

Add the $MPIROOT setting to .profile, .cshrc, etc., and add $MPIROOT/bin to the PATH. The examples below assume /opt/gridmpi as MPIROOT.

(For sh/bash)
$ MPIROOT=/opt/gridmpi; export MPIROOT
$ PATH="$MPIROOT/bin:$PATH"; export PATH

(For csh/tcsh)
% setenv MPIROOT /opt/gridmpi
% set path=($MPIROOT/bin $path)

Step (1) is similar to the step in Using GridMPI on PC Clusters.

(2) Check the installation.

Check the contents of the directory.

$MPIROOT/bin:		mpirun, mpicc, mpif77, mpif90, ...
$MPIROOT/include:	mpi.h, mpif.h, mpi-1.h, mpi-2.h, mpic++.h
$MPIROOT/lib:		libmpi.a, (or libmpi32.a or libmpi64.a)

Check the command paths.

$ which mpicc
$ which mpirun

Check that xlc_r, xlC_r, xlf_r, and xlf90_r are in the PATH. Also check that the directory /usr/lpp/ppe.poe exists.

(3) Compile the application.

$ mpicc mpiprog.c (32bit default, or configured without --with-binmode)
$ mpicc -q32 mpiprog.c (for 32bit)
$ mpicc -q64 mpiprog.c (for 64bit)

The default compilers are the ones found at configuration time. They can be changed with the environment variables _YAMPI_CC, _YAMPI_CXX, _YAMPI_F77, and _YAMPI_F90.

(4) Create configuration files.

Contents of host.list1:

node00
node00

Contents of host.list2:

node01
node01

Contents of llfile:

#@job_type=parallel
#@resources=ConsumableCpus(2)
#@queue

(5) Start a program in a single cluster.

$ mpirun -np 4 ./a.out -llfile llfile

(6) Start a program in multiple clusters.

The following runs two MPI jobs with two processes each.

$ export IMPI_AUTH_NONE=0		...(*1)
$ impi-server -server 2 &		...(*2)
$ mpirun -client 0 addr:port -np 2 -c host.list1 ./a.out -llfile llfile & ...(*3)
$ mpirun -client 1 addr:port -np 2 -c host.list2 ./a.out -llfile llfile   ...(*4)

(*1) Setting IMPI_AUTH_NONE specifies that no authentication is used. Both impi-server and mpirun need the same setting.

(*2) Start the IMPI server. The server prints an IP address/port pair to stdout. Pass it to mpirun in the following steps.

(*3, *4) Start MPI processes. Normally, two mpirun invocations are on different clusters.

NOTE: -llfile llfile is not necessary when LoadLeveler is not used.

mpirun internally calls the poe command of IBM-MPI, and the -c option of mpirun is translated to the -hostfile option of poe.


4. Notes on Using GridMPI on Hitachi SR11000 (IMPORTANT)

Notes on Using Hitachi f90

The Hitachi f90 compiler is set to the aggressive optimization option -Os as the site default. Some programs fail due to this aggressive optimization.

Environment with Both 32bit and 64bit Binary Modes

Passing Options of IBM POE (Parallel Operating Environment)

GridMPI utilizes IBM-MPI as the Vendor MPI, and mpirun calls the poe command of IBM-MPI internally. mpirun passes the arguments after the binary to the poe command intact; they are parsed and consumed by poe at its startup. The following example shows passing the -shared_memory option to poe.

$ mpirun -np N ./a.out -shared_memory yes

Some useful options of POE:

-shared_memory yes:
specifies that IBM-MPI use shared memory for intra-node communication. This makes the communication much faster in most cases. The default is no.
-labelio yes:
specifies that output to stdout be labeled with task IDs. This helps to tell which task printed each line. The default is no.

5. Using GridMPI on Fujitsu Solaris/SPARC64V

Follow the steps below:

(1) Set the environment variables.

$MPIROOT needs to be set to the installation directory. Commands, include files, and libraries of GridMPI are installed under the directories $MPIROOT/bin, $MPIROOT/include, and $MPIROOT/lib.

Add the $MPIROOT setting to .profile, .cshrc, etc., and add $MPIROOT/bin to the PATH. The examples below assume /opt/gridmpi as MPIROOT.

(For sh/bash)
$ MPIROOT=/opt/gridmpi; export MPIROOT
$ PATH="$MPIROOT/bin:/opt/FSUNaprun/bin:$PATH"; export PATH

(For csh/tcsh)
% setenv MPIROOT /opt/gridmpi
% set path=($MPIROOT/bin /opt/FSUNaprun/bin $path)

(2) Check the installation.

Check the contents of the directory.

$MPIROOT/bin:		mpirun, mpicc, mpif77, mpif90, ...
$MPIROOT/include:	mpi.h, mpif.h, mpi-1.h, mpi-2.h, mpic++.h
$MPIROOT/lib:		libmpi.so, libmpi_frt.a, libmpi_gmpi.so
			(or libmpi32.so, libmpi_frt32.a, libmpi_gmpi32.so)
			(or libmpi64.so, libmpi_frt64.a, libmpi_gmpi64.so)

Check the command paths.

$ which mpicc
$ which mpirun

Check that c99, FCC, frt, and f90 are in the PATH. Also check that /opt/FJSVmpi2/bin/mpiexec exists.

(3) Compile the application.

$ mpicc mpiprog.c (32bit default, or configured without --with-binmode)
$ mpicc -q32 mpiprog.c (for 32bit)
$ mpicc -q64 -KV9 mpiprog.c (for 64bit)

The default compilers are the ones found at configuration time. They can be changed with the environment variables _YAMPI_CC, _YAMPI_CXX, _YAMPI_F77, and _YAMPI_F90.

(4) (Create configuration files.) Configuration files are not needed with Fujitsu MPI; the global setting of the node is used.

(5) Start a program in a single cluster.

$ mpirun -np 4 ./a.out

(6) Start a program in multiple clusters.

The following runs two MPI jobs with two processes each.

$ export IMPI_AUTH_NONE=0		...(*1)
$ impi-server -server 2 &		...(*2)
$ mpirun -client 0 addr:port -np 2 ./a.out & ...(*3)
$ mpirun -client 1 addr:port -np 2 ./a.out	 ...(*4)

(*1) Setting IMPI_AUTH_NONE specifies that no authentication is used. Both impi-server and mpirun need the same setting.

(*2) Start the IMPI server. The server prints an IP address/port pair to stdout. Pass it to mpirun in the following steps.

(*3, *4) Start MPI processes. Normally, two mpirun invocations are on different clusters.

In the Fujitsu MPI environment, the GridMPI runtime calls mpiexec (/opt/FJSVmpi2/bin/mpiexec) to start MPI processes. Options to mpirun are translated and passed to the Fujitsu runtime: -np becomes -n, and -c becomes -nl.

mpirun converts a host-list file passed to the -c option into a node list acceptable to the -nl option of Fujitsu mpiexec. The contents of the host-list file are matched against the Fujitsu MPI configuration file, and each hostname is converted to a node number. This is performed by the makenodelist.fjmpi.sh script in $MPIROOT/bin. Note that the file specified by -c consists of one host per line (no comments allowed), which differs from the format of the configuration file for clusters.

When configured with Fujitsu MPI, mpirun also accepts the -nl option, which is passed to mpiexec unmodified. For example, use a line like -nl 0,0,0,0,0,0,0,0,0,0,...,0. Note that the -nl option needs to specify one more node than the value passed to the -np option; for example, -np 4 requires five entries in -nl.


6. Run with the gridmpirun Script

The gridmpirun script is a simple frontend that starts an impi-server and then starts MPI processes via rsh/ssh. gridmpirun starts the impi-server on the local host and then calls mpirun over rsh or ssh as specified by the configuration file (impi_conf by default).

The configuration file of gridmpirun can be specified by the -machinefile option.

(1) Create a gridmpirun configuration file.

Contents of impi_conf configuration file:

-np 2 -c host.list1
-np 2 -c host.list2

Contents of llfile:

#@job_type=parallel
#@resources=ConsumableCpus(2)
#@queue

(2) Start an MPI application.

$ gridmpirun -np 4 -machinefile impi_conf ./a.out -llfile llfile

($Date: 2006/06/07 15:56:42 $)