Overview of IMPI Relay: Private Address Support for GridMPI

0. Glossary

The following terms are used in this document (and related documents). They are based on the terms defined in the IMPI specification.

Client
An instance of a cluster in a single MPI application.
Host
An entity in a client which has an IP address. It normally corresponds to a node in a cluster.
Process
An MPI process running in a host. It is uniquely identified by a process ID within the host. In the GridMPI-2.0 implementation, a Host and a Process are the same entity.
IMPI Server
A server used to exchange IP address/port and other information among the hosts in the clients. Clients contact it and exchange information through it at start-up of an MPI application.
Agent
An entity in a client that contacts the IMPI server. In the GridMPI-2.0 implementation, the process with the smallest rank in a client acts as the agent.

1. Overview

IMPI Relay is a forwarding mechanism that transparently bridges, within the IMPI protocol, the nodes of a private-address cluster to the global address space. The Relay runs on a node that has both a private address and a global address. It forwards only data carried by the IMPI protocol; it is not a general mechanism such as NAT (Network Address Translation).

When a single Relay is used, its operation is transparent to the IMPI clients, and any implementation of the IMPI protocol should work with the Relay. Sharing the load among multiple Relays requires distributing the traffic, which deviates from the IMPI protocol and requires slight modifications to the IMPI implementation.
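
The core forwarding work is conceptually simple: copy bytes in both directions between a connection on the private-address side and a connection on the global-address side. The following is a minimal sketch of such a splice loop in C; it is illustrative only, not the GridMPI source, and the function name relay_splice is hypothetical.

    /* Illustrative sketch only: copy bytes in both directions between a
     * socket on the private-address side and one on the global-address
     * side.  Not taken from the GridMPI source; error handling is minimal. */
    #include <sys/select.h>
    #include <unistd.h>

    static void relay_splice(int private_fd, int global_fd)
    {
        char buf[8192];
        for (;;) {
            fd_set rfds;
            FD_ZERO(&rfds);
            FD_SET(private_fd, &rfds);
            FD_SET(global_fd, &rfds);
            int maxfd = (private_fd > global_fd ? private_fd : global_fd) + 1;
            if (select(maxfd, &rfds, NULL, NULL, NULL) < 0)
                break;
            int from = FD_ISSET(private_fd, &rfds) ? private_fd : global_fd;
            int to = (from == private_fd) ? global_fd : private_fd;
            ssize_t n = read(from, buf, sizeof(buf));
            if (n <= 0)
                break;                           /* peer closed or error */
            for (ssize_t off = 0; off < n; ) {   /* handle partial writes */
                ssize_t w = write(to, buf + off, n - off);
                if (w < 0)
                    return;
                off += w;
            }
        }
    }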

2. Processing flow

Without IMPI Relay, each process is paired with one host (a one-to-one mapping). With IMPI Relay, the Relay itself acts as a host, and all processes in the cluster hang off it (a one-to-many mapping).

IMPI Relay does two kinds of work. The first is relaying IMPI command messages between the IMPI Server and the Agent. The second is relaying data messages between the hosts of the clusters. In the first half of the initialization phase, the process with the smallest rank in each cluster becomes the Agent of that cluster. The Agent communicates with the IMPI Server to gather and distribute the client information, as shown in Figures 1 and 2. In the second half of the initialization phase, the hosts establish all-to-all connections and the ranks are renumbered, as shown in Figures 3 and 4. MPI messages are then sent and received over these connections.
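
One way to picture the one-to-many mapping is as a table kept by the Relay that associates each process behind it with a relay-side endpoint visible from outside. The structures below are a hypothetical sketch of such a table; the names do not come from the GridMPI source or the IMPI specification.

    /* Hypothetical sketch: the relay is the single "host" seen from outside,
     * and each process behind it is reached through its own relay-side port. */
    #include <netinet/in.h>

    struct relay_entry {
        struct sockaddr_in private_addr; /* real address inside the cluster */
        unsigned short     relay_port;   /* port advertised on the relay's global address */
        int                private_fd;   /* connection to the process, private side */
        int                global_fd;    /* connection from the external peer (-1 until accepted) */
    };

    struct relay_table {
        struct relay_entry *entries;     /* one entry per process behind the relay */
        int                 nentries;
    };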

2.1. Initialization (first half)

IMPI Relay forwards communication between the IMPI Server and the Agent in the private-address cluster.


Figure 1. Initialization (w/o IMPI relay)

Figure 2. Initialization (w/ IMPI relay)
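
As a rough illustration of this command path: the Relay can accept the Agent's connection on its private interface and open its own connection to the IMPI Server from its global interface, then forward bytes between the two (for example, with a loop like relay_splice() above). The helper below is a hedged sketch of the server-side connection step; the function name is hypothetical and error handling is omitted.

    /* Illustrative sketch: open the global-side connection to the IMPI server.
     * The Agent's private-side connection would be accepted on a separate
     * listening socket bound to the relay's private address. */
    #include <netdb.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int connect_to_impi_server(const char *host, const char *port)
    {
        struct addrinfo hints, *res;
        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_INET;
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(host, port, &hints, &res) != 0)
            return -1;
        int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd >= 0 && connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
            close(fd);
            fd = -1;
        }
        freeaddrinfo(res);
        return fd;
    }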

2.2. Initialization (second half)

IMPI Relay forwards communication between the private-address cluster and the external cluster.


Figure 3. After Initialization (w/o IMPI relay)

Figure 4. After Initialization (w/ IMPI relay)

3. Info: Structure of IMPI Relay Execution

See "Structure of GridMPI Execution" for a description of a normal GridMPI execution.

The IMPI Relay is a proxy process for the IMPI protocol, which is used for inter-cluster communication. MPI processes on client1 communicate with the IMPI server and with MPI processes on client0 via the IMPI Relay. In this case, the IMPI Relay presents to client0 a view in which one host has two processes (rank2 and rank3), and it presents to client1 a view in which one host has two processes (rank0 and rank1).

                        IMPI Protocol
          +---------+========================+
          |         |                        |
    +-----|---------|-----+       +----------|----------+	+--------+
    |     |         |     |       |      +-------+      |	| impi-	 |
    |     |         |     |       |      | relay |      |	| server |
    |     |         |     |       |      +-------+      |	+--------+
    |     |         |     |       |      /       \      |
    | +-------+ +-------+ |       | +-------+ +-------+ |
    | | rank0 | | rank1 | |       | | rank2 | | rank3 | |
    | +-------+ +-------+ |       | +-------+ +-------+ |
    |     |         |     |       |     |         |     |
    |     +=========+     |       |     +=========+     |
    |   YAMPI Protocol    |       |   YAMPI Protocol    |
    +---------------------+       +---------------------+
       mpirun -client 0		     mpirun -client 1
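
The "one host with two processes" view above amounts to an address substitution: when host information crosses the Relay, the private address of each process is replaced by the Relay's global address and a relay-side port, so the other client sees a single host holding all of those processes. The sketch below illustrates the idea; the structure and field names are hypothetical and do not come from the IMPI specification or the GridMPI source.

    /* Hypothetical sketch of the substitution behind the "one host" view. */
    #include <netinet/in.h>

    struct host_info {
        struct in_addr addr;   /* address advertised to the other side */
        unsigned short port;   /* port advertised to the other side */
        int            nprocs; /* number of processes on this "host" */
    };

    static void rewrite_host_info(struct host_info *h,
                                  struct in_addr relay_global_addr,
                                  unsigned short relay_port)
    {
        h->addr = relay_global_addr;  /* hide the private address */
        h->port = relay_port;         /* traffic now arrives via the relay */
        /* nprocs is unchanged: all processes appear to share one host */
    }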

4. Usage

Usage instructions can be found in "How to use IMPI Relay".