Additional Information and Known Issues
More on running MPI jobs
MPI (Message Passing Interface) is a standard describing how to pass messages between programs running on the same or different machines.
MPI is a formal standard and is actively supported by all major vendors. Some vendors have highly optimized MPI libraries available on their systems. There are also open-source implementations of the MPI standard, such as MPICH and OpenMPI, and numerous commercial implementations that support a wide range of systems and interconnects, for example Platform-MPI and IntelMPI.
Support for a particular MPI implementation in ADF can be considered at three levels: the source code, the configure script, and pre-compiled binaries. At each level different MPI implementations may be supported.
The ADF source code is not implementation-specific, so in theory it supports any MPI library. Many popular MPI implementations are supported at the level of the configure script, but depending on your local setup you may need to make some modifications in the buildinfo file after running configure. For example, on 64-bit Linux IntelMPI and Platform-MPI should work directly, but using OpenMPI will most likely require manual changes to correct the include and linker paths to the OpenMPI libraries of your system. The configure script will also try to generate an appropriate $ADFBIN/start script, but this might also need modification when using different MPI libraries. In general it is best to use the same MPI version used by SCM for the precompiled binaries.
When choosing an MPI implementation for pre-compiled binaries, SCM considers many factors including (but not limited to) the re-distribution policy, performance, and built-in support for modern interconnects. IntelMPI is currently the standard MPI implementation supported by SCM because it has the most favorable combination of these factors at this moment. For platforms where IntelMPI is supported (Windows, Linux), its runtime is distributed with ADF. Platform-MPI builds are also available for Linux, but should only be used in case of problems with IntelMPI. A different MPI implementation will be standard on a platform where IntelMPI is not available. It may or may not be distributed with ADF. For example, SGI MPT is standard on SGI machines and OpenMPI is standard on Mac OS X platforms, but only the latter is distributed together with ADF.
When pre-compiled binaries do not work on your computer(s) due to incompatibility of the standard MPI library with your soft- and/or hardware, the SCM staff will be glad to assist you in compiling ADF with the MPI implementation supported on your machine(s).
If you are going to use an MPI version of the ADF package, and it is not IntelMPI, Platform-MPI or OpenMPI, you will need to determine whether the corresponding MPI run-time environment is already installed on your machine. If not, you will need to install it separately from ADF. As already mentioned, IntelMPI, Platform-MPI and OpenMPI are bundled with the corresponding version of ADF, so you don’t need to worry about installing them separately.
Running with MPI on more than one node
When running on more than one machine (for example on a cluster without a batch system) you need to specify a list of hosts on which mpirun needs to spawn processes. In principle, this is implementation-specific and may not be required if the MPI is tightly integrated with your operating and/or batch system. For MPICH1 and Platform-MPI, you can do this by preparing a file containing the hostnames of the nodes (one per line) you will use in your parallel job. Then you set the SCM_MACHINEFILE environment variable to point to that file.
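As a minimal sketch of the procedure above, the following writes a machinefile and exports SCM_MACHINEFILE. The hostnames node01 to node03 and the file name adf_machinefile are placeholders chosen for illustration; substitute the names of your own nodes.

```shell
# Write one hostname per line. node01..node03 are placeholder names.
machinefile="$PWD/adf_machinefile"
printf '%s\n' node01 node02 node03 > "$machinefile"

# Point ADF at the file through the SCM_MACHINEFILE environment variable.
export SCM_MACHINEFILE="$machinefile"

# Sanity check: report how many non-empty lines (hosts) the file contains.
echo "hosts listed: $(grep -c . "$SCM_MACHINEFILE")"
```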
When you submit a parallel job to a batch system the job scheduler usually provides a list of nodes allocated to the job. The $ADFBIN/start shell script has some logic to extract this information from the batch system and pass it to the MPI’s launcher command (typically mpirun). In some cases, depending on your cluster configuration, this logic may fail. If this happens, you should examine the $ADFBIN/start file and edit the relevant portion of it. For example, you may need to add commands that process the batch system-provided nodelist or change mpirun’s command-line options or even replace the mpirun command altogether.
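One kind of nodelist processing that is sometimes needed can be sketched as follows. The sketch assumes a scheduler that writes one line per allocated slot, so that a hostname may be repeated, and an MPI launcher that expects each host only once. Whether this applies depends on your cluster and MPI library. The function name unique_hosts is hypothetical and not part of ADF.

```shell
# Hypothetical helper, not part of ADF: collapse a nodelist with repeated
# hostnames into a list of unique hostnames, keeping the original order.
# Usage: unique_hosts NODELIST_FILE
unique_hosts() {
    # NF skips empty lines; seen[] remembers hostnames already printed.
    awk 'NF && !seen[$1]++ { print $1 }' "$1"
}
```

Under PBS, for instance, one might call it as unique_hosts "$PBS_NODEFILE" > hosts.txt and pass hosts.txt to the launcher.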
More about IBM Platform MPI
The use of Platform MPI is governed by our standard License Terms.
As already mentioned, Platform MPI is currently one of the standard MPI implementations on some systems, namely on Intel-based Linux machines. We are using the Community Edition.
For more information about Platform MPI, including the User’s Guide, please visit the MPI page at IBM. Platform MPI is distributed with the Platform-MPI versions of ADF and a complete Platform-MPI directory tree is found in $ADFBIN/platform_mpi. Using Platform-MPI with ADF does not require any additional fees. In addition to TCP/IP and shared memory supported by every MPI implementation, Platform-MPI also supports, without recompilation, all well-known interconnects such as Infiniband and Myrinet. The best available interconnect is chosen automatically at run time; the choice can be overridden using mpirun command-line switches in the $ADFBIN/start script.
A few words about the mpirun commands found in the start script. If you look inside the $ADFBIN/start file on Linux, you will likely see three mpirun commands. They differ in the way the list of nodes is specified, if specified at all. The mpirun command with a -lsb_hosts switch is used under the LSF batch system. It employs Platform-MPI’s tight integration with LSF and lets it pick up the job configuration directly from the batch system.
The mpirun command with a -hostfile switch is used for multi-node interactive jobs and for all parallel jobs started under PBS or SGE batch systems. In the case of a batch system, the hostfile provided by the scheduler is passed to the mpirun command as-is, that is, without modifications. This can be a problem if the format of the hostfile differs from that supported by Platform-MPI, in which case the hostfile should be modified accordingly. Please see the Platform-MPI User’s Guide for information about supported hostfile formats.
Finally, the simplest form of the mpirun command is used for single-node jobs, when the calculation is performed on the localhost. In this case the NSCM environment variable determines how many parallel processes will be started. Please note that when the job is started under any of the batch systems or with a hostfile, the only thing that matters about NSCM is whether or not it equals “1” (digit one): for any other value, the exact value of NSCM has no effect and the number of parallel processes to start is determined by the contents of the hostfile.
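The rule for the number of processes can be restated as a small function. This is only an illustration of the behaviour described above, not the logic of the actual $ADFBIN/start script. The function name effective_nproc is hypothetical, and counting one process per non-empty hostfile line is an assumption.

```shell
# Illustration only, not the real start script.
# Usage: effective_nproc NSCM_VALUE [HOSTFILE]
effective_nproc() {
    nscm=$1
    hostfile=${2:-}
    if [ "$nscm" = "1" ]; then
        # NSCM equal to 1: a single process, as implied by the text above.
        echo 1
    elif [ -n "$hostfile" ] && [ -r "$hostfile" ]; then
        # Batch or hostfile job: the hostfile decides, NSCM's value is ignored.
        # Assumption: one process per non-empty line.
        grep -c . "$hostfile"
    else
        # Single-node job without a hostfile: NSCM decides.
        echo "$nscm"
    fi
}
```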
Remote shell command under Platform-MPI
One important point about Platform-MPI is that it needs a remote shell command, such as ssh, to start tasks on compute nodes. By default ssh is used but this can be changed using the MPI_REMSH environment variable. For example, executing the following from the bash prompt will make mpirun use rsh instead of ssh:
export MPI_REMSH=rsh
As usual, if you want to make the change permanent, you need to add this command to a shell resource file.
There are cases when a Linux cluster is configured in such a way that both ssh and rsh communication from/to/between compute nodes is disabled. One of the most common examples is TORQUE with the MAUI scheduler. In this case, there is a remote shell replacement utility called pbsdsh. This utility checks that it is executed under PBS and, if so, allows you to start remote programs only on the nodes allocated to that particular job. In principle, this is all that mpirun needs. The only problem is that pbsdsh uses different command-line options. To solve this, we provide in $ADFBIN a script called torque_ssh. To make Platform-MPI always use torque_ssh instead of ssh, simply set MPI_REMSH in the shell resource file as follows (after ADFBIN has been defined):
export MPI_REMSH=$ADFBIN/torque_ssh
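Adding such a line to a shell resource file can be done by hand, or with a small helper that avoids duplicating the line when run more than once. The following is a sketch; the function name add_line_once is hypothetical and not part of ADF.

```shell
# Hypothetical helper: append LINE to FILE unless an identical line exists.
# Usage: add_line_once LINE FILE
add_line_once() {
    # -x matches whole lines only, -F treats the line as a fixed string.
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}
```

For example, add_line_once 'export MPI_REMSH=$ADFBIN/torque_ssh' ~/.bashrc (the single quotes keep $ADFBIN unexpanded, so it is resolved each time the resource file is read). Which resource file your shell reads depends on your setup.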
Known Platform-MPI issue: libibverbs error message
Sometimes Platform-MPI stops with the following (or a similar) error message:
libibverbs: Fatal: couldn't open sysfs class 'infiniband_verbs'.
This occurs, for example, on GigE-connected ROCKS clusters with OpenMPI installed. The error is caused by the fact that there are Infiniband libraries installed without corresponding kernel drivers and/or hardware. In this case, one has to enforce the use of TCP by Platform-MPI, which can be done using one of the methods below:
- edit the $ADFBIN/start script and add a -TCP option to all mpirun commands:
$ADFBIN/platform_mpi/bin/mpirun.mpich -TCP ...
- or set the MPIRUN_OPTIONS environment variable in your shell resource file (for example, ~/.bash_profile):
export MPIRUN_OPTIONS=-TCP
- or set the MPI_IC_ORDER environment variable in your shell resource file (for example, ~/.bash_profile):
export MPI_IC_ORDER="TCP"
Any of these options will make sure that all interconnects other than TCP are ignored.
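If the same shell resource file is shared between machines with and without Infiniband, the TCP setting can be made conditional. The sketch below assumes that the sysfs class named in the error message lives at /sys/class/infiniband_verbs and that its absence is a sufficient reason to fall back to TCP. The function name needs_tcp_fallback is hypothetical.

```shell
# Hypothetical helper: succeeds when the Infiniband verbs sysfs class is
# missing. The default path is an assumption based on the error message.
# Usage: needs_tcp_fallback [SYSFS_CLASS_DIR]
needs_tcp_fallback() {
    [ ! -d "${1:-/sys/class/infiniband_verbs}" ]
}

# Force TCP only on machines without the Infiniband verbs class.
if needs_tcp_fallback; then
    export MPI_IC_ORDER="TCP"
fi
```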
IntelMPI and SLURM
To get IntelMPI to work under SLURM, one needs to edit the $ADFBIN/start script and change the value of the I_MPI_PMI_LIBRARY environment variable so that it points to the correct libpmi library from SLURM.
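The location of SLURM's libpmi library differs between installations. As a sketch, a helper like the following can search a list of candidate directories. The function name find_libpmi is hypothetical, the file name libpmi.so is an assumption, and the directories you pass must come from your own SLURM installation.

```shell
# Hypothetical helper: print the first libpmi.so found in the given
# directories and succeed, or fail without output if none is found.
# Usage: find_libpmi DIR...
find_libpmi() {
    for dir in "$@"; do
        if [ -f "$dir/libpmi.so" ]; then
            echo "$dir/libpmi.so"
            return 0
        fi
    done
    return 1
}
```

The printed path is the value to assign to I_MPI_PMI_LIBRARY in the $ADFBIN/start script.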
Corrupted License File
You may find that, after having installed the license file, the program still does not run and prints the message “LICENSE CORRUPT”. There are a few possible causes. To explain how this error may come about and how to overcome it, a few words on license files are in order.
Each license file consists of pairs of lines. The first line of each pair is text that states, in a human-readable format, a couple of typical aspects: a ‘feature’ that you are allowed to use (for instance ‘ADF’), the expiration date, a (maximum) release (version) number of the software and so on. The second line contains the same information in encrypted format: a long string of characters that appear to make little sense. The program reads the license file and checks, with its internal encrypting formulas, that the two lines match. If not, it stops and prints a “LICENSE CORRUPT” message.
So, there are two common reasons why this may happen:
You can use the fixlic utility to try to fix this automatically. Please be aware that the fixlic utility will try to fix the file pointed to by the $SCMLICENSE environment variable and replace it with the fixed copy. Thus, you need to make a backup of your license file first and you need to have write permissions for it.
cp $SCMLICENSE $SCMLICENSE.backup
$ADFBIN/fixlic
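The backup step can be made a little more defensive by checking the preconditions mentioned above before copying. This is a sketch; the function name backup_license is hypothetical and not part of ADF.

```shell
# Hypothetical helper: verify that the license file exists and is writable,
# then copy it to FILE.backup. Defaults to the file named by $SCMLICENSE.
# Usage: backup_license [LICENSE_FILE]
backup_license() {
    lic=${1:-${SCMLICENSE:-}}
    if [ -z "$lic" ]; then
        echo "SCMLICENSE is not set" >&2
        return 1
    fi
    if [ ! -f "$lic" ] || [ ! -w "$lic" ]; then
        echo "license file missing or not writable: $lic" >&2
        return 1
    fi
    # -p preserves the timestamp and permissions of the original.
    cp -p "$lic" "$lic.backup"
}
```

It could then be combined with the utility as backup_license && $ADFBIN/fixlic, so that fixlic only runs once a backup exists.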
Windows: running jobs from the command line
In order to run ADF or any other program from the package without the GUI, navigate to the ADF installation directory and double-click the adf_command_file.bat file. It will start a Windows command interpreter and set up the environment specific to that installation of ADF. Once it has started, cd to your jobs directory by entering the following commands at the prompt:
C:
cd \ADF_DATA
Then, run your job as follows (assuming the job is called h2o):
sh h2o.job
You can also prepare a job from a .adf file and run it using only two commands:
sh adfprep -t h2o.adf -j h2o > h2o.job
sh h2o.job
Please note that you do need to use sh in the commands above because both h2o.job and adfprep are shell scripts and, thus, they must be interpreted by a shell.