
Using PoDS to run many independent serial tasks on Discover

Created on: Aug 5, 2008 12:11 PM by Rahman Syed - Last Modified:  Sep 24, 2015 9:27 PM by bvanaart

What is PoDS?

 

Portable Distributed Scripts (PoDS) is a set of scripts created by SIVO/ASTG that enables users to execute a series of independent (serial or parallel) tasks on the Discover system.  Many users need to run large sets of data-processing tasks on Discover, which has been difficult to coordinate because of the node architecture of the system.  PoDS is intended to make that process simple and flexible.

 

How to Use PoDS

 

The following instructions are also viewable on Discover, in /usr/local/other/pods/README.  If you require assistance, feel free to comment on this page or contact Jules.Kouatchou.

 

Setting up passwordless SSH

 

Prior to running PoDS, you must be able to ssh between nodes in a batch session without having to enter your password.  For more information on how to set this up, please read Password-less logins with ssh-keygen.
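
As a rough sketch of the usual procedure (the linked page is the authoritative reference, and details such as the key type may vary on your system), passwordless SSH can typically be set up as follows:

ssh-keygen -t rsa                                 # accept the defaults when prompted
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   # assumes the home directory is shared across nodes
chmod 600 ~/.ssh/authorized_keys
ssh <another_node> hostname                       # should complete without a password prompt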

 

Creating an execution file

 

The user must first create an "execution file".  This is a simple text file in which each line contains a complete command that the user would like to execute, along with any parameters required to run it (e.g., input files, switches).

 

The user should be able to execute each command from the directory where the execution file is located.  That is to say, PoDS first moves into the directory where the execution file is located; it then proceeds to execute all commands from that location.  To avoid confusion, it may be desirable to use absolute paths whenever possible.
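
For illustration only (the script and file names below are hypothetical), an execution file that follows the absolute-path recommendation might look like:

/home/someuser/bin/process_data.sh -i /home/someuser/data/file1 -o /home/someuser/output/outfile1
/home/someuser/bin/process_data.sh -i /home/someuser/data/file2 -o /home/someuser/output/outfile2
/home/someuser/bin/process_data.sh -i /home/someuser/data/file3 -o /home/someuser/output/outfile3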

 

Invoking PoDS

 

The user is responsible for writing a SLURM script to request resources from the system.  In the script, PoDS can be invoked as follows:

 

/usr/local/other/pods/pods.sh /path/to/execfile cpus_per_node

 

 

IMPORTANT NOTE:  The execfile argument to pods.sh must be an absolute path to a valid execution file.  PoDS uses the directory of that path as its working directory, where it executes the user's commands and creates trigger files to monitor their status.

 

Example

 

Here is an example execution file and invocation for 12 tasks on 8 CPUs:

 

Execution file (running the same script on 12 different data files):

Located in /home/someuser/myparalleljob/execfile1

 

 

./process_data.sh -i file1 -o outfile1
./process_data.sh -i file2 -o outfile2
./process_data.sh -i file3 -o outfile3
./process_data.sh -i file4 -o outfile4
./process_data.sh -i file5 -o outfile5
./process_data.sh -i file6 -o outfile6
./process_data.sh -i file7 -o outfile7
./process_data.sh -i file8 -o outfile8
./process_data.sh -i file9 -o outfile9
./process_data.sh -i file10 -o outfile10
./process_data.sh -i file11 -o outfile11
./process_data.sh -i file12 -o outfile12

 

 

SLURM script for 8 CPUs (submit with "sbatch sbatch_script.sh")

Located in /home/someuser/myparalleljob/sbatch_script.sh

 

 

#!/bin/bash
#SBATCH --account=<your_account#>
#SBATCH --time=8:00:00
#SBATCH --ntasks=8 --ntasks-per-node=4
#SBATCH --job-name=process_data

cd /home/<username>/myparalleljob
/usr/local/other/pods/pods.sh /home/<username>/myparalleljob/execfile1 4
exit 0

 

Remarks

  • It is important to note that PoDS makes no assumptions about the underlying applications; the tasks can come from different applications.
  • PoDS assumes that all the tasks listed in the execution file are serial and independent. If some subtasks depend on each other, the user should create a parent task (a script that handles all the dependencies) containing those subtasks, and list that script in the execution file (see the sketch after this list).
  • At times, individual nodes cannot access environment variables (for instance, loaded modules) set by the user. We recommend setting them within each execution command (i.e., inside the script it runs).
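
For example, such a parent task might look like the following sketch (the script and file names are hypothetical); the execution file then lists the single command ./run_pipeline.sh alongside the other independent tasks:

#!/bin/bash
# run_pipeline.sh: hypothetical parent task bundling dependent subtasks.
# Set any environment variables or modules the subtasks need here, since
# compute nodes may not inherit them from the login session.
# module load <required_module>

set -e                                            # stop if any step fails
./preprocess.sh -i raw_data.nc   -o clean_data.nc
./analyze.sh    -i clean_data.nc -o results.nc
./make_plots.sh -i results.nc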

 

 

Python Version of PoDS

 

A new version of PoDS, written entirely in Python by Jules Kouatchou, works just like the C-shell version but adds several new features:

  • Usage information is provided from the command line (see below).
  • An absolute path is no longer necessary for the execution file.
  • Resource availability is assessed dynamically: if the number of CPUs is not supplied by the user, PoDS by default uses all the available cores on each reserved node.
  • The batch job identifier is used to differentiate submissions.
  • Each task is timed.
  • A summary report is provided: the number of tasks run, the number of nodes used (with the number of cores per node), and the elapsed time for executing all the jobs.

 

Usage of the Python Version

 

To get information on how to use PoDS, type one of the following:

/usr/local/other/pods/pods.py
/usr/local/other/pods/pods.py -h

 

Either of these two commands is acceptable:

/usr/local/other/pods/pods.py execfile1 4
/usr/local/other/pods/pods.py /home/someuser/myparalleljob/execfile1 4

 

Because PoDS can now detect the available cores automatically, it is also possible to type only:

 

/usr/local/other/pods/pods.py execfile1

 

Do not forget that PoDS can only be used within a SLURM script or in an interactive session.
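
For instance, a batch script for the Python version could look like the sketch below, which mirrors the earlier pods.sh example (the account, paths, and resource requests are placeholders; the cores-per-node argument is omitted so that PoDS uses all available cores on each node):

#!/bin/bash
#SBATCH --account=<your_account#>
#SBATCH --time=2:00:00
#SBATCH --ntasks=8 --ntasks-per-node=4
#SBATCH --job-name=process_data

cd /home/someuser/myparalleljob
/usr/local/other/pods/pods.py execfile1
exit 0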

 

Performance Analysis

To show the efficiency of PoDS, we wrote a simple application that randomly generates an integer n between 0 and 10^9 and then loops over n, performing some basic operations. Each time the application is called, a different value of n is obtained. We created an execution file containing 130 calls of the application (the number 130 has no specific meaning). The timing results are summarized in the table below.

 

Nodes    Cores/Node    Time (s)
  1          1            651
  1          2            326
  1          4            167
  1          8             85
  2          1            327
  2          2            166
  2          4             86
  2          8             46

 

The results clearly show significant gains when more than one core per node is employed for a single-node request. Even with two nodes, PoDS continues to perform well.
