4.3. Submitting advanced jobs

1. Submitting an array of jobs

2. Submitting Parallel jobs

- Open MPI


1. Submitting an array of jobs

Sometimes you'll need to submit a large number of jobs: perhaps thousands of independent tasks, each with its own input, output and error files. As a general rule, it's not a good idea to generate thousands of separate job scripts and submit them to the cluster one by one.

Sun Grid Engine allows users to submit a special kind of job which executes a single script with N different input files. This is called an 'array job': it is submitted to the cluster only once and can be managed as a single job.

Creating an array job is slightly different from creating a single job, since we have to manage the N different input files. In this example, we'll simply copy the contents of each of the N input files to a matching output file.
 
First, we create the generic files to hold the input data. As a general rule, the easiest way to name the input files is to append an integer to the end of each file name and use the SGE_TASK_ID environment variable to select them. With the following 'for' command we create ten input files:
for i in {1..10}; do echo "File $i" > data-$i.log; done

Let's verify the input files:

uhpc@login:~/Source/array-example$ ls -l
total 0
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-10.log
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-1.log
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-2.log
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-3.log
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-4.log
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-5.log
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-6.log
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-7.log
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-8.log
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-9.log
When creating an array job, the -t flag controls how the scheduler iterates over the input files:

Fig 1. Basic parameters for an array job

Flag            Request              Comment
-N <job_name>   Job name definition  The name of the job. The name should follow the "name"
                                     definition in sge_types(1). Invalid job names will be
                                     denied at submit time.
-t n[-m[:s]]    Array index limits   The option arguments n, m and s will be available
                                     through the environment variables SGE_TASK_FIRST,
                                     SGE_TASK_LAST and SGE_TASK_STEPSIZE.
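
A step size greater than 1 starts fewer tasks, each covering a block of consecutive indices. Below is a minimal sketch of such a script (assuming the same data-N.log naming used in this example), showing how the three environment variables combine:

#!/bin/bash
#$ -N FileArrayStep
#$ -t 1-10:5
# With -t 1-10:5, SGE starts only tasks 1 and 6; each task covers
# SGE_TASK_STEPSIZE consecutive indices, capped at SGE_TASK_LAST.
LAST=$(( SGE_TASK_ID + SGE_TASK_STEPSIZE - 1 ))
[ $LAST -gt $SGE_TASK_LAST ] && LAST=$SGE_TASK_LAST
for i in $(seq $SGE_TASK_ID $LAST); do
    cat data-$i.log > output-$i.log
done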

In this example, we want to pick ten files, starting at 1 and ending at 10, one after another. So our -t option has to be -t 1-10:1 (start at 1, end at 10, increment by 1 each time):

#!/bin/bash
#$ -N FileArray
#$ -t 1-10:1
FILESPATH=/homedtic/uhpc/Source/array-example
# Each task copies its own input file, selected through SGE_TASK_ID
cat $FILESPATH/data-${SGE_TASK_ID}.log > $FILESPATH/output-${SGE_TASK_ID}.log
exit 0
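
By default, SGE writes each task's standard output and error to the submit directory as <job_name>.o<job-ID>.<task-ID> and <job_name>.e<job-ID>.<task-ID>. If you prefer to collect them elsewhere, the -o and -e flags accept a directory; a small sketch (logs/ is a hypothetical directory you would have to create beforehand):

#$ -o logs/
#$ -e logs/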

Submit the array job using the qsub command:

uhpc@login:~/Source/array-example$ qsub submit-array.sh
Your job-array 153660.1-10:1 ("FileArray") has been submitted
Monitor the job status using the qstat command: ten tasks are running on the short.q queue. The job-ID 153672 appears ten times, spread across different nodes of the cluster. For an array job, qstat shows additional output in the ja-task-ID column:
uhpc@login:~/Source/array-example$ qstat
job-ID  prior   name       user         state submit/start at     queue          slots ja-task-ID
-------------------------------------------------------------------
153672 0.50000 FileArray  uhpc         r     12/18/2013 18:30:38 short.q@node10                 1 1
153672 0.50000 FileArray  uhpc         r     12/18/2013 18:30:38 short.q@node05                 1 2
153672 0.50000 FileArray  uhpc         r     12/18/2013 18:30:38 short.q@node09                 1 3
153672 0.50000 FileArray  uhpc         r     12/18/2013 18:30:38 short.q@node03                 1 4
153672 0.50000 FileArray  uhpc         t     12/18/2013 18:30:38 short.q@node10                 1 5
153672 0.50000 FileArray  uhpc         t     12/18/2013 18:30:38 short.q@node05                 1 6
153672 0.50000 FileArray  uhpc         t     12/18/2013 18:30:38 short.q@node09                 1 7
153672 0.50000 FileArray  uhpc         t     12/18/2013 18:30:38 short.q@node03                 1 8
153672 0.50000 FileArray  uhpc         r     12/18/2013 18:30:38 short.q@node04                 1 9
153672 0.50000 FileArray  uhpc         t     12/18/2013 18:30:38 short.q@node10                1 10
The output files are written to the path the script defines for output-* (here, $FILESPATH):
uhpc@login:~/Source/array-example$ ls -l output-*
-rw-r--r-- 1 uhpc info_users 8 Dec 18 18:30 output-10.log
-rw-r--r-- 1 uhpc info_users 7 Dec 18 18:30 output-1.log
-rw-r--r-- 1 uhpc info_users 7 Dec 18 18:30 output-2.log
-rw-r--r-- 1 uhpc info_users 7 Dec 18 18:30 output-3.log
-rw-r--r-- 1 uhpc info_users 7 Dec 18 18:30 output-4.log
-rw-r--r-- 1 uhpc info_users 7 Dec 18 18:30 output-5.log
-rw-r--r-- 1 uhpc info_users 7 Dec 18 18:30 output-6.log
-rw-r--r-- 1 uhpc info_users 7 Dec 18 18:30 output-7.log
-rw-r--r-- 1 uhpc info_users 7 Dec 18 18:30 output-8.log
-rw-r--r-- 1 uhpc info_users 7 Dec 18 18:30 output-9.log
The content of each output file is the same as that of its parent input file, as specified in our script:
uhpc@login:~/Source/array-example$ cat output-1.log
File 1
uhpc@login:~/Source/array-example$ cat data-1.log
File 1
uhpc@login:~/Source/array-example$ cat data-10.log
File 10
uhpc@login:~/Source/array-example$ cat output-10.log
File 10
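
To check all ten pairs at once instead of one by one, a quick shell loop will do (same file naming as above); diff -q prints a line only when two files differ, so no output means every pair matches:

for i in {1..10}; do diff -q data-$i.log output-$i.log; done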
To delete all the tasks of an array job, run qdel with the job-ID of the whole array. It will mark all ten tasks for deletion:
uhpc@login:~/Source/array-example$ qdel 153677
uhpc has registered the job-array task 153677.1 for deletion
uhpc has registered the job-array task 153677.2 for deletion
uhpc has registered the job-array task 153677.3 for deletion
uhpc has registered the job-array task 153677.4 for deletion
uhpc has registered the job-array task 153677.5 for deletion
uhpc has registered the job-array task 153677.6 for deletion
uhpc has registered the job-array task 153677.7 for deletion
uhpc has registered the job-array task 153677.8 for deletion
uhpc has registered the job-array task 153677.9 for deletion
uhpc has registered the job-array task 153677.10 for deletion
To delete a single task of an array job, append the task ID to the job-ID. The syntax is qdel <array-job-ID>.<task-ID>:
uhpc@login:~/Source/array-example$ qdel 153679.1
uhpc has registered the job-array task 153679.1 for deletion
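
On most SGE versions the same dot syntax also accepts a task range (n[-m[:s]], as defined in sge_types(1)), so a subset of tasks can be removed with a single command; check qdel(1) on your cluster to confirm. For example, to delete tasks 2 through 5:

qdel 153679.2-5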

 

2. Submitting Parallel jobs

Open MPI divides cluster resources into slots. The following example shows how to run an OpenMP binary in a parallel environment with Open MPI (mpiexec), submitting a job built from the NPB benchmark suite.
 
Write a script for the qsub command (see the basic example for how to build it):
#!/bin/bash
#$ -pe ompi* 256
#$ -N bt.B
export OMP_NUM_THREADS=256
/soft/openmpi/openmpi-1.6.5/bin/mpiexec  /PATH/TO/BIN/NPB3.2/NPB3.2-OMP/bin/bt.B
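
Rather than hard-coding 256 in two places, you can read the slot count that SGE actually granted from the NSLOTS environment variable, which keeps the thread count in sync with the -pe request. A sketch of the same script (same assumed paths):

#!/bin/bash
#$ -pe ompi* 256
#$ -N bt.B
# NSLOTS is set by SGE to the number of slots granted to this job
export OMP_NUM_THREADS=$NSLOTS
/soft/openmpi/openmpi-1.6.5/bin/mpiexec /PATH/TO/BIN/NPB3.2/NPB3.2-OMP/bin/bt.B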

Fig 2. Basic parameters for an Open MPI script

Flag               Request               Comment
-N <job_name>      Job name definition   The name of the job. The name should follow the "name"
                                         definition in sge_types(1). Invalid job names will be
                                         denied at submit time.
-pe ompi* <slots>  Parallel environment  slots = the number of slots to request. Set a fixed
                                         integer when your MPI job needs an exact number of
                                         slots (e.g. more than 16).

You can also let SGE choose the number of slots to grant by using a range descriptor:

#$ -pe ompi* 8,16 (schedule when the system can provide either 8 or 16 slots)
#$ -pe ompi* 4-16 (request anywhere from 4 to 16 slots on a single host)
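
The ompi* pattern matches any parallel environment whose name begins with ompi. To see which parallel environments are actually configured on your cluster (the name ompi below is illustrative), qconf can list them:

qconf -spl        # list the names of all configured parallel environments
qconf -sp ompi    # show the definition of one PE (slots, allocation rule)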

 
In this example, bt.B runs with 256 slots on node02:
$ qsub pe-qsub.sh
Your job 153804 ("bt.B") has been submitted

You can see the number of slots allocated using the qstat command:

qstat
job-ID  prior   name       user         state submit/start at     queue          slots ja-task-ID
--------------------------------------------------------------------
153804 0.52500 bt.B       uhpc      r     12/18/2013 20:16:40 default.q@node02              256       
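
To see how those slots are laid out as master and slave tasks per host, use qstat's task-grouping option:

qstat -g t        # list MASTER and SLAVE tasks of parallel jobs per queue instance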

Node02 has 256 processes running for the bt.B program; an excerpt from top on that node:

PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+ COMMAND
41020 uhpc   20   0  321m 175m  944 R   168  0.1   0:33.14 bt.B
41664 uhpc   20   0  321m 175m  944 S   162  0.1   0:31.71 bt.B
41369 uhpc   20   0  321m 175m  944 S   159  0.1   0:32.39 bt.B
41215 uhpc   20   0  321m 175m  944 S   145  0.1   0:32.66 bt.B
41942 uhpc   20   0  321m 175m  944 R   144  0.1   0:29.93 bt.B
40828 uhpc   20   0  321m 175m  944 S   142  0.1   0:32.28 bt.B
42872 uhpc   20   0  321m 175m  944 R   139  0.1   0:28.68 bt.B
41552 uhpc   20   0  321m 175m  944 R   138  0.1   0:30.52 bt.B
42569 uhpc   20   0  321m 175m  944 S   137  0.1   0:33.24 bt.B
42817 uhpc   20   0  321m 175m  944 R   136  0.1   0:31.24 bt.B
42239 uhpc   20   0  321m 175m  944 R   134  0.1   0:30.50 bt.B
40535 uhpc   20   0  321m 175m  944 S   133  0.1   0:31.00 bt.B            

Several output files are written when the job finishes: bt.B.o*/bt.B.e* contain the job's standard output and error, while bt.B.po*/bt.B.pe* contain the output and error of the parallel environment start-up and shutdown procedures:

bt.B.e153808  bt.B.o153808  bt.B.pe153806  bt.B.pe153808  bt.B.po153806  bt.B.po153808