4. Using the Grid Engine

4.1. Cluster queues, resources and limits

Cluster queues

A cluster queue is a resource that can handle and execute user jobs. Depending on the job's demands, the job will be executed on one queue or another. Every queue has its own limits, behavior and default values. Currently, the Snow cluster has six different queues, shown in the following table:

Queue name Allowed use Comment
short.q Batch processing Intended for short time, low-cpu jobs, that must be processed and dispatched fast. 
default.q Batch processing Intended for long time, high cpu, high memory demanding jobs in amd processors.
intel.q Batch processing Intended for long time, high cpu, high memory demanding jobs in intel processors.
cuda.q Batch processing Intended for solving computationally and data-intensive problems using GPUs.
inter.q Interactive sessions Intended to manage interactive sessions on cluster nodes. Limited resources.
all.q Batch and interactive For testing and administration purposes only.

All queues are defined with some common parameters. Unless specified otherwise, these parameters are inherited by all the jobs that run on these queues. This imposes limits on, for example, time or consumed resources for jobs that run inside a given queue. As an example, here is the configuration of the short.q queue:

ijimenez@login:~$ qconf -sq short.q
qname                 short.q
hostlist              @allhosts
seq_no                1
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH
ckpt_list             NONE
pe_list               make smp ompi matlab
rerun                 FALSE
slots                 1,[@abudhabi=64]
tmpdir                /scratch
shell                 /bin/bash
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
[ ... ]
terminate_method      NONE
notify                00:00:60
[ ... ]
initial_state         default
s_rt                  01:55:00
h_rt                  02:00:00
s_cpu                 127:50:00
h_cpu                 128:00:00
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY
ijimenez@login:~$ 

The parameters s_rt, h_rt, s_cpu and h_cpu force all the jobs submitted to this queue to have the corresponding limits.

Cluster limits

When a user registers a job on the scheduler, limits are applied. If the job's requirements are higher than the available resources, the job will wait in the queue until the resources are freed. But if the job's requirements are higher than the limits, the job cannot be registered. The limits are set up at three different levels: user, research group and queue.

Cluster limits are defined as resource quotas and are explained in the following tables:

Table 1. short.q limits

Item     Limit  Comment
Queue type Batch processing No user interactive usage allowed
Wall time 2 hours Every job can run for two hours in the cluster, no matter how many CPUs it’s going to use
CPU time 128 hours Every job can use a total time of 128 hours. That is, we allow a job to be on the system for 2 hours using all queue resources: 2 hours * 64 cores = 128 hours of calculation
Maximum user allocatable slots   32 cores A single user can allocate up to 32 slots per job, so the maximum parallelism allowed in this queue is 32
Maximum research group allocatable slots 256 Researchers in the same research group can allocate up to 256 processors on this queue

Table 2. default.q limits

Item Limit  Comment
Queue type             Batch processing     No user interactive usage allowed
Wall time Not set No time limit is set
CPU time Not set No time limit is set
Maximum user allocatable slots   320 cores A single user can allocate up to 320 slots per job, so the maximum parallelism allowed in this queue is 320
Maximum research group allocatable slots 576 cores  Researchers in the same research group can allocate up to 576 processors on this queue

Table 3. inter.q limits

Item Limit Comment
Queue type             Interactive    User interaction managed by the scheduler
Wall time 4 hours Every user can have up to 4 hours of interactive session in the node
CPU time  64 hours Every user can have up to 64 CPU hours of interactive session in the node
Minimum slots available 16 cores In case of heavy cluster usage, at least 16 processors are always reserved for this queue.
Maximum user allocatable slots 4 cores A single user can allocate up to 4 processors on a single interactive session
Maximum research group allocatable slots  8 Researchers on the same research group can allocate up to 8 processors on this queue   


This behavior is modeled as a resource quota, as shown below:

{
   name         maxslots
   description  "Max slots per user"
   enabled      TRUE
   limit        users {*} queues short.q to slots=32
   limit        projects {*} queues short.q to slots=256
   limit        users {*} queues default.q to slots=256
   limit        projects {*} queues default.q to slots=576
   limit        queues !inter.q to slots=688
}
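
If you want to inspect these quotas yourself, the configured resource quota sets can usually be listed with qconf (a hedged example; the set name maxslots is taken from the listing above):

ijimenez@login:~$ qconf -srqs maxslots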

4.2. Submitting basic jobs

Submitting basic jobs

The basic command to submit a job is qsub. If not specified otherwise, the scheduler will guess, as best as it can, in which queue the job should be registered.

ijimenez@login:~$ qsub sleeper.sh 
Your job 153624 ("Sleeper") has been submitted
ijimenez@login:~$ 

However, there is an exception when submitting jobs to the Intel processor nodes, because they only support the intel.q queue. Thus, it is compulsory to specify this argument in the qsub command:

ijimenez@login:~$ qsub -q intel.q Tasa.sh 
Your job 673725 ("Tasa") has been submitted
ijimenez@login:~$

Once the job is registered, the scheduler reports its job ID. Keep it handy; in case of problems you'll need this value to debug what happened.
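
For example, you can ask the scheduler for the details of a registered job with qstat -j, using the job ID reported above:

ijimenez@login:~$ qstat -j 153624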

Although you can specify the options when calling qsub, it is strongly advised to provide all options within a job definition file (in these examples the file will be called "job.sge"). This file contains the command you wish to execute and any Grid Engine resource request options that you need.

ijimenez@login:~$ vi job.sge

All Grid Engine options are preceded by the string "#$ ", and all other lines in the file are executed by the shell (the default shell is /bin/bash):

#!/bin/bash
#$ -N Test
#$ -q short.q
#$ -cwd
uname -a

The "-N" option sets the name of the job. This name is used to create the output log files for the job. We recommend using a capital letter for the job name is order to distinguish these log files from the other files in your working directory. This makes it easier to delete the log files later.

The "-q" option requests the queue in which the job should run.  The "-cwd" option instructs the Grid Engine to execute your jobs from the directory in which you submit the job (using qsub). This is recommended because it eliminates the need to "cd" to the correct sub-directory for job execution, and it ensures that you log files will be located in the same directory as your job-definition file.

Remember to set the "-q" parameter to intel.q to use the Intel processor nodes.

#!/bin/bash
#$ -N Test
#$ -q intel.q
#$ -cwd
uname -a
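
Once the job definition file is ready, submit it with qsub:

ijimenez@login:~$ qsub job.sge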

We can monitor how our job is doing with the qstat command:

ijimenez@login:~$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
153646 0.00000 Test       ijimenez     qw    12/18/2013 17:12:00                                    1       
ijimenez@login:~$ 

ijimenez@login:~$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 673734 0.62465 Tasa       ijimenez         r     04/15/2015 17:45:36 intel.q@node14                     1    

The qstat output shows the ID of our job (job-ID), its priority (prior), who launched it (user), the state of the job (state), when and where it was registered (submit/start at and queue), and how many slots it is using (slots). The most informative column is state, as it shows what the job is actually doing.

Most common states are described in the following table:

Fig 1. Description of job states

Abbreviation State Comment
qw Queue waiting The job is waiting to be assigned to a given queue.
t Transferring The job is assigned to a queue and is being transferred to one or more execution hosts.
r Running The job is running on the execution host.
E Error The job has failed for some reason. Output is sent to the file specified with the -e flag, or to the default error file otherwise.
h Hold The job is being held for some reason. Most commonly it is waiting for another job to finish.
R Restarted The job has been restarted for some reason. The most common reason is an error on the execution host, and the job is sent to another execution host to be processed again.

Job run time

Each queue has specific policies that enforce how long jobs are allowed to execute: for example, short.q allows up to 2 hours of wall time, while default.q has no runtime limit. When you submit a job, you are either implicitly or explicitly indicating how long the job is expected to run. The first way is to indicate the maximum runtime using the following format:

#!/bin/bash
#$ -N Test
#$ -cwd
#$ -l s_rt=04:30:00
#$ -l h_rt=05:00:00
uname -a

This way, the scheduler will check which one of our queues matches the job requirements and will send the job to the most appropriate queue. In this case, the job is going to be processed by default.q, since we're requesting a soft limit of 4:30 hours and a hard limit of 5:00 hours, which exceeds the short.q threshold:

ijimenez@login:~$ qstat
job-ID  prior   name       user         state submit/start at     queue                slots ja-task-ID
--------------------------------------------------------------------------------------------------------
153662 0.50000 Test       ijimenez     r     12/18/2013 17:51:08 default.q@node10        1  

The main flags to control job execution time are shown below:

Fig 2. Parameters to control execution time

Flag  Request Comment
-l s_rt=hh:mm:ss Sets a soft wall-time limit of hh:mm:ss The system sends a warning signal when the elapsed execution time exceeds hh:mm:ss.
-l h_rt=hh:mm:ss Sets a hard wall-time limit of hh:mm:ss The execution time limit for the job is set to hh:mm:ss. If exceeded, the job fails and is killed.
-l s_cpu=hh:mm:ss Sets a soft CPU-time limit of hh:mm:ss The system sends a warning signal when the CPU time exceeds hh:mm:ss.
-l h_cpu=hh:mm:ss Sets a hard CPU-time limit of hh:mm:ss The CPU time limit for the job is set to hh:mm:ss. If exceeded, the job fails and is killed.
-q <queue_name> Requests a specific queue for the job Forces the scheduler to register the job on a given queue. If the job requirements do not fit the queue, the job stays in the qw state forever unless deleted.
  • Walltime is the 'real' (elapsed) time a job is running.
  • CPU time is the CPU time consumed by the job.
  • If a job is not parallelized, the walltime and the CPU time are the same; but if a given program launches N threads, the CPU time is the elapsed execution time multiplied by the N threads launched.
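
As a worked example, here is a minimal sketch for a hypothetical 4-thread program (my_threaded_program is a placeholder binary; it assumes the smp parallel environment from pe_list places all slots on a single host). Allowing 2 hours of wall time for 4 threads means the matching CPU-time limit is 4 threads x 2 hours = 8 CPU hours:

#!/bin/bash
#$ -N ThreadedTest
#$ -cwd
#$ -l h_rt=02:00:00
#$ -l h_cpu=08:00:00
#$ -pe smp 4
export OMP_NUM_THREADS=4
./my_threaded_program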

Redirecting output and error files

By default, if not specified otherwise, the scheduler will redirect the output of any job you launch to a couple of files placed in your $HOME, called <job_name>.e<job_id> and <job_name>.o<job_id>. After a few executions and tests, your $HOME will probably look like this:

ijimenez@login:~$ ls
examples             hostname.sh.o153601  Program settings  Sleeper.e153624  Sleeper.o153624  Test.e153646  Test.e153662  Test.o153648
__hostname.err       job.sge              scripts           Sleeper.e153625  Sleeper.o153625  Test.e153647  Test.o153627  Test.o153649
__hostname.out       Matlab               Sleeper.e153622   Sleeper.o153622  sleeper.sh       Test.e153648  Test.o153646  Test.o153662
hostname.sh.e153601  programari           Sleeper.e153623   Sleeper.o153623  Test.e153627     Test.e153649  Test.o153647  user-scripts
ijimenez@login:~$ 

As a general rule, you are advised to use the following flags to redirect the output and error files:

Fig 3. Redirecting output and error files

Flag  Request Comment
-e <path>/<filename> Redirect error file The system will create the given file on the path specified and will redirect the job's error file here. If name is not specified, default name will apply.
-o <path>/<filename> Redirect output file The system will create the given file on the path specified and will redirect the job's output file here. If name is not specified, default name will apply.
-cwd Change error and output to the working directory The output file and the error file will be placed in the directory from which 'qsub' is called.

Of course, we can place these options in our job definition file:

ijimenez@login:~/examples$ vi job.sge 
#!/bin/bash
#$ -N Test
#$ -cwd
#$ -l s_rt=04:30:00
#$ -l h_rt=05:00:00
#$ -e $HOME/examples/uname/error
#$ -o $HOME/examples/uname/output
uname -a

And when launching the job, we'll see the output file created at $HOME/examples/uname/output. If no error is reported, an empty error file will be created.

ijimenez@login:~/examples/uname$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
153670 0.50000 Test       ijimenez     r     12/18/2013 18:19:08 default.q@node10                   1 
ijimenez@login:~/examples/uname$ cd output/
ijimenez@login:~/examples/uname/output$ cat Test.o153670
Linux node10 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64 GNU/Linux
ijimenez@login:~/examples/uname/output$ 

Sending notifications

These examples run in seconds, but perhaps you're running jobs and simulations for hours, days or even weeks. The Snow cluster is equipped with an exim4 email system, and you can configure your jobs to send you notifications once they're finished or if something goes wrong:

Fig 4. Sending notifications

Flag  Request Comment
-M <valid_email> Activate the mailing feature and email to address <valid_email> The system will send an email to the provided address whenever any of the events selected with the switches below occurs
-m b Begin Mail is sent at the beginning of the job, once the job enters the state 'r'
-m e End Mail is sent at the end of the job, when the job unregisters from the scheduler and no error is reported (no 'E' state)
-m a Aborted or rescheduled Mail is sent if job is aborted or rescheduled ('E' state appears)
-m s Suspended Mail is sent if job is suspended (usually by a user with higher privileges, 's' state appears)

Of course, we'll add these options to our job definition file and run it:

#!/bin/bash
#$ -N Test
#$ -cwd
#$ -l s_rt=04:30:00
#$ -l h_rt=05:00:00
#$ -e $HOME/examples/uname/error
#$ -o $HOME/examples/uname/output
#$ -M ivan.jimenez@upf.edu
#$ -m bea
uname -a
ijimenez@login:~/examples/uname$ qsub job.sge
Your job 153676 ("Test") has been submitted

Deleting and modifying jobs

We can modify the requirements of a job while it's waiting to be processed. Once it's running, the job can usually only be deleted; the example below shows both operations.

Fig 5. Deleting a job

Flag  Request Comment
qdel <job_id> Delete the job The system will remove the job and all its dependencies from the queues and the execution hosts
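
A hedged example with a hypothetical job ID: qdel removes the job, and qalter is the standard Grid Engine command for modifying the requirements of a job that is still pending (it is not covered further in this guide):

ijimenez@login:~$ qdel 153676
ijimenez@login:~$ qalter -l h_rt=08:00:00 153676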

 

4.3. Submitting advanced jobs

1. Submitting an array of jobs

2. Submitting Parallel jobs

- Open MPI


1. Submitting an array of jobs

Sometimes you'll need to submit a lot of jobs from a single script. In this case, you may need to handle thousands of independent jobs, each with its own input, output and error files. As a general rule, it's not a good idea to generate thousands of separate job scripts and submit them to the cluster.

Sun Grid Engine allows users to submit a special kind of job which executes a single script with N different input files. This is called an 'array job'; it is submitted to the cluster only once and can be managed as a single job.

Creating an array job is slightly different from creating a single job, since we have to manage the N different input files. In this example, we'll simply copy the contents of the N different input files to the output files:
 
First, we create the generic files to hold the input data. As a general rule, the easiest way to name the input files is to append an integer to the end of the file name, and use the environment variable SGE_TASK_ID to handle them. With the following 'for' command we create ten input files:
for i in {1..10}; do echo "File $i" > data-$i.log; done

Let's verify the input files:

uhpc@login:~/Source/array-example$ ls -l
total 0
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-10.log
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-1.log
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-2.log
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-3.log
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-4.log
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-5.log
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-6.log
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-7.log
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-8.log
-rw-r--r-- 1 uhpc info_users 0 Dec 18 17:23 data-9.log

When creating an array job, the -t flag controls how the scheduler indexes the input files:

Fig 1. Basic parameters for array job

Flag  Request Comment
-N <job_name> Job name definition The name of the job. The name should follow the "name" definition in sge_types(1). Invalid job names will be denied at submit time.
-t n[-m[:s]] Array index limits The option arguments n, m and s will be available through the environment variables SGE_TASK_FIRST, SGE_TASK_LAST and SGE_TASK_STEPSIZE.

In the example, we want to pick ten files, starting at 1, ending at 10, one after another. So our -t option has to be -t 1-10:1 (start at 1, end at 10, increment by 1 each time):

#!/bin/bash
#$ -N FileArray
#$ -t 1-10:1
FILESPATH=/homedtic/uhpc/Source/array-example
cat $FILESPATH/data-${SGE_TASK_ID}.log > $FILESPATH/output-${SGE_TASK_ID}.log
exit 0;

Submit the array job using the qsub command:

uhpc@login:~/Source/array-example$ qsub submit-array.sh
Your job-array 153660.1-10:1 ("FileArray") has been submitted

Monitor the job status using the qstat command: the ten tasks run on the short.q queue. You can see the job-ID 153672 duplicated ten times on different nodes of the cluster. This time, the qstat command shows additional output in the ja-task-ID column:

uhpc@login:~/Source/array-example$ qstat
job-ID  prior   name       user         state submit/start at     queue          slots ja-task-ID
-------------------------------------------------------------------
153672 0.50000 FileArray  uhpc         r     12/18/2013 18:30:38 short.q@node10                 1 1
153672 0.50000 FileArray  uhpc         r     12/18/2013 18:30:38 short.q@node05                 1 2
153672 0.50000 FileArray  uhpc         r     12/18/2013 18:30:38 short.q@node09                 1 3
153672 0.50000 FileArray  uhpc         r     12/18/2013 18:30:38 short.q@node03                 1 4
153672 0.50000 FileArray  uhpc         t     12/18/2013 18:30:38 short.q@node10                 1 5
153672 0.50000 FileArray  uhpc         t     12/18/2013 18:30:38 short.q@node05                 1 6
153672 0.50000 FileArray  uhpc         t     12/18/2013 18:30:38 short.q@node09                 1 7
153672 0.50000 FileArray  uhpc         t     12/18/2013 18:30:38 short.q@node03                 1 8
153672 0.50000 FileArray  uhpc         r     12/18/2013 18:30:38 short.q@node04                 1 9
153672 0.50000 FileArray  uhpc         t     12/18/2013 18:30:38 short.q@node10                1 10

Output files are stored in the path where the script writes the output-* files:
uhpc@login:~/Source/array-example$ ls -l output-*
-rw-r--r-- 1 uhpc info_users 8 Dec 18 18:30 output-10.log
-rw-r--r-- 1 uhpc info_users 7 Dec 18 18:30 output-1.log
-rw-r--r-- 1 uhpc info_users 7 Dec 18 18:30 output-2.log
-rw-r--r-- 1 uhpc info_users 7 Dec 18 18:30 output-3.log
-rw-r--r-- 1 uhpc info_users 7 Dec 18 18:30 output-4.log
-rw-r--r-- 1 uhpc info_users 7 Dec 18 18:30 output-5.log
-rw-r--r-- 1 uhpc info_users 7 Dec 18 18:30 output-6.log
-rw-r--r-- 1 uhpc info_users 7 Dec 18 18:30 output-7.log
-rw-r--r-- 1 uhpc info_users 7 Dec 18 18:30 output-8.log
-rw-r--r-- 1 uhpc info_users 7 Dec 18 18:30 output-9.log

The content of every output file is the same as that of its parent input file, as specified in our script:
uhpc@login:~/Source/array-example$ cat output-1.log
File 1
uhpc@login:~/Source/array-example$ cat data-1.log
File 1
uhpc@login:~/Source/array-example$ cat data-10.log
File 10
uhpc@login:~/Source/array-example$ cat output-10.log
File 10

To delete all the jobs of an array, execute the qdel command specifying the job-ID of the whole array job. It will mark all ten independent tasks for deletion:
uhpc@login:~/Source/array-example$ qdel 153677
uhpc has registered the job-array task 153677.1 for deletion
uhpc has registered the job-array task 153677.2 for deletion
uhpc has registered the job-array task 153677.3 for deletion
uhpc has registered the job-array task 153677.4 for deletion
uhpc has registered the job-array task 153677.5 for deletion
uhpc has registered the job-array task 153677.6 for deletion
uhpc has registered the job-array task 153677.7 for deletion
uhpc has registered the job-array task 153677.8 for deletion
uhpc has registered the job-array task 153677.9 for deletion
uhpc has registered the job-array task 153677.10 for deletion

To delete a single task of an array, append the task ID to the job-ID, separated by a dot: qdel <array_job_id>.<task_id>:
uhpc@login:~/Source/array-example$ qdel 153679.1
uhpc has registered the job-array task 153679.1 for deletion

 

2. Submitting Parallel jobs
 
 
Open MPI divides cluster resources into slots. The next example shows how to execute an OpenMP binary in a parallel environment with Open MPI (mpiexec), using the NPB benchmark suite.
 
Write a script for the qsub command (see the basic example above to build it):
#!/bin/bash
#$ -pe ompi* 256
#$ -N bt.B
export OMP_NUM_THREADS=256
/soft/openmpi/openmpi-1.6.5/bin/mpiexec  /PATH/TO/BIN/NPB3.2/NPB3.2-OMP/bin/bt.B

Fig 2. Parameters for the Open MPI script

Flag  Request Comment
-N <job_name> Job name definition The name of the job. The name should follow the "name" definition in sge_types(1). Invalid job names will be denied at submit time.
-pe ompi* <slots> Parallel environment Set slots to an integer if you submit an MPI job that requests a fixed number of slots (e.g. more than 16).

You can also ask SGE for a range of slots by using a range descriptor:

#$ -pe ompi* 8,16 (schedule when the system can provide either 16 or 8 slots)
#$ -pe ompi* 4-16 (requests anywhere from 4 to 16 slots on a single host)

 
In this example bt.B is running with 256 slots on node02:
$ qsub pe-qsub.sh
Your job 153804 ("bt.B") has been submitted

You can see the number of slots using the qstat command:

qstat
job-ID  prior   name       user         state submit/start at     queue          slots ja-task-ID
--------------------------------------------------------------------
153804 0.52500 bt.B       uhpc      r     12/18/2013 20:16:40 default.q@node02              256       

Node02 has 256 processes running for the bt.B program; the top output below shows some of them:

PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+ COMMAND
41020 uhpc   20   0  321m 175m  944 R   168  0.1   0:33.14 bt.B
41664 uhpc   20   0  321m 175m  944 S   162  0.1   0:31.71 bt.B
41369 uhpc   20   0  321m 175m  944 S   159  0.1   0:32.39 bt.B
41215 uhpc   20   0  321m 175m  944 S   145  0.1   0:32.66 bt.B
41942 uhpc   20   0  321m 175m  944 R   144  0.1   0:29.93 bt.B
40828 uhpc   20   0  321m 175m  944 S   142  0.1   0:32.28 bt.B
42872 uhpc   20   0  321m 175m  944 R   139  0.1   0:28.68 bt.B
41552 uhpc   20   0  321m 175m  944 R   138  0.1   0:30.52 bt.B
42569 uhpc   20   0  321m 175m  944 S   137  0.1   0:33.24 bt.B
42817 uhpc   20   0  321m 175m  944 R   136  0.1   0:31.24 bt.B
42239 uhpc   20   0  321m 175m  944 R   134  0.1   0:30.50 bt.B
40535 uhpc   20   0  321m 175m  944 S   133  0.1   0:31.00 bt.B            

Several output files are written when the job finishes (the .po and .pe files contain the output and error streams of the parallel environment start-up and shutdown):

bt.B.e153808  bt.B.o153808  bt.B.pe153806  bt.B.pe153808  bt.B.po153806  bt.B.po153808

 

 

4.4. Submitting interactive jobs

Submitting interactive jobs

Sometimes users need more computational power to run commands from the command line. As a general rule, it is NOT a good idea to launch these commands from the login node: the login node is a server designed to manage user sessions, it has limited resources and it does not behave well with CPU-intensive sessions. Additionally, intensive CPU usage on the login node could reduce session speed and performance for the rest of the users.

To meet this need, users can request an interactive job on a computing node with the qrsh command. Interactive jobs are only allowed on a single queue. qrsh behaves the same way as the rest of the scheduler commands: it inherits values from the queue where the job runs, but user options can also be specified.

Fig 1. inter.q values

Item Limit Comment
Queue type Interactive User interaction managed by the scheduler.
Wall time 4 hours Every user can have up to 4 hours of interactive session in the node.
CPU time 64 hours Every user can have up to 64 CPU hours of interactive session in the node.
Architectures AMD/INTEL Node 11 for the AMD architecture, node 12 for the INTEL architecture. 6 slots for each host.
Minimum slots available 12 cores In case of heavy cluster usage, 16 processors are always reserved for this queue.
Maximum user allocatable slots 4 cores A single user can allocate up to 4 processors in a single interactive session.
Maximum research group allocatable slots 8 cores Researchers in the same research group can allocate up to 8 processors on this queue.
 
ijimenez@login:~$ qrsh
Linux node09 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Fri Dec 13 13:31:31 2013 from 192.168.7.6
ijimenez@node09:~$

The following command allows us to request a specific node: qrsh -l h='node11' -now no

This way, I request an interactive job inheriting the inter.q parameters and limits. Since the session is internally a job registered on a queue, the qstat command will show a running job:

ijimenez@node10:~$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 153890 0.47500 QRLOGIN    ijimenez     r     12/27/2013 09:31:40 inter.q@node10                     1
ijimenez@node10:~$

The common qsub modifiers can be used when calling qrsh. Additionally, if the connection to the login node was configured to handle X graphics, the qrsh command on the nodes will also forward graphical output. The most common usage is to call Matlab within a node to take advantage of the larger amount of memory available there.
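
A minimal sketch (assuming Matlab is available on the node's PATH, e.g. via a module, and that X forwarding was enabled when connecting to the login node):

ijimenez@login:~$ qrsh
ijimenez@node09:~$ matlab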

 

4.5. Monitoring

Monitoring jobs

We can monitor our jobs with the qstat command. If we call qstat without arguments, it will show the state of the jobs for the current user:

ijimenez@login:~/Matlab/Mandelbrot$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
 154084 0.47598 Mandelbrot ijimenez     r     01/07/2014 15:37:59 short.q@node11                     6        
ijimenez@login:~/Matlab/Mandelbrot$ 
 
The next table describes the fields of the qstat output:
 

Fig 1. qstat information

Field Description Comment
job-ID Numerical ID of the job Numerical identifier assigned to the job by the scheduler.
prior Priority Priority of the job, from lowest (-20) to highest (20). Similar to the Linux 'nice' command; it is used by the cluster administrators to prioritize jobs on a given queue.
name Job name The name given to the job (with the -N option).
user Job owner The name of the user who owns the job.
state Job status Job status. The available states are shown in the table below.
submit/start at Start time Date and time when the scheduler registered the job. Registering a job does not imply executing it; once registered, the job must wait for the requested resources to be free.
queue Queue Queue where the job runs.
slots Slots Number of processors used by the job.
ja-task-ID Array job task ID Task ID. Only shown in the case of array jobs.
 
 

Fig 2. Description of job states

Abbreviation State Comment
qw Queue waiting The job is waiting to be assigned to a given queue.
t Transferring The job is assigned to a queue and is being transferred to one or more execution hosts.
r Running The job is running on the execution host.
E Error The job has failed for some reason. Output is sent to the file specified with the -e flag, or to the default error file otherwise.
h Hold The job is being held for some reason. Most commonly it is waiting for another job to finish.
R Restarted The job has been restarted for some reason. The most common reason is an error on the execution host, and the job is sent to another execution host to be processed again.
 
The qstat command accepts several arguments:
 
Fig 3. qstat command options

Flag Request Comment
-f Show full output Show full information of all used slots
-F Show full output Show complete information of all used slots, including resource availability
-u <user> Show jobs for a given user Show the jobs belonging to the given user
-j <job_id> Show schedule information Show scheduling options for a given job_id
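
For example, using the flags above (the user name and job ID are taken from the qstat listing at the start of this section):

ijimenez@login:~$ qstat -u ijimenez
ijimenez@login:~$ qstat -j 154084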

 

 

4.6. Submitting cuda jobs

In this section, we submit a basic CUDA job with the qsub command.

CUDA is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic improvements in computing performance by harnessing the power of the Graphics Processing Unit (GPU).

First of all, we create the job file as explained before. All Grid Engine options are preceded by the string #$, and all other lines in the file are executed by the shell (the default shell is /bin/bash):

#!/bin/bash

#$ -N CudaSample
#$ -l gpu=1
#$ -q default.q
#$ -e $HOME/logs/$JOB_NAME-$JOB_ID.err
#$ -o $HOME/logs/$JOB_NAME-$JOB_ID.out
module load cuda/7.5
/soft/cuda/NVIDIA_CUDA-7.5_Samples/0_Simple/vectorAdd/vectorAdd

IMPORTANT: This qsub sample script requests one cuda resource, which means that your job will be executed on a single CUDA card (all CUDA cores of this card will be available for your job). Please do not request more than one gpu: the cuda hosts only have one physical CUDA card each, so if you change this value your job will wait forever.
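
To see which hosts provide the gpu resource, a hedged example using qhost's resource filter:

ijimenez@login:~$ qhost -F gpu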

Script options:

-N The job is called CudaSample.

-l Resource type; here we request one gpu.

-q The queue where the job will run (default.q in this example).

-e Error files are created in $HOME/logs as <job_name>-<job_id>.err.

-o Output files are created in $HOME/logs as <job_name>-<job_id>.out.


To run CUDA jobs, it is necessary to load the cuda/7.5 module:

module load cuda/7.5

The script executes a sample binary:

/soft/cuda/NVIDIA_CUDA-7.5_Samples/0_Simple/vectorAdd/vectorAdd

If you want to do some tests, in the following path you can find other samples to use:

/soft/cuda/NVIDIA_CUDA-7.5_Samples/

When the file is saved, we create the log folder, launch the job and monitor it with the 'qstat' command. Once the execution has finished successfully, you can check the output directory to see the results. A sketch of these steps is shown below.
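
A minimal sketch of the workflow (the job file name cuda-job.sge and the <job_id> placeholder are hypothetical; the log path matches the script above):

ijimenez@login:~$ mkdir -p $HOME/logs
ijimenez@login:~$ qsub cuda-job.sge
ijimenez@login:~$ qstat
ijimenez@login:~$ cat $HOME/logs/CudaSample-<job_id>.out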

 

4.7. Select Debian 8 as resource

Compute nodes are currently undergoing an operating system upgrade from Debian 7.9 to Debian 8.6; all nodes remain in the same cluster and in the same compute queues of the Open Grid Scheduler.
 
Main differences and news:
- Latest stable release of Debian 8: release 6 (September 17th, 2016).
- Open Grid Scheduler integration.
- Same user homes for all nodes.
- New software repository compiled on demand for the HPC cluster:
  - Python 3.5.3 (SimpleITK, tensorflow, theano, scipy, lmdb, pandas, ...)
  - GCC compiler 6.2
  - CUDA toolkit 8.0.4
  - OpenMPI 2.0.1
  - lmod 5.1.5
  - 3D-Caffe
  [If you need more software, just request it by CAU]
 
By default these systems are not selected unless otherwise stated; this guideline provides the instructions to connect to and compute on Debian 8.
 
Debian 8 is a cluster resource named 'debian8' (shortcut 'd8'); to use it, the resource must be requested explicitly (see the example after the list of resources below).
 
Some common resources of the cluster:
- cpu
- slots
- gpu
- mem_used and mem_free
- h_cpu and h_vmem
- debian8
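
Any of these resources can be requested with the -l flag, either on the qsub/qrsh command line or inside the job script. A hedged example reusing the job.sge file from section 4.2:

<your_user>@login:~$ qsub -l debian8 job.sge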

Recommendation: keep in mind that software compiled in your home directory under Debian 8 might not work on older releases. Run your software on the same system where you compiled it; otherwise, you must recompile it.


* Interactive session:

To request an interactive session, first of all you must be logged in to the login node. Then execute the following command with the 'd8' resource:
<your_user>@login:~$ qrsh -l d8

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Wed Jan 18 09:08:25 2017 from 192.168.7.6
<your_user>@node11:~$

The Grid Scheduler evaluates your needs and selects a node with this resource (interactive tasks can be checked as usual):
<your_user>@node11:~$ qstat

job-ID  prior   name       user         state submit/start at     queue             slots ja-task-ID 
----------------------------------------------------------------------------------------------
3832099 0.44872 QRLOGIN    <your_user>      r     01/18/2017 09:28:10 inter.q@node11               1        

* Batch session:

An example of a job script requesting (for example) the gpu and debian8 resources and the default.q queue:
 
#!/bin/bash
#$ -N CudaSample
#$ -l gpu=1
#$ -l debian8
#$ -q default.q
#$ -e $HOME/logs/$JOB_NAME-$JOB_ID.err
#$ -o $HOME/logs/$JOB_NAME-$JOB_ID.out

module load cuda/8.0
/soft/cuda/NVIDIA_CUDA-8.0_Samples/0_Simple/vectorAdd/vectorAdd
hostname
 
The Grid Scheduler evaluates your needs and selects a node to compute your batch job:
<your_user>@login:~/Scripts$ qstat

job-ID  prior   name       user         state submit/start at     queue            slots ja-task-ID 
---------------------------------------------------------------------------------------------------
3832177 0.30833 CudaSample <your_user>      r     01/18/2017 09:35:25 default.q@node11            1       

Result of the executed job:
<your_user>@login:~/Scripts$ cat ../logs/CudaSample-3832177.*

[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
node11