5.2. Running advanced Matlab jobs over SGE

To get the best results when running simulations on the cluster, a few best practices should be followed. This section presents tips and techniques to speed up simulations and use the cluster efficiently.

Keep your data organized

As a general rule, flat directories (single folders containing hundreds or thousands of files) should be avoided, as they hurt the performance of the NFS filer. One strategy is to create a folder for each experiment you run and, inside it, place the directories you need.

Use the SSD drives when possible

The nodes are equipped with SSD drives. The performance of your job will improve if you copy the data to the node's SSD, process it there, and send the results back, rather than always working directly on the NFS share.

Parallelize when possible

Whenever possible, use SMP multi-processing, parallel loops or other techniques to obtain results faster.

Clean the scratch

If you use the SSD drives as scratch space, remember to delete the data you no longer need once the job is done.
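For instance, the last step of a job script can remove the scratch copy. A minimal sketch, assuming the data was staged to /scratch/MonteCarlo as in the examples in this section; the guard against deleting paths outside /scratch or /tmp is just a precaution, not a cluster requirement:

```shell
#!/bin/bash
# Remove the scratch copy of an experiment once the job is done.
cleanup_scratch() {
    case "$1" in
        /scratch/*|/tmp/*) rm -rf "$1" ;;   # only ever delete under /scratch or /tmp
        *) echo "refusing to remove $1" >&2; return 1 ;;
    esac
}

cleanup_scratch /scratch/MonteCarlo   # rm -rf is a no-op if the path is already gone
```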

Next, we'll run the same simulation in two different ways, serial and parallel, and compare the results:

Running in serial:

Monte Carlo methods (or Monte Carlo experiments) are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results; they are a perfect fit to illustrate serial vs. parallel execution:

1. Create the folder where the experiment data will be placed:

ijimenez@login:~/Matlab$ mkdir MonteCarlo/
ijimenez@login:~/Matlab$ cd MonteCarlo/
ijimenez@login:~/MonteCarlo$ mkdir data
ijimenez@login:~/MonteCarlo$ mkdir out
ijimenez@login:~/MonteCarlo$ mkdir script
ijimenez@login:~/MonteCarlo$ mkdir job-out
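The four mkdir commands can also be collapsed into a single command using bash brace expansion (run from ~/Matlab; -p also creates the parent folder if needed):

```shell
# Create the whole experiment layout in one command
mkdir -p MonteCarlo/{data,out,script,job-out}
```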

2. Place the Matlab code in ./script:

ijimenez@login:~/MonteCarlo$ cd script/
ijimenez@login:~/script$ vi montecarlo.m
%% SERIAL DEMO

%% Init problem
iter = 100000;
sz = 55;
a = zeros(1, iter);

% MonteCarlo simulations
disp('Starting ...');
tic;
for simNum = 1:iter
        a(simNum) = myFunction(sz);
end
toc;
 

We also show the code for myFunction:

ijimenez@login:~/script$ vi myFunction.m

function out = myFunction(in)
        out = max(svd(rand(in)));
end

3. Now we'll write a more complex submission script. The submission code is commented:

#!/bin/bash
# Load the modules environment
. /etc/profile.d/modules.sh

# Copy sources to the SSD:

# First, make sure to delete previous versions of the sources:
# ------------------------------------------------------------
if [ -d /scratch/MonteCarlo ]; then
        rm -Rf /scratch/MonteCarlo
fi

# Second, replicate the structure of the experiment's folder:
# -----------------------------------------------------------
mkdir /scratch/MonteCarlo
mkdir /scratch/MonteCarlo/data
mkdir /scratch/MonteCarlo/error
mkdir /scratch/MonteCarlo/script
mkdir /scratch/MonteCarlo/out

# Third, copy the experiment's data:
# ----------------------------------
cp -rp /homedtic/ijimenez/Matlab/MonteCarlo/data/* /scratch/MonteCarlo/data
cp -rp /homedtic/ijimenez/Matlab/MonteCarlo/script/* /scratch/MonteCarlo/script

# Fourth, prepare the submission parameters:
# Remember SGE options are marked up with '#$':
# ---------------------------------------------
# Requested resources:
#
# Simulation name
# ----------------
#$ -N "MonteCarlo-serial"
#
# Expected walltime: five minutes maximum
# ---------------------------------------
#$ -l h_rt=00:05:00
#
# Shell
# -----
#$ -S /bin/bash
#
# Output and error files go on the user's home:
# -------------------------------------------------
#$ -o /homedtic/ijimenez/Matlab/MonteCarlo/job-out/montecarlo-serial.out
#$ -e /homedtic/ijimenez/Matlab/MonteCarlo/job-out/montecarlo-serial.err
#
# Send me a mail when processed and when finished:
# ------------------------------------------------
#$ -m bea
#$ -M  my.email@upf.edu
#

# Start script
# --------------------------------
#
printf "Starting execution of job $JOB_ID from user $SGE_O_LOGNAME\n"
printf "Starting at `date`\n"
printf "Calling Matlab now\n"
printf "---------------------------\n"
# Execute the script
/soft/MATLAB/R2013b/bin/matlab -nosplash -nojvm -nodesktop -r "run /scratch/MonteCarlo/script/montecarlo.m"
# Copy data back, if any
printf "---------------------------\n"
printf "Matlab processing done. Moving data back\n"
# cp -rf /scratch/MonteCarlo/out/montecarlo.out /homedtic/ijimenez/Matlab/MonteCarlo/out
printf "Job done. Ending at `date`\n"
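
Assuming the script above was saved as montecarlo-serial.sh (the filename here is just an example), it is submitted and monitored with the usual SGE commands:

```shell
# Submit the job; qsub prints the assigned job id
qsub montecarlo-serial.sh

# Check its state in the queue (qw = waiting, r = running)
qstat -u ijimenez
```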

4. The elapsed time is recorded in the .out file in the job-out folder. Running this simulation serially took nearly a minute:

Starting execution of job 153972 from user ijimenez
Starting at Tue Dec 31 10:06:53 CET 2013
Calling Matlab now
Warning: No display specified.  You will not be able to display graphics on the screen.
                            < M A T L A B (R) >
                  Copyright 1984-2013 The MathWorks, Inc.
                    R2013b (8.2.0.701) 64-bit (glnxa64)
                              August 13, 2013

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

Starting ...
Elapsed time is 63.783560 seconds.
>> Matlab processing done. Moving data back
Job done. Ending at Tue Dec 31 10:07:59 CET 2013

Running in parallel:

The Matlab Parallel Computing Toolbox offers several features that simplify the development of parallel applications in MATLAB. It provides programming constructs such as parallel loops and distributed arrays that let you extend your serial programs into the parallel domain. You can use these constructs without having to learn a complex parallel language or make significant changes to your existing code. The toolbox supports interactive development, which lets you connect to the cluster from a MATLAB session to perform parallel computations interactively or in batch. Currently, the Parallel Computing Toolbox license is limited to a maximum of 12 local workers.

Let's run the same Monte Carlo simulation taking advantage of parallelization. To do this, we must do two things:

a. Reserve a pool of N slots for the job on SGE (where N is at most 12)

b. Tell Matlab to use the local pool of workers

1. Let's modify our Matlab script to use 'parfor' instead of 'for'. We can do this as long as the loop iterations are independent of each other. We'll also use the parpool directive to create a pool of 6 workers using the 'local' profile. See the Matlab documentation on cluster profiles for more information:

 
%% PARFOR DEMO
 
%% Init problem
% Init pe
parpool('local',6)
 
% MonteCarlo simulations
disp('Starting ...')
tic;
iter    = 100000;
sz      = 55;
a       = zeros(1, iter);
parfor (simNum = 1:iter, 6)
        a(simNum)= myFunction(sz);
end
toc;
delete(gcp)

2. Let's also modify our submission script. We'll include the -pe option to reserve as many slots as we specified in the parfor command, and update the job name and the output and error file paths:

[...]
# Requested resources:
#
# Simulation name
# ----------------
#$ -N "MonteCarlo-parallel"
#
# Parallel environment: we'll use six cores
# in the parallel environment called 'smp'
# -----------------------------------------
#$ -pe smp 6
[...]
# Output and error files go on the user's home:
# -------------------------------------------------
#$ -o /homedtic/ijimenez/Matlab/MonteCarlo/job-out/montecarlo-parallel.out
#$ -e /homedtic/ijimenez/Matlab/MonteCarlo/job-out/montecarlo-parallel.err
[...]
This time, if we compare the timings recorded in the output files, we'll see that the serial run took nearly a minute while the parallel run took about 13 seconds, a speedup of roughly 4.7x on six workers:
 
Serial:   Elapsed time is 63.909396 seconds.

Parallel: Elapsed time is 13.476630 seconds.
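
As a sanity check, the speedup can be recomputed from the two timings above (awk is used here only as a calculator):

```shell
# speedup = serial time / parallel time
awk 'BEGIN { printf "%.1fx\n", 63.909396 / 13.476630 }'   # prints: 4.7x
```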