5. Working with Matlab and SGE

5.1. Running basic Matlab jobs over SGE

MATLAB is a high-level language and interactive environment for numerical computation, visualization, and programming. Using MATLAB, you can analyze data, develop algorithms, and create models and applications. The language, tools, and built-in math functions enable you to explore multiple approaches and reach a solution faster than with spreadsheets or traditional programming languages, such as C/C++ or Java.

Before using Matlab, you should check if required environment variables are properly set. If they are not, set them following these steps:

Export the Matlab variables in your user profile (you can define them in your .bashrc file):

export MATLAB_JAVA=/soft/jdk1.7.0_40/jre/
export PATH=$PATH:/soft/MATLAB/R2013b/bin
export OLDPWD=/soft/MATLAB/R2013b/sys/jxbrowser/glnxa64/xulrunner/xulrunner-linux-64

When environment variables are set in .bashrc, a new login is required for them to take effect. Log out, log in again and check that the environment variables are correctly set. The output should be as follows:

ijimenez@login:~$ which matlab
/soft/MATLAB/R2013b/bin/matlab
ijimenez@login:~$ echo $MATLAB_JAVA
/soft/jdk1.7.0_40/jre/
ijimenez@login:~$ echo $OLDPWD
/soft/MATLAB/R2013b/sys/jxbrowser/glnxa64/xulrunner/xulrunner-linux-64
ijimenez@login:~$ 
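The manual check above can also be wrapped in a small shell function, for example at the top of a job script. This is only a sketch: the name check_matlab_env is hypothetical, and it assumes that having 'matlab' on the PATH and MATLAB_JAVA set is what matters on this cluster.

```shell
# check_matlab_env: hypothetical helper, not part of the cluster software.
# Returns 0 when 'matlab' is found on PATH and MATLAB_JAVA is set,
# and 1 otherwise, printing what is missing to standard error.
check_matlab_env() {
    rc=0
    if ! command -v matlab >/dev/null 2>&1; then
        echo "matlab not found in PATH" >&2
        rc=1
    fi
    if [ -z "${MATLAB_JAVA:-}" ]; then
        echo "MATLAB_JAVA is not set" >&2
        rc=1
    fi
    return "$rc"
}
```

A typical call would be: check_matlab_env && echo "Matlab environment looks fine"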

Once done, connect to a node with the 'qrsh' command. You'll be redirected to a free computing node, where you will run Matlab:

uhpc@login:~$ qrsh
Warning: Permanently added '[node08]:55494,[192.168.7.108]:55494' (ECDSA) to the list of known hosts.
Linux node08 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
uhpc@node08:~$ 

Running interactive Matlab sessions

The following is the easiest way of running Matlab on the cluster. This method should only be used to test code that was previously developed elsewhere. Developing code directly on the cluster is not recommended and should be avoided as a general rule.

From the node, you can choose to run Matlab with or without graphical support (the default is to start the GUI). Launch the Matlab GUI with the 'matlab &' command in your shell:

Your graphical panel looks like this:

On the left panel you'll see your $HOME files. You can run a script from the Matlab GUI, or drag and drop the example.m script onto the Command Window.

The run command is executed automatically:

If you don't need graphical Matlab support, you can call Matlab with the -nosplash -nodesktop modifiers:

ijimenez@node06:~$ matlab -nosplash -nodesktop
                             < M A T L A B (R) >
                  Copyright 1984-2013 The MathWorks, Inc.
                   R2013b (8.2.0.701) 64-bit (glnxa64)
                               August 13, 2013
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

>> 1+1

ans =

     2

>> 

Fig 1. Matlab runtime modifiers

Flag              | Request                       | Comment
-help             | Help                          | Shows Matlab help.
-e                | Display environment variables | Display ALL environment variables. If the return status is not 0, some corrective actions may be needed. Matlab is not run.
-n                | Display diagnostics           | Display environment variables, libraries, arguments and other diagnostic information on standard output. Matlab is not run.
-arch             | Architecture request          | Start Matlab assuming processor architecture arch.
-nodisplay        | Disable X                     | Do not display any X command. Matlab's JVM is started.
-nojvm            | Disable Matlab JVM            | Do not start Matlab's Java virtual machine. Any Matlab extension depending on the JVM will not run.
-r <command>      | Run command immediately       | Start Matlab and run the given Matlab command immediately after startup. To run file.m, give the script name without the .m extension.
-logfile file.log | Send output to file           | Make a copy of all output in file.log. This includes crash reports.
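As an illustration of how these modifiers combine, the sketch below prints a non-interactive Matlab command line for a given script. The function name matlab_batch_cmd is hypothetical. Wrapping the script in try/catch is our own addition, so that Matlab exits with a non-zero status on error instead of waiting at its prompt.

```shell
# matlab_batch_cmd: hypothetical helper that prints a non-interactive Matlab
# command line for the script named in $1 (given without the .m extension).
# $2 is the log file name and defaults to <script>.log.
matlab_batch_cmd() {
    script=$1
    logfile=${2:-$script.log}
    # try/catch: on error, print the report and leave Matlab with status 1;
    # exit(0) ends the session when the script succeeds.
    printf '%s\n' "matlab -nodisplay -nosplash -logfile $logfile -r \"try, $script; catch err, disp(getReport(err)); exit(1); end; exit(0);\""
}
```

Calling matlab_batch_cmd example prints a command that runs example.m and copies all output to example.log. The printed line can be pasted into a shell or a job script.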

To run Matlab jobs in the background, a combination of Matlab modifiers and SGE options is used. The next steps cover how to send a Matlab job to an SGE queue:

1. Write your Matlab source in a file with the .m extension. The following one is named 'example.m':

x = [1 2 3 4];
fprintf('Example number = %i\n', x)

2. Write a job submission script named 'matlab-job.sh'. Place it anywhere in your home directory, and specify the desired options in it:

#!/bin/sh
#$ -N MatlabJob
#$ -cwd
#$ -o matlab.$JOB_ID.out
#$ -e matlab.$JOB_ID.err
/soft/MATLAB/R2013b/bin/matlab -nojvm -nodisplay -r "example;quit;"

Fig 2. Parameters used in matlab-job.sh

Flag                   | Request                   | Comment
-N <job_name>          | Job name definition       | The name of the job. The name should follow the "name" definition in sge_types(1). Invalid job names will be denied at submit time.
-cwd                   | Current working directory | Run the job from the current working directory.
-o <file_name>         | Output file path          | Send standard output to this file; here the file name includes the JOB_ID.
-e <file_name>         | Error file path           | Send error output to this file; here the file name includes the JOB_ID.
-r <command1;commandN> | Command run               | Start Matlab and immediately execute the commands, separated by ';'. Note that the script is named without its file extension.
-nojvm                 | JVM disabled              | Does not start the JVM software and uses the current window. Graphics will not work without the JVM.
-nodisplay             | Display disabled          | Starts the JVM but does not start the desktop; it also ignores Commands and the DISPLAY environment variable on Linux.
 

3. Submit your matlab job:

uhpc@login:~$ qsub matlab-job.sh
Your job 153873 ("MatlabJob") has been submitted

4. Monitor the job status:

uhpc@login:~$ qstat
job-ID  prior   name       user         state submit/start at     queue           slots ja-task-ID
----------------------------------------------------------------------------------------
153873 0.00000 MatlabJob  uhpc         qw    12/20/2013 16:41:34                             1       

5. When the job is finished, the output files are created in your Current Working Directory:

-rw-r--r-- 1 uhpc info_users    0 Dec 20 16:16 matlab.153873.err
-rw-r--r-- 1 uhpc info_users  403 Dec 20 16:16 matlab.153873.out

6. Check the standard output of the Matlab job. The stderr file is empty (the job ran without errors):

uhpc@login:~$ cat matlab.153873.out
                            < M A T L A B (R) >
                  Copyright 1984-2013 The MathWorks, Inc.
                    R2013b (8.2.0.701) 64-bit (glnxa64)
                              August 13, 2013
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
Example number = 1
Example number = 2
Example number = 3
Example number = 4
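The steps above can be partly scripted. The following is a minimal sketch with two hypothetical helpers: make_matlab_job prints a job script like matlab-job.sh for any Matlab script, and wait_for_job polls the queue until a job has left it. It assumes that 'qstat -j <id>' returns a non-zero status once the scheduler no longer knows the job; check this on your installation before relying on it.

```shell
# Both functions are hypothetical helpers, not part of SGE or Matlab.

# make_matlab_job: print an SGE job script for the Matlab script named in $1
# (given without the .m extension), following the layout of matlab-job.sh.
make_matlab_job() {
    name=$1
    cat <<EOF
#!/bin/sh
#\$ -N $name
#\$ -cwd
#\$ -o $name.\$JOB_ID.out
#\$ -e $name.\$JOB_ID.err
/soft/MATLAB/R2013b/bin/matlab -nojvm -nodisplay -r "$name;quit;"
EOF
}

# wait_for_job: block until job $1 has left the queue, polling every $2
# seconds (default 30). Assumption: 'qstat -j <id>' fails for unknown jobs.
wait_for_job() {
    job_id=$1
    interval=${2:-30}
    while qstat -j "$job_id" >/dev/null 2>&1; do
        sleep "$interval"
    done
    echo "Job $job_id is no longer in the queue"
}
```

For example, make_matlab_job example > matlab-job.sh writes the script, qsub matlab-job.sh submits it, and wait_for_job 153873 waits for the job ID that qsub printed.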

5.2. Running Matlab advanced jobs over SGE

To achieve better results while running simulations on the cluster, some best practices should be followed. We'll show in this section some tips and techniques to speed up simulations and efficiently use the cluster.

Keep your data organized

As a general rule, flat directories (single folders containing hundreds or thousands of files) should be avoided, as they have a performance impact on the NFS filer. One strategy is to create a folder for each experiment you run and, inside this folder, place the subdirectories you need.
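A per-experiment layout can be created in one call. The sketch below is only an illustration: the function name new_experiment is hypothetical, and the four subfolder names are the ones used for the Monte Carlo example later in this section.

```shell
# new_experiment: hypothetical helper that creates one folder per experiment
# with the subfolders used in this section. $1 is the experiment name and
# $2 the parent directory (defaults to the current directory).
new_experiment() {
    name=$1
    root=${2:-.}
    for sub in data out script job-out; do
        # -p creates missing parents and does not fail if the folder exists
        mkdir -p "$root/$name/$sub" || return 1
    done
}
```

For example, new_experiment MonteCarlo ~/Matlab creates ~/Matlab/MonteCarlo with its data, out, script and job-out subfolders.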

Use the SSD drives when possible

The nodes are equipped with SSD drives. The performance of your job will improve if you copy the data to the SSD of the node, process it there, and send the results back, rather than always working on the NFS.

Parallelize when possible

Whenever possible, use SMP multi-processing, parallel loops or other techniques to achieve results faster.

Clean the scratch

If you use the SSD drives as scratch, remember to delete the data you no longer need once the job is done.
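The copy, process, copy back and clean cycle described in these tips can be sketched as a single shell function. Everything here is an assumption made for illustration: the name run_on_scratch, the convention that results are written to an 'out' subfolder, and the use of JOB_ID (or the shell's process ID outside SGE) to keep concurrent jobs apart.

```shell
# run_on_scratch: hypothetical helper. Copies directory $1 to a private
# folder under the scratch area $2, runs the remaining arguments as a
# command inside that copy, copies the 'out' subfolder back, and removes
# the scratch copy when the command has finished.
run_on_scratch() {
    src=$1
    scratch_root=$2
    shift 2
    work="$scratch_root/$(basename "$src").${JOB_ID:-$$}"
    mkdir -p "$work" || return 1
    cp -rp "$src"/. "$work"/ || { rm -rf "$work"; return 1; }
    rc=0
    ( cd "$work" && "$@" ) || rc=$?
    # Bring the results back (if any) before cleaning up
    if [ -d "$work/out" ]; then
        mkdir -p "$src/out" && cp -rp "$work/out"/. "$src/out"/
    fi
    rm -rf "$work"
    return "$rc"
}
```

Inside a job script this could be called as run_on_scratch ~/Matlab/MonteCarlo /scratch followed by the Matlab command line. Note that a job killed by the scheduler will not reach the cleanup step.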

Next, we'll run the same simulation in two different ways, serial and parallel, and show how to reach better results:

Running in serial:

Monte Carlo methods (or Monte Carlo experiments) are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. Because each sample is independent of the others, they are a good fit to illustrate the serial vs. parallel approach:

1. Create the folder where the experiment data will be placed:

ijimenez@login:~/Matlab$ mkdir MonteCarlo/
ijimenez@login:~/Matlab$ cd MonteCarlo/
ijimenez@login:~/Matlab/MonteCarlo$ mkdir data
ijimenez@login:~/Matlab/MonteCarlo$ mkdir out
ijimenez@login:~/Matlab/MonteCarlo$ mkdir script
ijimenez@login:~/Matlab/MonteCarlo$ mkdir job-out

2. Place the Matlab jobs in ./script:

ijimenez@login:~/Matlab/MonteCarlo$ cd script/
ijimenez@login:~/Matlab/MonteCarlo/script$ vi montecarlo.m
%% SERIAL DEMO

%% Init problem
iter = 100000;
sz = 55;
a = zeros(1, iter);

% MonteCarlo simulations
disp('Starting ...');
tic;
for simNum = 1:iter
a(simNum)= myFunction(sz);
end
toc;
 

Here is the code for myFunction as well:

ijimenez@login:~/Matlab/MonteCarlo/script$ vi myFunction.m

function out = myFunction(in)
        out=max(svd(rand(in)));

3. Now we'll write a more complex submission script. The code is commented throughout:

#!/bin/bash
# Load modules directive
. /etc/profile.d/modules.sh

# Copy sources to the SSD:

# First, make sure to delete previous versions of the sources:
# ------------------------------------------------------------
if [ -d /scratch/MonteCarlo ]; then
        rm -Rf /scratch/MonteCarlo
fi

# Second, replicate the structure of the experiment's folder:
# -----------------------------------------------------------
mkdir /scratch/MonteCarlo
mkdir /scratch/MonteCarlo/data
mkdir /scratch/MonteCarlo/error
mkdir /scratch/MonteCarlo/script
mkdir /scratch/MonteCarlo/out

# Third, copy the experiment's data:
# ----------------------------------
cp -rp /homedtic/ijimenez/Matlab/MonteCarlo/data/* /scratch/MonteCarlo/data
cp -rp /homedtic/ijimenez/Matlab/MonteCarlo/script/* /scratch/MonteCarlo/script

# Fourth, prepare the submission parameters:
# Remember SGE options are marked up with '#$':
# ---------------------------------------------
# Requested resources:
#
# Simulation name
# ----------------
#$ -N "MonteCarlo-serial"
#
# Expected walltime: five minutes maximum
# ---------------------------------------
#$ -l h_rt=00:05:00
#
# Shell
# -----
#$ -S /bin/bash
#
# Output and error files go on the user's home:
# -------------------------------------------------
#$ -o /homedtic/ijimenez/Matlab/MonteCarlo/job-out/montecarlo-serial.out
#$ -e /homedtic/ijimenez/Matlab/MonteCarlo/job-out/montecarlo-serial.err
#
# Send me a mail when processed and when finished:
# ------------------------------------------------
#$ -m bea
#$ -M  my.email@upf.edu
#

# Start script
# --------------------------------
#
printf "Starting execution of job %s from user %s\n" "$JOB_ID" "$SGE_O_LOGNAME"
printf "Starting at %s\n" "$(date)"
printf "Calling Matlab now\n"
# '--' ends option parsing, so the leading dashes are printed as text
printf -- "---------------------------\n"
# Execute the script
/soft/MATLAB/R2013b/bin/matlab -nosplash -nojvm -nodesktop -r "run /scratch/MonteCarlo/script/montecarlo.m"
# Copy data back, if any
printf -- "---------------------------\n"
printf "Matlab processing done. Moving data back\n"
# cp -rf /scratch/MonteCarlo/out/montecarlo.out /homedtic/ijimenez/Matlab/MonteCarlo/out
printf "Job done. Ending at %s\n" "$(date)"

4. The elapsed time is shown in the .out file in the job-out folder. Running this simulation in serial took just over a minute:

Starting execution of job 153972 from user ijimenez
Starting at Tue Dec 31 10:06:53 CET 2013
Calling Matlab now
Warning: No display specified.  You will not be able to display graphics on the screen.
                            < M A T L A B (R) >
                  Copyright 1984-2013 The MathWorks, Inc.
                    R2013b (8.2.0.701) 64-bit (glnxa64)
                              August 13, 2013

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

Starting ...
Elapsed time is 63.783560 seconds.
>> Matlab processing done. Moving data back
Job done. Ending at Tue Dec 31 10:07:59 CET 2013
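When several runs have to be compared, the elapsed time can be extracted from the output file instead of being read by eye. A minimal sketch, assuming the output contains the line printed by Matlab's toc as shown above; the function name elapsed_seconds is hypothetical.

```shell
# elapsed_seconds: hypothetical helper that prints the number of seconds
# reported by toc ("Elapsed time is N seconds.") for each such line in
# the file $1. Lines without a toc report are ignored.
elapsed_seconds() {
    sed -n 's/^.*Elapsed time is \([0-9.]*\) seconds\..*$/\1/p' "$1"
}
```

For the serial run above, elapsed_seconds job-out/montecarlo-serial.out would print 63.783560.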

Running in parallel:

The Matlab Parallel Computing Toolbox offers several features that simplify the development of parallel applications in MATLAB. It offers programming constructs such as parallel loops and distributed arrays that let you extend your serial programs into the parallel domain. You can use these constructs without learning a complex parallel language or making significant changes to your existing code. The toolbox supports interactive development, which lets you connect to your cluster from a MATLAB session to perform parallel computations interactively or to run them in batch. Currently, the Parallel Computing Toolbox license is limited to 12 workers on the local machine.

Let's run the same MonteCarlo simulation taking advantage of parallelization. To do this, we must do two things:

a. Reserve N slots when registering the job on SGE (where N is 12 at most)

b. Tell Matlab to use the local pool of workers

1. Let's modify our Matlab script. We'll use 'parfor' instead of 'for'. We can do this as long as the iterations of the loop are independent of each other. We'll also use the parpool directive to create the pool using the 'local' profile with 6 workers. More information on Matlab cluster profiles is available in the MathWorks documentation.

 
%% PARFOR DEMO
 
%% Init problem
% Init the parallel pool
parpool('local',6)
 
% MonteCarlo simulations
disp('Starting ...')
tic;
iter    = 100000;
sz      = 55;
a       = zeros(1, iter);
parfor (simNum = 1:iter, 6)
        a(simNum)= myFunction(sz);
end
toc;
delete(gcp)

2. Let's also modify our submission script. We'll include the -pe option to reserve as many slots as we specified in the previous parpool and parfor commands. We'll also modify the line that calls the job and the names of the output and error files:

[...]
# Requested resources:
#
# Simulation name
# ----------------
#$ -N "MonteCarlo-paralell"
#
# Parallel environment: we'll use six cores
# on the parallel environment called 'smp'
# -----------------------------------------
#$ -pe smp 6
[...]
# Output and error files go on the user's home:
# -------------------------------------------------
#$ -o /homedtic/ijimenez/Matlab/MonteCarlo/job-out/montecarlo-paralell.out
#$ -e /homedtic/ijimenez/Matlab/MonteCarlo/job-out/montecarlo-paralell.err
[...]
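In the scripts above, the number 6 appears both in the Matlab code and in the -pe request, and the two must be kept in step by hand. One possible alternative is to take the pool size from NSLOTS, which SGE sets inside a parallel job. The sketch below only prints the resulting command line. The function name parfor_matlab_cmd is hypothetical, the Matlab script is assumed not to call parpool itself, and -nojvm is left out on the assumption that the parallel pool needs the JVM.

```shell
# parfor_matlab_cmd: hypothetical helper that prints a Matlab command line
# whose pool size follows the slots granted by SGE. NSLOTS is set by SGE
# inside a parallel job; 1 is used when it is missing.
# $1 is the Matlab script name, given without the .m extension.
parfor_matlab_cmd() {
    script=$1
    workers=${NSLOTS:-1}
    printf '%s\n' "matlab -nodisplay -nosplash -r \"parpool('local', $workers); $script; delete(gcp); quit;\""
}
```

With '-pe smp 6' in the job script, parfor_matlab_cmd montecarlo prints a command that opens a pool of 6 workers before running montecarlo.m.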
This time, if we compare the elapsed times shown in the output files, we'll see that the serial run took just over a minute and the parallel run took about 13 seconds. This is a speedup of roughly 4.7x on six cores:
 
Serial: Elapsed time is 63.909396 seconds.

Parallel: Elapsed time is 13.476630 seconds.
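The speedup is the serial time divided by the parallel time, and the parallel efficiency is the speedup divided by the number of cores. A short sketch of the arithmetic with the two timings above; the function name speedup is hypothetical.

```shell
# speedup: hypothetical helper. Prints the speedup (serial / parallel) and
# the parallel efficiency (speedup / cores) for the given timings.
# $1 = serial seconds, $2 = parallel seconds, $3 = number of cores.
speedup() {
    awk -v s="$1" -v p="$2" -v n="$3" \
        'BEGIN { sp = s / p; printf "speedup %.2f, efficiency %.1f%%\n", sp, 100 * sp / n }'
}

speedup 63.909396 13.476630 6   # prints: speedup 4.74, efficiency 79.0%
```

An efficiency below 100% is expected here, since opening the pool and distributing the iterations among the workers takes time as well.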