7. Asynchronous Job Operator (ajo)

Introduction

Asynchronous Job Operator (AJO) is a software tool designed to provide a simpler, more transparent way of working with jobs on an HPC system.

It acts as a gateway between the user interface and the HPC system, and lets users submit, retrieve, query, cancel and erase jobs more quickly and easily.

 

Requirements

  • AJO runs on GNU/Linux systems.
  • Ruby 1.9.1 or a newer version must be installed.
  • An SSH client must be available.

 

Installation

  • Download AJO.

            From the home page: http://rdlab.cs.upc.edu/releases/ajo/

            From the svn repository, using public_ajo as the username:

                   svn co http://svn-rdlab.lsi.upc.edu/subversion/ajo/public --username public_ajo

            From the hpc.dtic cluster, in the /soft/public/ directory.

  • The downloaded folder contains six files; most of the configuration is done in the config.rb file.

 

Configuration

Before configuring AJO it is recommended to set up a password-less SSH connection to the cluster. This is done with a private/public key pair.

  • Generate a private/public key pair:

ssh-keygen -t rsa

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /homedtic/noko/.ssh/id_rsa.

Your public key has been saved in /homedtic/noko/.ssh/id_rsa.pub.

 

  • Copy the public key to the cluster login server by running the following command from the directory where the keys were generated:

ssh-copy-id -i id_rsa.pub user@hpc.dtic

 

  • Establish a connection to the cluster.

ssh hpc.dtic

You should now be able to establish a password-less connection.

 

The main configuration is done in the option sections of the config.rb file described below.

·         SSH options.

Here you need to configure the server on the HPC system to which AJO will establish an SSH connection; for the hpc.dtic.upf.edu cluster this is the login server.

You also need to configure the username by entering the personal username you use to access the cluster.

The SSH_CMD line can remain at its default, unless there is a problem initializing the environment variables; in that case you may add the profile directory.

 

An example of the final configuration is shown below.

  SERVER = "hpc.dtic.upf.edu"

  USER = <DTIC_USERNAME> # insert your Linux username between quotes

  SSH_CMD = "/usr/bin/ssh #{USER}@#{SERVER} '. /etc/profile;'"

  AJO_DIR = `#{SSH_CMD} 'echo $HOME'`.chomp("\n") + "/.executions"
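The AJO_DIR line above relies on Ruby's backtick operator to run a shell command and capture its output. A minimal local sketch of the same pattern, with a plain echo standing in for the remote SSH call:

```ruby
# Sketch of the backtick-capture pattern used by the AJO_DIR line:
# run a command, capture its stdout, strip the trailing newline.
# A local `echo` stands in here for the remote SSH command.
home = `echo $HOME`.chomp("\n")
ajo_dir = home + "/.executions"   # remote folder where AJO stores job data
```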

 

·         SGE options

There is no need to make any modification here. If there is a problem finding SGE_ROOT, the single quotes around the echo command in the first line may be removed.

  SGE_ROOT = `#{SSH_CMD} echo $SGE_ROOT`.chomp "\n"

  SGE_ARCH = `#{SSH_CMD} '#{SGE_ROOT}/util/arch'`.chomp "\n"

  SGE_UTIL_PATH = SGE_ROOT + "/bin/" + SGE_ARCH

  QSUB_CMD = SGE_UTIL_PATH + "/qsub"

  QSTAT_CMD = SGE_UTIL_PATH + "/qstat"

  QACCT_CMD = SGE_UTIL_PATH + "/qacct"

  QDEL_CMD = SGE_UTIL_PATH + "/qdel"

 

·         Encryption options

It is recommended to change the secret values in the second and third lines to something more complex, to ensure secure communication between AJO and the HPC system.

The new values may be random strings and must be in quotes, as shown next.

  CIPHER_SALT = "ajo"

  CIPHER_KEY = OpenSSL::PKCS5.pbkdf2_hmac_sha1("oy1382-2487*", CIPHER_SALT, 2000, 16)

  CIPHER_IV = OpenSSL::PKCS5.pbkdf2_hmac_sha1("oyserver2487*", CIPHER_SALT, 2000, 16)
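The arguments to pbkdf2_hmac_sha1 are a secret string, the salt, an iteration count and the desired key length in bytes. A stand-alone sketch with placeholder secrets (choose your own random values):

```ruby
require 'openssl'

CIPHER_SALT = "my-random-salt"    # placeholder: pick your own salt
# Derive a 16-byte key and IV from two secret strings, 2000 iterations each.
CIPHER_KEY = OpenSSL::PKCS5.pbkdf2_hmac_sha1("change-this-key*", CIPHER_SALT, 2000, 16)
CIPHER_IV  = OpenSSL::PKCS5.pbkdf2_hmac_sha1("change-this-iv*", CIPHER_SALT, 2000, 16)
```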

 

·         Folders and files options.

This part configures the folders and files that will be handed over to and downloaded from the HPC system.

-   FOLDER_ARGS is the folder or folders AJO uses to collect the jobs and scripts to run on the HPC system. The example below shows AJO collecting the jobs and scripts in the folder "ajo3" and running them on the cluster.

FOLDER_ARGS = {

    :ajo2 => "/homedtic/#{USER}/ajo3/",
}

 

-  FILE_ARGS is also used to tell AJO what to hand over to the cluster. It allows selecting a given file or files instead of a whole folder, as in the previous case.

FILE_ARGS = {

    :ajofi2 => "/homedtic/#{USER}/ajo2/Proyecto.sge",

    :ajofi3 => "/homedtic/#{USER}/ajo2/Proyecto.m",
}

 

- FOLDER_OUTPUT: same as FOLDER_ARGS, but used to recover or retrieve results instead of uploading them.

FOLDER_OUTPUT = {

    :fajo2 => "/job/",
}

 

- FILE_OUTPUT: same as FILE_ARGS, but used to recover or retrieve results instead of uploading them.

FILE_OUTPUT = {

     :file4 => "job.out",
}

 

-  RETRIEVE lets users define the files to be retrieved from the remote results folder. By default, AJO keeps the output of any execution in std.out and std.err.

RETRIEVE = {

   :resultado => "std.out",

   :error => "std.err"
}

 

·         Commands.

Here one can define the commands that will be executed on the cluster. The format of each command is:     “command #{argument[argument_name]}”

    [

      "sh #{folder_output[:fajo2]}",

      "sh #{folder_args[:ajo2]}/Proyecto.sge",

      "sh #{file_output[:file4]}",

    ]

  end
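The interpolation format above can be illustrated with plain Ruby hashes standing in for the folder and file options (the hash contents here are made up for the example):

```ruby
# Made-up stand-ins for the configuration hashes.
folder_args = { :ajo2  => "/homedtic/user/ajo3/" }
file_output = { :file4 => "job.out" }

# Each command string interpolates an argument by its symbolic name.
commands = [
  "sh #{folder_args[:ajo2]}Proyecto.sge",
  "sh #{file_output[:file4]}"
]
```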

 

·         Running AJO.

- Submit jobs:

               ajo -c config.rb -s --library libhpc.rb

The installation directory has to be specified, so the previous command becomes

/soft/public/ajo_v1.x/ajo -c /soft/public/ajo_v1.x/config.rb -s --library /soft/public/ajo_v1.x/libhpc.rb 

- Retrieve:

When a job is submitted correctly, a job ID is returned. This ID is used to retrieve the results.

/soft/public/ajo_v1.x/ajo -c /soft/public/ajo_v1.x/config.rb -r  ID --library /soft/public/ajo_v1.x/libhpc.rb --log-all

An example of ID is d427e4b981f87875ba9389ff261ae272a2551151f5775861064be484480e38b8eb586d026634ff1987eeada7aff6f83a

 

Here is a summary of some important options available.

--api

    Ask AJO to print only concise output.

-c FILE or --config FILE

Tells AJO which is the configuration file.

-d DIR or --retrieve-directory DIR

    Download the results of the execution to a given directory (DIR).

--download-exec-folder ID

    Download the folder where the job was executed on the HPC system.

-e ID or --erase ID

    Erase the directory associated with the job on the server.

-l or --list

    List all the identifiers available for query and retrieval.

--library FILE

    Specify the location of the libhpc.rb library file.

--log-all
    Make AJO log every step it takes while executing tasks in the ajo.log file created by default in current directory.

-q ID or --query ID
    Get the information about the status of the job with id=ID.

-s or --submit
    Submit the job specified in the configuration file to the cluster.

-x ID or --cancel ID
    Cancel a running job.