Why a cluster?
“The underlying physical laws necessary for the mathematical theory of a large part of physics and the whole of chemistry are thus completely known, and the difficulty is only that the exact application of these laws leads to equations much too complicated to be solvable.”
The purpose of an HPC (High Performance Computing) environment is to solve these overly complicated equations. With hundreds, thousands, or even hundreds of thousands of processors working together, such a system can perform an enormous number of operations per second, making problems tractable in mathematics, physics, chemistry, imaging, and many other fields that would otherwise be out of reach.
Rather than solving complex computational problems with only the help of their own workstation, researchers using an HPC cluster have access to hundreds or even thousands of times the computing power of that workstation.
Fig. 1. Part of the Chinese Tianhe-2 storage system
How do we sort things?
Because the cluster is a shared environment with large (but limited) resources, some logical grouping of resources and some accounting mechanisms have to be set up. This is the job of the Job Scheduler.
A job scheduler is a computer application that controls unattended background program execution (commonly called batch processing). Users submit their jobs to the scheduler, and the scheduler plans the execution of those jobs in the most efficient way. How and when a job is executed depends on the resources it requests.
Fair-share of resources
The goal is to ensure that access to the resources is shared fairly among all users, so that no single user can monopolize the cluster and make other users' jobs wait. To do so, the scheduler calculates quotas and applies several kinds of limits:
- Limits per user: a single user can run only a given number of jobs simultaneously
- Limits per research group: users belonging to the same research group can, together, run only a given number of jobs simultaneously
- Quota limit: every job uses resources, and resource usage consumes quota. Once a user's quota reaches zero, that user cannot run any more jobs. Users with more remaining quota have higher priority when running jobs, since a lower quota reflects heavier prior resource usage. Quotas are recalculated every two weeks, and prior usage of the cluster counts. See section 'Queues' for more information.
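The interaction of the three limits above can be illustrated with a minimal Python sketch. This is not the scheduler's actual implementation: the names `Job`, `can_start`, and `schedule`, the limit values, and the unit of quota are all hypothetical and chosen only for illustration. The two-week quota recalculation is not modelled.

```python
from dataclasses import dataclass

# Hypothetical limits; the real values are set by the cluster administrators.
MAX_JOBS_PER_USER = 2
MAX_JOBS_PER_GROUP = 3


@dataclass
class Job:
    user: str
    group: str
    cost: float  # quota the job would consume (illustrative unit)


def can_start(job, running, quota):
    """Return True if starting `job` respects the user, group and quota limits."""
    user_jobs = sum(1 for r in running if r.user == job.user)
    group_jobs = sum(1 for r in running if r.group == job.group)
    return (
        user_jobs < MAX_JOBS_PER_USER
        and group_jobs < MAX_JOBS_PER_GROUP
        and quota.get(job.user, 0.0) > 0.0
    )


def schedule(pending, running, quota):
    """Start pending jobs, considering users with more remaining quota first."""
    started = []
    # sorted() is stable, so jobs of users with equal quota keep submission order.
    for job in sorted(pending, key=lambda j: -quota.get(j.user, 0.0)):
        if can_start(job, running + started, quota):
            # Running a job consumes quota; it never drops below zero here.
            quota[job.user] = max(0.0, quota[job.user] - job.cost)
            started.append(job)
    return started


quota = {"alice": 10.0, "bob": 50.0, "carol": 0.0}
pending = [
    Job("alice", "chem", 4.0),
    Job("bob", "chem", 5.0),
    Job("carol", "phys", 1.0),
    Job("bob", "chem", 5.0),
    Job("bob", "chem", 5.0),
]
started = schedule(pending, [], quota)
print([job.user for job in started])
```

In this example the third job from bob is held back by the per-user limit, and carol's job is held back because her quota is exhausted, even though she submitted before some of the jobs that did start.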