The job scheduler software determines which processors each job is sent to and when it runs. It monitors the entire job queue and prioritizes waiting jobs by weighing requested against available resources and current usage against fairshare. Fairshare allocations on the cluster are assigned at the level of an ACCRE account. An umbrella ACCRE account may contain multiple groups, and each user is assigned to one of those groups. An account's fairshare is determined by its level of buy-in to (or lease of) cluster resources.
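To make the fairshare idea concrete, here is a toy model of the account, group, and user hierarchy described above. The priority rule (fairshare target minus recent usage) and the names `Account`, `Job`, `fairshare_priority`, and `order_waiting_jobs` are hypothetical; Moab's real priority calculation combines several weighted factors and also considers requested versus available resources, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """An ACCRE account; groups (and their users) live underneath it."""
    name: str
    fairshare_target: float  # share of the cluster bought in or leased (hypothetical units)
    recent_usage: float      # share of the cluster recently consumed (hypothetical units)


@dataclass
class Job:
    job_id: str
    user: str
    group: str
    account: Account  # fairshare is tracked per account, not per user


def fairshare_priority(account: Account) -> float:
    """Toy rule: positive when an account is under its target, negative when over."""
    return account.fairshare_target - account.recent_usage


def order_waiting_jobs(jobs):
    """Return waiting jobs with under-served accounts first (ties keep queue order)."""
    return sorted(jobs, key=lambda job: fairshare_priority(job.account), reverse=True)


# Usage: lab_a is over its target, lab_b is under it.
lab_a = Account("lab_a", fairshare_target=0.10, recent_usage=0.15)
lab_b = Account("lab_b", fairshare_target=0.05, recent_usage=0.01)
queue = [Job("101", "alice", "grp1", lab_a), Job("102", "bob", "grp2", lab_b)]
print([job.job_id for job in order_waiting_jobs(queue)])  # ['102', '101']
```

Sorting with `reverse=True` keeps Python's sort stable, so jobs from accounts with equal priority stay in their original queue order.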
The Moab scheduler works in combination with the TORQUE Resource Manager to schedule the use of compute processors running batch/interactive jobs in the ACCRE cluster environment. It attempts to update the status of the queue every 2 minutes. Parameters and policy settings can be tuned to efficiently handle a wide range of system workloads (see Getting Started on the Cluster to learn more about submitting jobs to the queue).
Limits on the Number of Jobs in the Queue which are enforced by the scheduler:
- The default maximum number of processors a user may have in use at any one time is 400, in any combination of single- and multi-processor jobs. Exceptions are made to increase or decrease this maximum to ensure fairness to users and groups based on their fairshare. Higher limits may be set for users who purchase very large fairshares. Lower limits may be imposed as necessary on users and/or groups, based on the length of the jobs they run and the type of processors they use. The Principal Investigator (PI) in charge of a group may also request upper limits on the users in that group. New users will have lower job limits if they do not promptly attend the Introduction to the Cluster and Job Scheduler classes.
- By default, each group may have at most 60 processors, in any combination of jobs, in the idle portion of the queue. Groups with large fairshares, however, have higher limits on the number of processors allowed in the idle portion of the queue.
- Because the scheduler optimizes usage by filling in gaps with short and single-processor jobs, the top 8 jobs in the idle portion of the queue can reserve processors so that they are not delayed indefinitely. When you run 'showq -i', an asterisk at the end of a job number indicates that processors are being accrued and reserved for that job. In rare instances, the scheduler will override this reservation if necessary to improve utilization of the cluster. When the cluster is fully utilized with single-CPU jobs, the total number of running jobs can reach about 1600 (depending on the number of online CPUs). We therefore attempt to limit the number of jobs in the eligible and blocked portions of the queue to about 2200. Because the scheduler cannot enforce a limit on eligible plus blocked jobs, we ask users to keep no more than 300 jobs in the eligible and blocked portions of the queue, and we will ask users who exceed this limit to delete jobs from the queue. When the number of jobs begins to disrupt the scheduler, we must delete jobs from the eligible and blocked portions of the queue ourselves; in that case we will inform the affected users so they know to resubmit those jobs.
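The default queue limits above can be expressed as a simple pre-submission check. This is a hypothetical sketch, not a tool the scheduler provides: the function name `check_submission` and its inputs are illustrative, and the constants are the defaults quoted above, which may be raised or lowered for a particular user or group. In practice the counts would come from inspecting the queue (for example with `showq`).

```python
# Default limits quoted in this section; individual users and groups may differ.
MAX_RUNNING_PROCS_PER_USER = 400          # processors in use at once, per user
MAX_IDLE_PROCS_PER_GROUP = 60             # processors in the idle portion, per group
MAX_ELIGIBLE_BLOCKED_JOBS_PER_USER = 300  # requested by staff, not enforced by the scheduler


def check_submission(running_procs_user, idle_procs_group,
                     eligible_blocked_jobs_user, new_job_procs):
    """Return a list of warnings for the default limits a new job would exceed."""
    warnings = []
    # Could the job start now without pushing the user past the running limit?
    running = running_procs_user + new_job_procs
    if running > MAX_RUNNING_PROCS_PER_USER:
        warnings.append(f"{running} running processors would exceed the "
                        f"per-user limit of {MAX_RUNNING_PROCS_PER_USER}")
    # Would the group's idle processors exceed the default idle limit?
    idle = idle_procs_group + new_job_procs
    if idle > MAX_IDLE_PROCS_PER_GROUP:
        warnings.append(f"{idle} idle processors would exceed the "
                        f"per-group limit of {MAX_IDLE_PROCS_PER_GROUP}")
    # Would the user's eligible plus blocked job count exceed the requested cap?
    queued = eligible_blocked_jobs_user + 1
    if queued > MAX_ELIGIBLE_BLOCKED_JOBS_PER_USER:
        warnings.append(f"{queued} eligible and blocked jobs would exceed the "
                        f"requested maximum of {MAX_ELIGIBLE_BLOCKED_JOBS_PER_USER}")
    return warnings


# Usage: a user already running 396 processors submits an 8-processor job.
for warning in check_submission(running_procs_user=396, idle_procs_group=10,
                                eligible_blocked_jobs_user=120, new_job_procs=8):
    print(warning)
```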
Limits on the Length of Running Jobs:
- The maximum allowed job length is 30 days (except when fewer than 30 days remain before a scheduled downtime).
- User jobs should run for at least 30 minutes, and preferably more than an hour (exceptions will be made for a small number of test jobs). This minimum is required because every job incurs 4 to 5 minutes of overhead for staging and tear-down, during which its processors sit idle. Many short jobs therefore add up to many hours of wasted processing time (and wasted money).
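The length limits and the overhead argument can be checked with a little arithmetic. The sketch below assumes walltimes written in `HH:MM:SS` form, uses 4.5 minutes (the midpoint of the 4 to 5 minute estimate) as the per-job overhead, and assumes that ahead of a scheduled downtime the usable maximum becomes the time remaining; the function names are hypothetical. Under these assumptions, a 5-minute job spends about 47% of its processor time idle in staging and tear-down, a 30-minute job about 13%, and a one-hour job about 7%.

```python
from datetime import timedelta
from typing import Optional

MIN_WALLTIME = timedelta(minutes=30)   # minimum requested job length
MAX_WALLTIME = timedelta(days=30)      # maximum allowed job length
OVERHEAD = timedelta(minutes=4.5)      # assumed staging plus tear-down time per job


def parse_walltime(text: str) -> timedelta:
    """Parse a walltime string in HH:MM:SS form, e.g. '72:00:00'."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def check_walltime(walltime: timedelta,
                   until_downtime: Optional[timedelta] = None) -> list:
    """Return the length rules a requested walltime would break."""
    problems = []
    if walltime < MIN_WALLTIME:
        problems.append("shorter than the 30-minute minimum")
    # Assumption: before a scheduled downtime, the maximum shrinks to the time remaining.
    limit = MAX_WALLTIME if until_downtime is None else min(MAX_WALLTIME, until_downtime)
    if walltime > limit:
        problems.append(f"longer than the current maximum of {limit}")
    return problems


def overhead_fraction(runtime: timedelta) -> float:
    """Fraction of a job's processor time spent idle in staging and tear-down."""
    return OVERHEAD / (runtime + OVERHEAD)


# Usage
print(check_walltime(parse_walltime("00:10:00")))
for minutes in (5, 30, 60):
    print(f"{minutes:>3} min job: {overhead_fraction(timedelta(minutes=minutes)):.0%} overhead")
```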
Limits on the Memory Use of Running Jobs:
- The resource manager automatically kills jobs that use more memory than requested.
- Although you can request the maximum memory on any node, each node uses some memory to run the operating system. Therefore, as a rule of thumb when requesting the maximum amount of memory for a node, we recommend specifying the node type's maximum memory minus 200MB. For example, the bigmem nodes with 4GB have approximately 3.8GB available for computation (4000MB - 200MB = 3800MB). Learn more about the nodes here.
- Learn how to monitor your memory usage as part of checking the status of a submitted job.
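The 200MB rule of thumb is easy to compute when writing a job script. This minimal sketch assumes TORQUE's `mem` resource with an `mb` suffix (as in `#PBS -l mem=3800mb`); the helper names are hypothetical, and the 200MB reserve is the rule of thumb above rather than an exact figure for every node type. The last helper mirrors the resource manager's behavior of killing a job whose memory use exceeds its request.

```python
OS_RESERVE_MB = 200  # rule-of-thumb memory left for the operating system


def max_mem_request_mb(node_max_mb: int, reserve_mb: int = OS_RESERVE_MB) -> int:
    """Largest memory request recommended for a node type, in MB."""
    if node_max_mb <= reserve_mb:
        raise ValueError("node memory must be larger than the OS reserve")
    return node_max_mb - reserve_mb


def pbs_mem_directive(mem_mb: int) -> str:
    """Format a TORQUE memory request line for a job script."""
    return f"#PBS -l mem={mem_mb}mb"


def would_be_killed(used_mb: float, requested_mb: int) -> bool:
    """True if usage exceeds the request, which leads the resource manager to kill the job."""
    return used_mb > requested_mb


# Usage: a bigmem node with 4000MB total.
request = max_mem_request_mb(4000)
print(pbs_mem_directive(request))      # #PBS -l mem=3800mb
print(would_be_killed(3900, request))  # True
```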
Scheduling on Myrinet Nodes:
- Running on Myrinet nodes is regulated differently than on other nodes. To improve the scheduling of multi-processor jobs that can take advantage of the higher-speed interconnect between these nodes, the following rules apply:
  - Any job with a walltime over 6 days that does not request myrinet will automatically have nomyrinet added to its node properties.
  - Single-processor jobs that request "myrinet" will not be allowed to run.
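The two Myrinet rules can be modeled as a small function that takes a job's processor count, walltime, and requested node properties. This is a hypothetical illustration of the policy as written above, not the scheduler's own code; the function name and return shape are assumptions, while the property names `myrinet` and `nomyrinet` come from the rules themselves.

```python
from datetime import timedelta

MYRINET_WALLTIME_CUTOFF = timedelta(days=6)


def apply_myrinet_policy(procs: int, walltime: timedelta, properties):
    """Return (allowed, node_properties) after applying the Myrinet rules."""
    props = set(properties)
    # Single-processor jobs may not request Myrinet nodes.
    if procs == 1 and "myrinet" in props:
        return False, sorted(props)
    # Long jobs that did not ask for Myrinet are kept off those nodes.
    if walltime > MYRINET_WALLTIME_CUTOFF and "myrinet" not in props:
        props.add("nomyrinet")
    return True, sorted(props)


# Usage
print(apply_myrinet_policy(8, timedelta(days=7), []))            # (True, ['nomyrinet'])
print(apply_myrinet_policy(1, timedelta(hours=2), ["myrinet"]))  # (False, ['myrinet'])
```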
Scheduling on bigmem Nodes:
- Running on bigmem nodes is regulated differently than on other nodes.
  - Any job with a walltime over 3 days that requests less than 1.4GB of memory will automatically have nobigmem added to its node properties.
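The bigmem rule can be sketched the same way. This hypothetical function reads "1.4GB" as 1400MB, following the 4000MB-per-4GB convention used in the memory example earlier in this section. The rule as stated does not say what happens if a job explicitly requests bigmem, so the sketch applies it exactly as written.

```python
from datetime import timedelta

BIGMEM_WALLTIME_CUTOFF = timedelta(days=3)
BIGMEM_MEMORY_THRESHOLD_MB = 1400  # "1.4GB", read as 1400MB (assumption)


def apply_bigmem_policy(walltime: timedelta, mem_mb: int, properties):
    """Return node properties after applying the bigmem rule."""
    props = set(properties)
    # Long jobs with modest memory requests are kept off the bigmem nodes.
    if walltime > BIGMEM_WALLTIME_CUTOFF and mem_mb < BIGMEM_MEMORY_THRESHOLD_MB:
        props.add("nobigmem")
    return sorted(props)


# Usage
print(apply_bigmem_policy(timedelta(days=4), 1000, []))  # ['nobigmem']
print(apply_bigmem_policy(timedelta(days=4), 3800, []))  # []
```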