Job Scheduler

The job scheduler software determines which processors to send each job to and when. It monitors the entire job queue, prioritizing waiting jobs based on requested versus available resources and current usage versus fairshare. Assignments of fairshare allocations for the cluster are made at the level of an ACCRE account. Multiple groups may exist under an umbrella ACCRE account; users are subsequently assigned to a group. Obtaining account fairshare is explained in Purchasing and/or Service Use Fee to Obtain Cluster Fairshare.

Moab Scheduler

The Moab scheduler works in combination with the TORQUE Resource Manager to schedule the use of compute processors running batch/interactive jobs in the ACCRE cluster environment. It attempts to update the status of the queue every 2 minutes. Parameters and policy settings can be tuned to efficiently handle a wide range of system workloads (see Getting Started on the Cluster to learn more about submitting jobs to the queue).

Moab Scheduler Limits and User Etiquette

Limits on the Number of Jobs in the Queue which are enforced by the scheduler:

  • The largest factor in determining limits on numbers of jobs is the Maximum Processor Second (MaxPS) for each account. The MaxPS is the number of processor core seconds for each account, calculated as the fairshare number times 2,592,000 seconds per month (one processor core running 24/7 for 30 days). This is best explained by an example. An account with a fairshare of five processor cores has a MaxPS of 12,960,000 seconds. This account could start five 30-day one-processor core jobs and use its entire MaxPS. Alternatively, it could start ten 15-day one-processor core jobs, or five 15-day two-processor core jobs, or any other combination adding up to the MaxPS for that account. The shorter the jobs, the more jobs are allowed. Once jobs are started, the maximum seconds in use decreases as the remaining time for each job to finish shortens. The MaxPS has several significant benefits:

    • Jobs can run for up to 30 days on the cluster so accounts are allowed to run jobs that allow them to use their entire fairshare, i.e. a fairshare based on five processor cores should allow up to 5 one-processor core jobs for the entire 30 days.
    • The MaxPS allows accounts to have burst usage onto processor cores that belong to the fairshare of other users as long as they are not having a long term negative impact on other users. The shorter the job length, the greater the burst.
    • The MaxPS encourages all users to set reasonably accurate Job Wall Clock times. If a user requests 30 days when they only need a day, their usage will be limited by MaxPS. The scheduler cannot know whether a job will complete early, so it must schedule time based on the Job Wall Clock request. Users should also be aware that they should not set their Job Wall Clock too short, since jobs are killed when they run out of Job Wall Clock time.

  • The default maximum allowed number of processors in use at any one time per group is set at 400 in any combination of single and multi-processor jobs. Depending on the account fairshare, the MaxPS may limit a group to significantly less than 400 jobs. Exceptions are also made to increase this maximum to ensure fairness to groups with very large fairshare. The Principal Investigator (PI) in charge of an individual account may also request upper limits on users in that account. New users will have lower job limits if they do not promptly attend the Introduction to the Cluster and Job Scheduler classes.

  • The default number of processors in any combination of jobs in the idle portion of the queue is 60 processors per group. An account with a large fairshare, however, will have higher limits on the number of processors allowed in the idle portion of the queue.

  • Since the scheduler optimizes usage by infilling short and single processor jobs, the top 8 jobs in the idle portion of the queue can reserve processors for their jobs. When 'showq -i' is executed, if there is an asterisk at the end of a job number, processors are being accrued and reserved for that job. In rare instances, the scheduler will override this reservation if necessary to improve utilization of the cluster.
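
The MaxPS arithmetic described in the first item above can be checked with a short calculation. The following is a minimal sketch, not the scheduler's actual implementation: the function names (`max_ps`, `job_ps`, `fits`) are hypothetical, and it assumes each job is charged its requested cores times its requested Job Wall Clock time at the moment it starts.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # 2,592,000 seconds


def max_ps(fairshare_cores):
    """MaxPS budget (processor core seconds) for an account's fairshare."""
    return fairshare_cores * SECONDS_PER_MONTH


def job_ps(cores, walltime_days):
    """Processor core seconds one job counts against MaxPS when it starts
    (assumption: requested cores times requested wall clock time)."""
    return cores * walltime_days * SECONDS_PER_DAY


def fits(jobs, fairshare_cores):
    """True if the (cores, walltime_days) jobs together stay within MaxPS."""
    return sum(job_ps(c, d) for c, d in jobs) <= max_ps(fairshare_cores)


print(max_ps(5))                 # 12960000
print(fits([(1, 30)] * 5, 5))    # True: five 30-day one-core jobs
print(fits([(1, 15)] * 10, 5))   # True: ten 15-day one-core jobs
print(fits([(2, 15)] * 5, 5))    # True: five 15-day two-core jobs
print(fits([(1, 30)] * 6, 5))    # False: a sixth 30-day job exceeds MaxPS
```

Because the charge is proportional to the requested wall clock time, halving the requested time doubles the number of jobs that fit, which is why accurate Job Wall Clock requests matter.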

Limits on the Length of Running Jobs:

  • The maximum allowed job length is 30 days (except when there are fewer than 30 days before a scheduled downtime).

  • User jobs should be at least 30 minutes long, though over an hour is preferable (exceptions will be made for a small number of test jobs). This minimum job length is required because every job incurs 4 to 5 minutes of overhead for job staging and tear down. During this overhead time the processors remain idle. Many short jobs therefore result in many hours of wasted processing time (and wasted money).
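
To see why short jobs are discouraged, the overhead can be expressed as a fraction of total processor time. This is a rough sketch under stated assumptions: the function names are hypothetical, the overhead is taken as 4.5 minutes (the midpoint of the 4 to 5 minute range above), and it is assumed to be added on top of the job's run time.

```python
def wasted_fraction(run_minutes, overhead_minutes=4.5):
    """Fraction of a job's total processor time spent on staging/tear down.
    Assumes overhead is in addition to the run time."""
    return overhead_minutes / (run_minutes + overhead_minutes)


def wasted_hours(n_jobs, overhead_minutes=4.5):
    """Total idle processor hours caused by per-job overhead."""
    return n_jobs * overhead_minutes / 60


for minutes in (5, 30, 60):
    print(f"{minutes}-minute job: {wasted_fraction(minutes):.1%} overhead")

print(wasted_hours(1000))  # 75.0 idle processor hours for 1000 jobs
```

Under these assumptions, a 5-minute job spends nearly half of its processor time idle, while a one-hour job spends well under a tenth.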

Limits on the Memory Use of Running Jobs:

  • The resource manager automatically kills jobs that use more memory than requested.

  • Although you can request the maximum memory on any node, each node uses some memory to run the operating system. Therefore, as a rule of thumb when requesting the maximum amount of memory for a node, we recommend you specify (the node type's Maximum MB - 200MB) in your request. For example, the bigmem nodes with 4GB have approximately 3.8GB available for computation (4000MB - 200MB = 3800MB). Learn more about the nodes here.

  • Learn how to monitor your memory usage as part of checking the status of a submitted job.
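
The rule of thumb for maximum memory requests can be written as a one-line helper. This is only an illustrative sketch; `recommended_mem_mb` is a hypothetical name, and the 200MB reserve is the figure quoted above rather than a value read from the cluster.

```python
OS_RESERVE_MB = 200  # memory left for the operating system, per the rule of thumb


def recommended_mem_mb(node_max_mb, reserve_mb=OS_RESERVE_MB):
    """Largest memory request (in MB) recommended for a node of the given size."""
    if node_max_mb <= reserve_mb:
        raise ValueError("node memory must exceed the operating system reserve")
    return node_max_mb - reserve_mb


print(recommended_mem_mb(4000))  # 3800, matching the bigmem example
```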

For more information on the types of nodes available, see our detailed description of the cluster. Refer to this FAQ for more discussion of why we impose these restrictions.

Please be aware that restrictions may be placed at any time on a user account if jobs are causing any problem with the cluster hardware or are interfering with the jobs of other users. In such cases, we notify the user as soon as possible (although in extreme cases, we must sometimes kill problem jobs before we have made contact with the user). We then work with the user to monitor the progress of their jobs until they can be run normally on the system without causing problems.

Many times this means restricting an account to one running job until each new job runs without encountering problems. We will then incrementally increase the number of jobs a user can have running simultaneously, only if their jobs cause no issues at that level. Eventually, we reset the account to the maximum of 300 processors in use. The main intent of our ACCRE Cluster Computing Classes is to educate users in order to help avoid such issues. Even if you are not a new user, if you have never attended our classes we invite you to do so.


Last modified: June 30 2009 12:57:27 CST.