How to Submit Basic Jobs

Remember, to continue using the cluster, new users must attend our training workshops, which cover these introductory topics on the use of the cluster in more depth. In particular, we invite you to print out our training presentations to keep as desktop references:
Introduction to the Unix/Linux
Introduction to the Compute Cluster
Job Scheduler Details

To execute a program/job on the cluster, follow these steps:

  1. Log on to the cluster gateway server (vmplogin.accre.vanderbilt.edu). You may run very short test jobs (under 15 minutes) on the gateway machines, as long as they do not slow the gateway for other users. Anything longer than this should be submitted as a job to the compute nodes using qsub, as described below.
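    A minimal sketch for keeping a gateway test within the 15-minute limit. It assumes GNU coreutils `timeout` is installed on the gateway (an assumption; check with `which timeout`), and `gateway_test` is a hypothetical helper name, not an ACCRE command. It simply stops whatever command you pass it once 15 minutes have elapsed.

```shell
#!/bin/sh
# Hypothetical helper (not an ACCRE tool): run a short test command on the
# gateway, but kill it if it is still running after 15 minutes.
# Assumes GNU coreutils `timeout` is installed on the gateway.
gateway_test() {
    timeout 15m "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout exits with status 124 when the time limit is reached.
        echo "Test exceeded 15 minutes; submit it with qsub instead." >&2
    fi
    return "$status"
}
```

    For example, `gateway_test ./my_program input.dat` (a hypothetical program and input file) runs the test normally but never lets it occupy the gateway for more than 15 minutes.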

  2. Learn about the job scheduler software. The Moab scheduler works in combination with the TORQUE Resource Manager to schedule the use of compute processors on the cluster. For more details, read the sections that follow, as well as our pages on checking the status of submitted jobs and checking your usage. Also read our FAQ and our job scheduler policies. Many TORQUE and Moab commands have online manual pages. For more information on qsub and PBS scripts beyond the contents of this page, see:

    man qsub

    man pbs_resources

  3. Create a PBS submission script. PBS commands allow you to specify your job's resource needs and related options (e.g., type of processor, how many processors, how much memory on each processor, the location and name to be used for job output, notification of job completion).

    Try to tune these parameters to be close to your job's true requirements, for two main reasons:

    • First, requesting more resources than your jobs need may delay their start time, because the scheduler software weighs all job submissions simultaneously against current and future node availability. For example, don't specify 2 processors per node for a single-processor job.

    • Second, be even more careful not to request fewer resources than your jobs require. The job scheduler will automatically kill most jobs that exceed the resources requested in their PBS scripts. There is further important information on this in our FAQ and job scheduler policies.

  4. Please note that PBS commands and their attributes are case sensitive.

    A very simple PBS script (named "submission_script.pbs") is shown below. Statements beginning with "#" are simply comments; statements beginning with "#PBS" are PBS commands and must precede the program that you actually want to execute on the cluster.

    The first order of business is to define your shell environment. Therefore, the "#!/bin/sh" line must be the first line in your script file.

    #!/bin/sh
    # Beginning of PBS batch script.
    #PBS -M my.address@vanderbilt.edu
    # Status/progress e-mails sent to "my.address@vanderbilt.edu"
    #PBS -m bae
    # Email generated at b)eginning, a)bort, and e)nd of jobs
    #PBS -l nodes=4:ppn=2:x86
    # Nodes required (#nodes:#processors per node:CPU type)
    #PBS -l mem=2000mb
    # Total job memory required (specify how many megabytes)
    #PBS -l pmem=250mb
    # Memory required per processor (specify how many megabytes)
    #PBS -l walltime=00:05:00
    # You must specify Wall Clock time (hh:mm:ss) [Maximum allowed 30 days = 720:00:00]
    #PBS -o myjob.output
    # Send job stdout to file "myjob.output"
    #PBS -j oe
    # Send (join) both stderr and stdout to "myjob.output"
    echo "This is my first job submitted on the ACCRE cluster."
    # Replace the above echo command with your executable program
    # End of PBS batch script.
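    According to the qsub manual page, qsub stops scanning for "#PBS" directives at the first executable line (a line that is not blank and does not start with "#"), so any directive placed after your program is ignored. The sketch below is a hypothetical helper, not part of TORQUE or Moab, that uses awk to flag such misplaced directives before you submit:

```shell
#!/bin/sh
# Hypothetical helper (not part of TORQUE/Moab): report "#PBS" lines that
# appear after the first executable line of a job script. qsub stops
# scanning for directives at that point, so such lines would be ignored.
check_pbs_order() {
    awk '
        /^[ \t]*$/ { next }            # blank line: keep scanning
        /^#PBS/ {
            if (seen_cmd) {
                print FILENAME ":" NR ": ignored directive: " $0
                bad = 1
            }
            next
        }
        /^[ \t]*#/ { next }            # ordinary comment or the "#!/bin/sh" line
        { seen_cmd = 1 }               # first executable line found
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

    A typical use would be `check_pbs_order submission_script.pbs && qsub submission_script.pbs`, so the job is only submitted when all directives are in place.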

    Commonly used PBS commands are illustrated in the "submission_script.pbs" example file.

    1. PBS mail commands: #PBS -M defines your e-mail address, and #PBS -m tells PBS to send you an e-mail message when your job begins ('b'), aborts ('a'), and ends ('e'). Note that the attribute to define your e-mail address is an upper-case 'M'; the attribute for defining when a message is sent to you is a lower-case 'm'.

    2. PBS commands to specify the resources you will need (#PBS -l):

      For more details about requesting specific types of nodes and efficiently matching your requests to your application needs, please see the FAQ.

      The slide presentations from our Introduction to the Compute Cluster and Job Scheduler Details classes also contain information you may find useful.

      • nodes=#

        where # is the number of requested nodes. For example:

        #PBS -l nodes=4

      • ppn=#

        Each node has either 2 or 4 CPUs. You may specify multiple CPUs per node for better parallel efficiency. For example, to use 2 CPUs on each of 4 nodes:

        #PBS -l nodes=4:ppn=2

        This PBS statement requests a total of 8 CPUs. Note: specifying ppn=2 may increase your wait time, since a dual-processor node or blade with one processor already in use cannot fulfill your request.

        If you want to use 8 processors but do not care whether the CPUs are scattered across multiple nodes, simply omit the "ppn=#" option from your script:

        #PBS -l nodes=8

        This PBS statement requests 8 processors. Some of the CPUs used to execute your job may be paired on the same node, or they may all be scattered across 8 separate nodes.
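        Inside a running TORQUE job, the file named by the $PBS_NODEFILE environment variable lists one host name per allocated processor, so a request of nodes=4:ppn=2 should yield 8 lines naming 4 distinct hosts. A minimal sketch (the function name is hypothetical) that reports what your job actually received:

```shell
#!/bin/sh
# Hypothetical helper: summarize the processors TORQUE allocated to this job.
# $PBS_NODEFILE holds one host name per allocated processor; an optional
# argument lets you point the function at another file for testing.
report_allocation() {
    nodefile=${1:-$PBS_NODEFILE}
    # $(( )) strips the leading blanks that some versions of wc print.
    nprocs=$(( $(wc -l < "$nodefile") ))
    nnodes=$(( $(sort -u "$nodefile" | wc -l) ))
    echo "processors: $nprocs, distinct nodes: $nnodes"
}
```

        Calling `report_allocation` near the top of a job script records the allocation in your job output, which makes it easy to confirm how a nodes=8 request was actually spread across nodes.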

      • x86, opteron, or ppc64

        Three types of processors are currently available in the cluster, including Opteron and PowerPC processors.

        Note: Programs compiled on the x86 processors (such as the Opteron) will not run on the PowerPC processors. You must recompile your code on the PowerPCs before submitting the compiled code for execution on PowerPCs.

        There is no default processor type. To ensure that your jobs execute, you need to specify the type of node you want to use: "ppc64" (PowerPC), "x86" (x86 processors, including Opteron), or "opteron" (Opteron only). If you use the "x86" option, your job will likely run on a mixture of x86 processors. In the "submission_script.pbs" example shown above, x86 processors were requested.
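        Since x86 and PowerPC binaries are not interchangeable, one option, if you build your program for both architectures, is to let the job script pick the matching binary at run time. This sketch assumes `uname -m` reports x86_64 (or i686) on the x86 nodes and ppc64 on the PowerPC nodes; the binary names myprog.x86 and myprog.ppc64 are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical helper: map a machine name (as printed by `uname -m`) to the
# binary compiled for that architecture. The binary names are placeholders.
select_binary() {
    case "$1" in
        x86_64|i?86) echo "./myprog.x86" ;;
        ppc64)       echo "./myprog.ppc64" ;;
        *)           echo "no binary built for '$1'" >&2; return 1 ;;
    esac
}
```

        In a job script you might then run `prog=$(select_binary "$(uname -m)") && "$prog"`, which fails with a clear message instead of a cryptic "cannot execute binary file" error if the job lands on an unexpected architecture.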

      • Specifying memory requirements with mem=#mb and pmem=#mb

        Use "mem" and "pmem" to set the memory requirements for your job.

        "pmem" is the per-CPU memory requirement, while "mem" is the total job memory requirement. The two are the same for a single-CPU job. We recommend getting into the habit of specifying both, since it is easy to decide to run a multi-CPU job and then forget to specify "mem". For example, to allocate 500 MB to each processor of a 2-processor job:

        #PBS -l pmem=500mb
        #PBS -l mem=1000mb

        Some further points to bear in mind:

        • If you do not set "mem" and "pmem", the default allocation is 400 MB.

        • Specifying much more memory than your job requires may delay its start time if the requested resources are not immediately available. You should therefore tune these parameters to be close to what your job needs.

        • You cannot request the full memory of a node; if you do, your job will sit in the idle queue permanently. This is because every machine uses some memory to run its operating system. As a rule of thumb, leave a buffer of at least 200 MB in your request. For example, the bigmem nodes with 4 GB have approximately 3.8 GB available for computation.
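        The relationship between "pmem" and "mem" is simple arithmetic: mem = pmem × (nodes × ppn). The sketch below (the function name is hypothetical) prints a matching pair of directives; with 250 MB per processor on 4 nodes × 2 processors it reproduces the 2000 MB total used in the example script above.

```shell
#!/bin/sh
# Hypothetical helper: print consistent pmem/mem directives.
# Arguments: per-processor memory in MB, number of nodes, processors per node.
# Total job memory = per-processor memory x total number of processors.
mem_directives() {
    pmem_mb=$1
    nodes=$2
    ppn=${3:-1}    # no ppn given: one processor per requested node
    total_procs=$((nodes * ppn))
    echo "#PBS -l pmem=${pmem_mb}mb"
    echo "#PBS -l mem=$((pmem_mb * total_procs))mb"
}
```

        For instance, `mem_directives 500 1 2` prints the same pmem=500mb and mem=1000mb pair as the 2-processor example above. Remember that the per-node total (pmem × ppn) must still leave the 200 MB buffer described above.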

      • Specifying wall time requirements with walltime=hh:mm:ss

        Walltime must be specified. When you first run a job, it helps to be conservative and estimate a walltime longer than you actually expect, because the scheduler will kill your job if it exceeds its walltime. Once you know how long your jobs actually run, you can decrease this value to reduce their wait time in the idle queue.
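        One way to arrive at a realistic walltime is to time a representative test run and then pad it. A minimal sketch, assuming a 50% safety margin (an arbitrary choice, not an ACCRE rule) and a hypothetical function name, that converts a measured duration in seconds into the hh:mm:ss form walltime expects and caps it at the 720:00:00 maximum:

```shell
#!/bin/sh
# Hypothetical helper: convert a measured run time (seconds) into a padded
# walltime in hh:mm:ss form. The 50% margin is an assumption, not a rule.
padded_walltime() {
    secs=$(( $1 * 3 / 2 ))              # add a 50% safety margin
    max=$(( 720 * 3600 ))               # 30-day (720:00:00) maximum
    if [ "$secs" -gt "$max" ]; then
        secs=$max
    fi
    printf '%02d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}
```

        For example, a test run that took 2400 seconds (40 minutes) gives 01:00:00. You can obtain the elapsed seconds by recording `date +%s` before and after the run and subtracting.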

        Jobs that take less than 30 minutes are an inefficient use of cluster resources. One problem is that the scheduler assigns very high priority values to extremely short jobs, and there is some lag between a job ending and a new job being sent to the freed CPUs. As a result, large numbers of very short jobs could continually fill up the cluster and cause unreasonably long wait times for longer jobs.

        A small number of short test jobs is acceptable. However, if you expect to run many short jobs, we ask that you lengthen the overall walltime to about an hour or more by combining many of your executables into a single PBS script, like so:

        #!/bin/sh
        # ...your #PBS specifications here...
        # ...your commands for job 1 here...
        # ...your commands for job 2 here...
        # ...your commands for job 3 here...
        # ...your commands for job 4 here...
        # ...your commands for job 5 here...
        # ...etc.
        # End of PBS script.
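        A loop is often the simplest way to bundle many short runs into one job. The sketch below assumes the short runs differ only in their input file; `run_one` and the input*.dat names are hypothetical stand-ins for your own program and data. TORQUE sets $PBS_O_WORKDIR to the directory you ran qsub from; the fallback to the current directory lets you test the script outside the scheduler.

```shell
#!/bin/sh
#PBS -l nodes=1:ppn=1:x86
#PBS -l walltime=01:00:00
#PBS -j oe
# Run many short tasks one after another inside a single job.
cd "${PBS_O_WORKDIR:-.}" || exit 1

# Hypothetical stand-in for your real short program.
run_one() {
    echo "processing $1"
}

# Run each task in turn; a failed task is reported but does not stop the rest.
run_batch() {
    for input in "$@"; do
        run_one "$input" || echo "task failed: $input" >&2
    done
}

run_batch input1.dat input2.dat input3.dat
```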

        See also our job scheduler policies and these FAQs on walltimes.

      • Specifying CPU time requirements with cput=hh:mm:ss

        CPU time is the amount of time your job actually spends using the CPUs. If the code you're running uses the CPUs constantly, cput = nodes*ppn*walltime. For code that spends a lot of time doing I/O, the CPU time can be much less than the walltime. Unlike walltime, cput is not required, so if you have no specific reason to use it, don't request it.
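        As a worked example of cput = nodes*ppn*walltime: the example script requests 4 nodes with 2 processors each for 00:05:00, so a job that keeps every processor busy uses at most 8 × 5 = 40 minutes (00:40:00) of CPU time. A sketch of the same arithmetic (the function name is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: upper bound on CPU time for a fully CPU-bound job,
# cput = nodes * ppn * walltime, with walltime given as hh:mm:ss.
max_cput() {
    nodes=$1
    ppn=$2
    # awk converts "hh:mm:ss" to seconds (it reads "08" as decimal 8).
    wall_secs=$(echo "$3" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }')
    secs=$(( wall_secs * nodes * ppn ))
    printf '%02d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}
```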

    3. Commonly used PBS job output commands (#PBS -o and #PBS -j):

      • #PBS -o filename

        For example:

        #PBS -o myjob.output

        Specifies that the standard job output should be placed in the file 'myjob.output'.

      • #PBS -j oe

        Specifies that the standard error for the PBS script file should be merged with the standard output file.

  5. Submit your job (i.e., your PBS script). At the gateway machine command line, type:

    qsub submission_script.pbs

    Your job has now been submitted to PBS and you have received a job number (e.g., 28671.vmpsched). "28671" is the number of your job in the scheduler. If requested in the submission script, PBS will notify you via e-mail when your program begins/finishes running (see above).
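    qsub prints the job identifier (for example, 28671.vmpsched) on standard output, so a script can capture it for later use, such as checking the job's status. A minimal sketch; `job_number` is a hypothetical helper that strips everything from the first dot onward:

```shell
#!/bin/sh
# Hypothetical helper: extract the numeric job number from a full
# TORQUE job identifier such as "28671.vmpsched".
job_number() {
    echo "${1%%.*}"
}

# Typical use on the gateway (requires qsub):
#   jobid=$(qsub submission_script.pbs) || exit 1
#   echo "Submitted job $(job_number "$jobid")"
```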

  6. Default PBS script output.

    If you do not include PBS output specifications, then after your job executes you will, by default, see two new files in your home directory. Each file name begins with your job name, followed by either .e<jobid> or .o<jobid> (where <jobid> is the number of your job). The .e file contains any job execution errors written to STDERR; the .o file contains output written to STDOUT. As noted above, you may combine the error and output into a single file by including:

    #PBS -j oe

    in your PBS script file.
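    To see the default behavior for yourself, you could submit a script like the one below, which writes one line to each stream. The qsub -N option sets the job name that prefixes the output files; "hello_job" is an arbitrary example name. Without "#PBS -j oe" you should find hello_job.o<jobid> and hello_job.e<jobid> after the job finishes; with it, both lines go to the .o file.

```shell
#!/bin/sh
#PBS -N hello_job
#PBS -l nodes=1:x86
#PBS -l walltime=00:05:00
# Writes one line to standard output and one to standard error, so you can
# see which output file each stream ends up in.
hello() {
    echo "This line goes to STDOUT (the .o file)."
    echo "This line goes to STDERR (the .e file, unless -j oe is set)." >&2
}
hello
```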

  7. Monitor your jobs.

    Please continue to two simple Unix job scripts and output.