Quick Start Guide
Overall Pipeline for Managing Jobs with HTCondor
To users, HTCondor is a job scheduler. You give HTCondor a file containing commands that tell it how to run jobs. HTCondor locates a machine that can run each job within the pool of machines, packages up the job and ships it off to this execute machine. The jobs run, and output is returned to the machine that submitted the jobs.
Pipeline Example
This section works through an example that shows the pipeline for managing jobs with HTCondor in more detail.
To begin, we will run the traditional 'hello world' program. To demonstrate the distributed nature of the resources, we will produce a Hello DLS message 3 times, with each message printed by its own job. Since you do not invoke each job directly, you need to tell HTCondor how to run the jobs for you. This information goes into a submit file (e.g. hello-DLS.sub), which defines variables that describe the set of jobs.
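As a rough illustration of what such a submit file and its executable might contain, the sketch below writes both files from the shell. The file names hello-DLS.sub and hello-DLS.sh and the use of $(Process) come from this guide. Everything else (the output, error, and log file names, the resource requests, and the exact message text) is an assumption made for illustration, not the configuration recommended for DLS.

```shell
#!/bin/sh
# Sketch only: any setting not named in this guide is an assumption.

# Write a submit file describing three jobs that share one executable.
cat > hello-DLS.sub <<'EOF'
# hello-DLS.sub (illustrative)
executable = hello-DLS.sh
# Pass each job's index (0, 1, 2) as the first argument.
arguments  = $(Process)

# Assumed file names; $(Cluster) and $(Process) keep them unique per job.
output = hello-DLS.$(Cluster).$(Process).out
error  = hello-DLS.$(Cluster).$(Process).err
log    = hello-DLS.$(Cluster).log

# Assumed modest resource requests.
request_cpus   = 1
request_memory = 100MB
request_disk   = 100MB

# Queue three jobs.
queue 3
EOF

# Write the executable; "$1" receives the $(Process) value.
cat > hello-DLS.sh <<'EOF'
#!/bin/sh
echo "Hello DLS from job $1"
EOF
chmod +x hello-DLS.sh

# Local check, no HTCondor needed: run the script as job 0 would.
sh ./hello-DLS.sh 0   # prints: Hello DLS from job 0
```

The heredocs are quoted ('EOF') so that the shell leaves $(Process) and "$1" untouched; HTCondor and the script expand them later.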
Save the submit file as hello-DLS.sub in your home directory on DLS.
Next, create the executable named in the submit file and save it as hello-DLS.sh.
When HTCondor runs this executable, it passes the $(Process) value for each job as an argument, and hello-DLS.sh substitutes that value for "$1".
Now, submit your jobs to the queue using condor_submit. The condor_submit command is what actually submits your jobs to HTCondor. If all goes well, it prints a confirmation that your jobs were submitted.
To check on the status of your jobs, run the condor_q command. Its output lists your jobs and their current status.
You can run condor_q periodically to see the progress of your jobs. By default, condor_q shows jobs grouped into batches by batch name (if provided) or by executable name. To show each of your jobs on its own line, add the -nobatch option.
If your job has not reached RUN status for a long time (it stays in IDLE or HOLD status), run condor_q -analyze JOB_ID to get more details about its status.
Jobs that fail to run stay in the queue until you delete them. To delete a job, run condor_rm JOB_ID.
Your jobs should complete within a few minutes, and they leave the queue when they do. If you then list your home directory with ls -l, you should see the files that your jobs created.
Useful information is provided in the user log and the output files.
HTCondor creates a transaction log of everything that happens to your jobs. Looking at the log file is very useful for debugging problems that may arise.
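The submit, monitor, and inspect steps described above can be sketched as a short shell session. The commands and options (condor_submit, condor_q, -nobatch, -analyze, condor_rm) are the ones named in this guide, and JOB_ID remains a placeholder. The helper show_job_logs and the hello-DLS*.log naming pattern are hypothetical: the real log name is whatever your submit file sets. The HTCondor calls are guarded so the sketch can run harmlessly on a machine without HTCondor.

```shell
#!/bin/sh
# Sketch of the submit / monitor / inspect cycle; not an official DLS script.

# Print the tail of every log file whose name starts with the given prefix.
# Assumption: your submit file's "log" setting produces <prefix>*.log.
show_job_logs() {
    for f in "$1"*.log; do
        if [ -f "$f" ]; then
            echo "== $f =="
            tail -n 20 "$f"
        fi
    done
}

# Submit and monitor only when HTCondor and the submit file are present.
if command -v condor_submit >/dev/null 2>&1 && [ -f hello-DLS.sub ]; then
    condor_submit hello-DLS.sub   # hand the jobs to HTCondor
    condor_q                      # jobs grouped into batches
    condor_q -nobatch             # one line per job
else
    echo "HTCondor or hello-DLS.sub not available; skipping submission"
fi

# For a job stuck in IDLE or HOLD, replace JOB_ID with the real ID:
#   condor_q -analyze JOB_ID
# To remove a job from the queue:
#   condor_rm JOB_ID

# After the jobs leave the queue, list the files and read the log.
ls -l
show_job_logs hello-DLS
```

Reading only the last lines of the log is usually enough to see how each job ended; drop the tail call and use cat if you need the full history.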