jm.py

jm.py - command-line interface to job_manager

Synopsis

jm.py add [-c | --cache] [-s | --server] <job_description>

jm.py modify [-c | --cache] [-s | --server] [-i | --index] [-p | --pattern] <job_description>

jm.py delete [-c | --cache] [-s | --server] [-i | --index] [-p | --pattern]

jm.py list [-c | --cache] [-s | --server] [-p | --pattern] [-t | --terse]

jm.py merge [-c | --cache] <[[user@]remote_host:]remote_cache> [remote_hostname]

jm.py update [-c | --cache]  

jm.py daemon [-c | --cache]

Description

jm.py provides a command-line interface to the job_manager python module, which enables a collection of jobs (principally long-running high-performance computing calculations) to be easily monitored and managed.

Jobs can be added, modified and deleted, and subsets of the jobs can be viewed. Jobs are categorised according to the computer on which they are run. The local computer on which jm.py is run is treated specially and is called localhost.

Data is saved to a cache file between runs and different cache files can be merged together, including caches on remote servers directly over ssh.

Commands

add: Add a job running on the specified server, with job details given by the job description.
modify: Modify the selected job(s) according to the job description fields supplied. Note that if neither a pattern nor an index is provided then no job is selected to be modified.
delete: Delete the specified jobs. Note that if neither a pattern nor an index is provided then no job is selected to be deleted.
list: List jobs which match the supplied search criteria. The complete list of jobs is printed out if no options are specified. Only fields of the job description which are not null are printed out.
merge: Merge jobs from the remote_cache file into the current cache. If the remote cache is actually a local file, the remote_hostname nickname must be specified. If remote_hostname is not given and remote_cache is on a remote machine, the hostname in the address is used as remote_hostname.
update: Check all jobs on the localhost server and update the status of queueing or running jobs if they have started running or finished. The job status is checked by searching for the job_id using ps, qstat (for PBS-based queueing systems) and llq (for LoadLeveler queueing systems).
daemon: Run the update command once a minute. Designed to be run in the background as a daemon-type process.
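The update and daemon behaviour described above can be sketched roughly as follows. This is only an illustration, not the job_manager implementation: the names pid_is_running and run_periodically are hypothetical, only the ps check is shown (qstat and llq are omitted), and the max_cycles and sleep parameters exist purely so the loop can be exercised without waiting.

```python
import subprocess
import time


def pid_is_running(job_id):
    """Return True if `ps -p` reports a process with this pid.

    Hypothetical helper: assumes job_id is the pid of an interactive job.
    """
    result = subprocess.run(
        ["ps", "-p", str(job_id)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_periodically(update, interval=60, max_cycles=None, sleep=time.sleep):
    """Call update() every `interval` seconds.

    max_cycles=None loops forever, as a daemon would; a finite value
    stops after that many calls and returns the number of calls made.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        update()
        cycles += 1
        # Do not sleep after the final cycle of a bounded run.
        if max_cycles is None or cycles < max_cycles:
            sleep(interval)
    return cycles
```

A bounded run such as run_periodically(update_all_jobs, max_cycles=3), where update_all_jobs is whatever function performs the status check, calls it three times with a one-minute pause between calls.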

Job description

The job_description consists of a list of key-value pairs. A new pair is started by a new key, so each value can contain spaces. The keys must terminate with a colon (‘:’) and have a space between the end of the key and the first word in the value. See below for examples.
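The key-value format can be illustrated with a short parser. This is a minimal sketch under the rules stated above, not the parser used by jm.py; the function name parse_job_description is hypothetical, and it assumes the description arrives as a list of whitespace-separated command-line tokens.

```python
def parse_job_description(tokens):
    """Group command-line tokens into a dict of key-value pairs.

    A token ending in ':' starts a new key; the tokens that follow are
    joined with single spaces to form its value. Note that in this sketch
    a value word ending in ':' would be mistaken for a key.
    """
    job = {}
    key = None
    for token in tokens:
        if token.endswith(":"):
            key = token[:-1]
            job[key] = ""
        elif key is None:
            raise ValueError("value %r given before any key" % token)
        else:
            # Append the word to the current value, space-separated.
            job[key] = (job[key] + " " + token).strip()
    return job
```

For instance, the tokens comment: a test calculation yield a single comment key whose value is the three words joined by spaces.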

Available elements of the job description are:

job_id: ID of the job. This ought to be unique and, in order to work with the update and daemon commands, should identify the job by being either the pid of the job (for jobs running interactively) or the ID of the job in the queueing system. This value is most conveniently obtained from the environment or a queueing system environment variable.
path: Path to the directory in which the job is running.
program: Name of the program being executed.
input_fname: Filename of the input file.
output_fname: Filename of the output file.
status: Status of the job. Available values are: unknown, held, queueing, running, finished and analysed. Default: unknown.
submit: File name of the submit script used. Only relevant for jobs run on clusters with queueing systems.
comment: Comment and notes on the job.

Unless specified above, all elements default to being a null value.

A job must have a job_id, path and program specified. Other attributes are optional. Only the attributes to be set or modified need to be specified with the add and modify commands.
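The defaults and required fields listed above could be enforced along these lines. The sketch is illustrative only and does not reflect how job_manager stores jobs internally; make_job and the constant names are hypothetical, and null values are represented by None.

```python
FIELDS = ("job_id", "path", "program", "input_fname",
          "output_fname", "status", "submit", "comment")
REQUIRED = ("job_id", "path", "program")
STATUSES = ("unknown", "held", "queueing", "running", "finished", "analysed")


def make_job(**fields):
    """Build a job dict with documented defaults (hypothetical helper)."""
    unknown = sorted(set(fields) - set(FIELDS))
    if unknown:
        raise ValueError("unknown fields: %s" % ", ".join(unknown))
    missing = [name for name in REQUIRED if fields.get(name) is None]
    if missing:
        raise ValueError("missing required fields: %s" % ", ".join(missing))
    job = dict.fromkeys(FIELDS)   # every element defaults to null (None)
    job["status"] = "unknown"     # except status, which defaults to unknown
    job.update(fields)
    if job["status"] not in STATUSES:
        raise ValueError("invalid status: %r" % job["status"])
    return job
```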

Options

`-c, --cache`	Specify the location of the cache file containing data from previous runs. The default is $HOME/.cache/jm/jm.cache. The directory structure for the cache file will be created if necessary.
`-s, --server`	Specify the server of the job. The default is the localhost server except for the list command, where the default is all servers. Can be specified multiple times, in which case the command is applied to each server in turn. However, this rarely makes sense for the add command.
`-i, --index`	Select a job by its index on the specified server(s). Can be specified multiple times in order to select multiple jobs.
`-p, --pattern`	Select a job by a given regular expression on the specified server(s). The regular expression is tested against all fields in the job description for each job and a job is selected if any of the fields match the regular expression.
`-t, --terse`	Print only the hostname, index, job id and status of each job.
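The selection rule for --pattern can be sketched as below. This is an assumption-laden illustration rather than the jm.py source: select_jobs is a hypothetical name, jobs are taken to be dicts of fields, the regular expression is applied with re.search (so it may match anywhere in a field), and null fields are skipped.

```python
import re


def select_jobs(jobs, pattern):
    """Return the indices of jobs with at least one field matching pattern."""
    regex = re.compile(pattern)
    selected = []
    for index, job in enumerate(jobs):
        # A job is selected as soon as any non-null field matches.
        if any(regex.search(str(value))
               for value in job.values() if value is not None):
            selected.append(index)
    return selected
```

Because every field is tested, a pattern such as a program name also selects jobs that merely mention that name in their comment or path.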

Examples

Create a job from inside a script. $$ is the current process id in bash.

$ jm.py add job_id: $$ path: $PWD status: running

List all jobs.

$ jm.py list

Modify part of the job description.

$ jm.py modify --index 0 comment: a test calculation

Automatically update the status of running jobs.

$ jm.py update

Run a daemon process to automatically update the status of running jobs once a minute using a non-default cache file.

$ jm.py daemon --cache /path/to/cache

Merge jobs from a remote server into the local job cache:

$ jm.py merge user@remote_server_fqdn:/path/to/remote_cache remote_server_name

Note

The remote file is transferred by scp and requires password-free access to the remote server (e.g. by using ssh keys and ssh-agent). If this is not possible, copy the remote cache to the local machine and then merge using the local copy.
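How the [[user@]remote_host:]remote_cache address relates to the remote_hostname argument can be illustrated as follows. The functions split_remote_cache and default_remote_hostname are hypothetical and only mirror the behaviour described for the merge command; as with scp, a local path containing a colon would be misread as a remote address in this sketch.

```python
def split_remote_cache(address):
    """Split '[[user@]host:]path' into (user, host, path).

    Parts that are absent from the address are returned as None.
    """
    user = host = None
    path = address
    if ":" in address:
        location, path = address.split(":", 1)
        if "@" in location:
            user, host = location.split("@", 1)
        else:
            host = location
    return user, host, path


def default_remote_hostname(address, remote_hostname=None):
    """Pick the hostname under which the merged jobs are filed."""
    if remote_hostname is not None:
        return remote_hostname
    _, host, _ = split_remote_cache(address)
    if host is None:
        # A purely local file carries no hostname to fall back on.
        raise ValueError("remote_hostname is required for a local cache file")
    return host
```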

List a subset of jobs.

$ jm.py list --server remote_server
$ jm.py list --server localhost

Delete a job on the remote server.

$ jm.py delete --server remote --index 0

License

The jm.py script and the job_manager python module are distributed under the Modified BSD License. Please see the source files for more information.

Bugs

Contact James Spencer (j.spencer@imperial.ac.uk) regarding bug reports, suggestions for improvements or code contributions.