How resizable jobs work with other LSF features

Resource usage

When a job grows or shrinks, its resource reservation (for example, memory or shared resources) changes proportionately. How it changes depends on whether the resource is reserved per job, per host, or per slot:

  • Job-based resource usage does not change in grow or shrink operations.

  • Host-based resource usage changes only when the job gains slots on a new host or releases all slots on a host.

  • Slot-based resource usage changes whenever the job grows or shrinks.
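The three cases above can be illustrated with a small model. This is a minimal sketch, not LSF code: the function name reserved_total, the basis labels "job", "host", and "slot", and the host names are hypothetical, and the per-unit amount of 100 is arbitrary.

```python
def reserved_total(per_unit, basis, slots_by_host):
    """Return the total reservation for one resource in this toy model.

    per_unit      -- amount reserved per job, per host, or per slot
    basis         -- "job", "host", or "slot" (hypothetical labels)
    slots_by_host -- mapping of host name to slots held on that host
    """
    # Hosts on which the job currently holds at least one slot.
    occupied = {host: n for host, n in slots_by_host.items() if n > 0}
    if basis == "job":
        return per_unit                           # independent of allocation size
    if basis == "host":
        return per_unit * len(occupied)           # changes only with the host count
    if basis == "slot":
        return per_unit * sum(occupied.values())  # changes with every slot
    raise ValueError(f"unknown basis: {basis!r}")


# Three allocation states for one hypothetical job.
initial = {"hostA": 2}
grown_same_host = {"hostA": 4}
grown_new_host = {"hostA": 4, "hostB": 2}

for basis in ("job", "host", "slot"):
    totals = [reserved_total(100, basis, alloc)
              for alloc in (initial, grown_same_host, grown_new_host)]
    print(basis, totals)
```

In this model the job-based total stays at 100 in all three states, the host-based total moves from 100 to 200 only when hostB is added, and the slot-based total goes from 200 to 400 to 600.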

Limits

When a resize occurs, LSF adds slots to a job’s allocation only if doing so does not violate any resource limits placed on the job.
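The limit check can be pictured as a comparison of the usage after the grow operation against each configured limit. The following is a rough sketch under assumptions: grow_allowed, the resource names "slots" and "mem_mb", and the numeric limits are all hypothetical and do not reflect how LSF evaluates limits internally.

```python
def grow_allowed(current_usage, extra_usage, limits):
    """Return True if the job stays within every limit after growing."""
    for resource, cap in limits.items():
        after = current_usage.get(resource, 0) + extra_usage.get(resource, 0)
        if after > cap:
            return False  # growing would exceed this limit
    return True


# Hypothetical limits and current usage for one job.
limits = {"slots": 8, "mem_mb": 16000}
usage = {"slots": 4, "mem_mb": 8000}

print(grow_allowed(usage, {"slots": 4, "mem_mb": 8000}, limits))   # within limits
print(grow_allowed(usage, {"slots": 6, "mem_mb": 12000}, limits))  # exceeds limits
```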

Job scheduling and dispatch

The JOB_ACCEPT_INTERVAL parameter in lsb.params or lsb.queues controls the number of seconds to wait after dispatching a job to a host before dispatching a second job to the same host. The parameter applies to all allocated hosts of a parallel job. For resizable job allocation requests, JOB_ACCEPT_INTERVAL applies to newly allocated hosts.
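One way to read the last sentence is that a resize restarts the wait only on hosts that the job did not already occupy. The sketch below models that reading; it is an interpretation, not LSF's scheduler logic. The function names, the last_dispatch bookkeeping, and the wait_seconds value (standing in for the wait derived from JOB_ACCEPT_INTERVAL) are hypothetical.

```python
def stamp_new_hosts(last_dispatch, old_hosts, new_hosts, now):
    """Record a dispatch time only for hosts the resize newly added."""
    added = set(new_hosts) - set(old_hosts)
    updated = dict(last_dispatch)  # leave the caller's mapping untouched
    for host in added:
        updated[host] = now
    return updated


def can_dispatch(host, last_dispatch, wait_seconds, now):
    """True if the wait since the last dispatch to this host has elapsed."""
    stamp = last_dispatch.get(host)
    return stamp is None or now - stamp >= wait_seconds


# Job dispatched to hostA at t=0, then grown onto hostB at t=100.
last_dispatch = {"hostA": 0}
last_dispatch = stamp_new_hosts(last_dispatch, ["hostA"], ["hostA", "hostB"], now=100)

wait_seconds = 60
print(can_dispatch("hostA", last_dispatch, wait_seconds, now=120))  # wait on hostA has elapsed
print(can_dispatch("hostB", last_dispatch, wait_seconds, now=120))  # hostB is still waiting
```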

Chunk jobs

Because candidate jobs for the chunk job feature are short-running sequential jobs, the resizable job feature does not support job chunking:

  • Autoresizable jobs in a chunk queue or application profile cannot be chunked together.

  • bresize commands to resize job allocations do not apply to running chunk job members.

brequeue

Jobs requeued with brequeue start from the beginning. After requeue, LSF restores the original allocation request for the job.
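The effect of a requeue on allocation size can be sketched with a toy job object. This is an illustrative model under assumptions, not LSF behavior in detail: the ResizableJob class, its fields, and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ResizableJob:
    initial_request: int   # slots requested at submission
    allocated: int = 0     # slots currently held
    status: str = "PEND"

    def dispatch(self):
        # A dispatch always starts from the original request.
        self.allocated = self.initial_request
        self.status = "RUN"

    def resize(self, delta):
        # Grow (positive delta) or shrink (negative delta) a running job.
        if self.status != "RUN":
            raise RuntimeError("only running jobs can be resized")
        if self.allocated + delta < 1:
            raise ValueError("a running job must keep at least one slot")
        self.allocated += delta

    def requeue(self):
        # The grown or shrunk size is discarded; the job pends again.
        self.allocated = 0
        self.status = "PEND"


job = ResizableJob(initial_request=4)
job.dispatch()
job.resize(+4)          # job now holds 8 slots
job.requeue()           # back to PEND with no allocation
job.dispatch()
print(job.allocated)    # the original request of 4, not the grown size of 8
```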

blaunch

Parallel tasks running through blaunch can be resizable.

bswitch

bswitch can switch resizable jobs between queues regardless of job state, including the job’s resizing state. After the job is switched, the parameters of the new queue apply, including threshold configuration, run limit, CPU limit, and queue-level resource requirements.

User group administrators

User group administrators can issue bresize commands to release part of the resources from a job allocation (bresize release) or to cancel an active pending resize request (bresize cancel).

Requeue exit values

If job-level, application-level, or queue-level REQUEUE_EXIT_VALUES are defined and the job exits with one of the defined exit codes, LSF requeues the job and returns it to PEND status. For resizable jobs, LSF schedules the requeued job according to its initial allocation request, regardless of any change in job allocation size.

Automatic job rerun

A rerunnable job is rescheduled after its first execution host becomes unreachable. When a resizable job is rerun, LSF schedules it based on its initial allocation request.

Compute units

Autoresizable jobs cannot have compute unit requirements.

Compound resource requirements

Resizable jobs cannot have compound resource requirements.