To optimize resource utilization, LSF allows job allocation to shrink and grow during the job run time.
Use resizable jobs for long-tailed jobs, jobs that use a large number of processors for a period, but then toward the end of the job use a smaller number of processors.
Without resizable jobs, a job’s slot allocation is static from the time the job is dispatched until it finishes. This means resources are wasted, even if you use reservation and backfill (estimated runtimes can be inaccurate). With resizable jobs, jobs can have additional slots added when needed, during the job’s runtime.
An autoresizable job is a resizable job with a minimum and maximum slot request, where LSF automatically schedules and allocates additional resources to satisfy the job maximum request as the job runs.
Use autoresizable jobs for jobs in which tasks are easily parallelized: Each step or task can be made to run on a separate processor to achieve a faster result. The more resources the job gets, the faster the job can run. Session Scheduler jobs are very good candidates.
For autoresizable jobs, LSF automatically recalculates the pending allocation requests. The maximum pending allocation request is calculated based on the maximum number of requested slots minus the number of allocated slots. Because the job is running and its previous minimum request is already satisfied, LSF is able to allocate additional slots to the running job. For instance, if job requests a minimum of 4 and a maximum of 32, if LSF allocates 20 slots to the job initially, its active pending allocation request is for another 12 slots. After LSF assigns another 4 slots, the pending allocation request is now 8 slots.
A pending allocation request is an additional resource request attached to a resizable job. Only running jobs can have pending allocation requests. At any given time, a job only has one allocation request.
LSF creates a new pending allocation request and schedules it after a job physically starts on the remote host (after LSF receives the JOB_EXECUTE event from sbatchd) or resize notification command successfully completes.
A resize notification command is an executable that is invoked on the first execution host of a job in response to an allocation (grow or shrink) event. It can be used to inform the running application for allocation change. Due to the variety of implementations of applications, each resizable application may have its own notification command provided by the application developer.
The notification command runs under the same user ID environment, home, and working directory as the actual job. The standard input, output, and error of the program are redirected to the NULL device. If the notification command is not in the user's normal execution path (the $PATH variable), the full path name of the command must be specified.
A notification command exits with one of the following values:
LSB_RESIZE_NOTIFY_OK=0
LSB_RESIZE_NOTIFY_FAIL=1
LSF sets these environment variables in the notification command environment. LSB_RESIZE_NOTIFY_OK indicates that notification succeeds. For allocation grow and shrink events, LSF updates the job allocation to reflect the new allocation.
LSB_RESIZE_NOTIFY_FAIL indicates notification failure. For allocation "grow" event, LSF reschedules the pending allocation request. For allocation "shrink" event, LSF fails the allocation release request.
For a list of other environment variables that apply to the resize notification command, see the environment variables reference documentation in this guide.