About job migration

Job migration is the process of moving a checkpointable or rerunnable job from one host to another. Migration supports load balancing by moving jobs from a heavily loaded host to a lightly loaded host.

You can initiate job migration on demand with the bmig command, or automatically. To initiate job migration automatically, configure a migration threshold at job submission, or at the host, queue, or application level.
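
As a rough illustration of the submission-level option, the following sketch assembles a bsub command line that makes a job checkpointable (-k) and sets a migration threshold (-mig). The helper name build_bsub_args, the example paths, and the threshold value are hypothetical, and the threshold is assumed to be expressed in minutes. The helper only builds the argument list and does not contact LSF, so confirm the options against the bsub reference for your LSF version before using it.

```python
import shlex


def build_bsub_args(command, checkpoint_dir, mig_threshold):
    """Return an argv list for a checkpointable job with a migration threshold.

    Hypothetical helper: it only builds the argument list and never calls LSF.
    """
    if mig_threshold < 0:
        raise ValueError("migration threshold must be non-negative")
    return [
        "bsub",
        "-k", checkpoint_dir,        # make the job checkpointable
        "-mig", str(mig_threshold),  # migration threshold (assumed to be minutes)
        *shlex.split(command),       # the job command and its arguments
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only, not recommendations.
    args = build_bsub_args("./my_job --input data.txt", "/share/ckpt", 30)
    print(shlex.join(args))
```

Returning a list instead of a single string keeps each argument separate, so the result could be passed to subprocess.run without shell quoting problems.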

Default behavior (job migration not enabled)

With automatic job migration enabled

Scope

Applicability and details

Operating system

  • UNIX

  • Linux

  • Windows

Job types

  • Non-interactive batch jobs submitted with bsub or modified with bmod, including chunk jobs

Dependencies

  • UNIX and Windows user accounts must be valid on all hosts in the cluster, or the correct type of account mapping must be enabled:
    • For a mixed UNIX/Windows cluster, UNIX/Windows user account mapping must be enabled

    • For a cluster with a non-uniform user name space, between-host account mapping must be enabled

    • For a MultiCluster environment with a non-uniform user name space, cross-cluster user account mapping must be enabled

  • Both the original and the new hosts must:
    • Be binary compatible

    • Run the same dot version of the operating system for predictable results

    • Have network connectivity and read/execute permissions to the checkpoint and restart executables (in LSF_SERVERDIR by default)

    • Have network connectivity and read/write permissions to the checkpoint directory and the checkpoint file

    • Have access to all files open during job execution so that LSF can locate them using an absolute path name
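
Some of the host requirements above can be checked before a job is submitted. The following sketch is a hypothetical preflight helper, not part of LSF: it verifies on the local host only that the checkpoint directory exists with read/write access and that each file the job uses is given as an absolute path that resolves. The function name check_migration_prereqs and its arguments are assumptions made for illustration. To cover both the original and the new host, you would run it on each candidate host.

```python
import os


def check_migration_prereqs(checkpoint_dir, job_files):
    """Return a list of problems found on the local host (empty if none).

    Hypothetical helper: checks only local file access, not binary
    compatibility, operating system version, or account mapping.
    """
    problems = []

    # The checkpoint directory must exist and allow reading and writing.
    if not os.path.isdir(checkpoint_dir):
        problems.append(f"checkpoint directory not found: {checkpoint_dir}")
    elif not os.access(checkpoint_dir, os.R_OK | os.W_OK):
        problems.append(f"no read/write access to: {checkpoint_dir}")

    # Files used by the job must be locatable through an absolute path.
    for path in job_files:
        if not os.path.isabs(path):
            problems.append(f"not an absolute path: {path}")
        elif not os.path.exists(path):
            problems.append(f"file not reachable from this host: {path}")

    return problems


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    for problem in check_migration_prereqs("/share/ckpt", ["/share/data/input.dat"]):
        print(problem)
```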