LSF AE commands

The single cluster view maintained by LSF AE gives end users a single-cluster experience using existing Platform LSF commands. Additional command options allow administrators to track job progress through to the execution clusters.

For the output examples shown, consider a submission cluster with two execution clusters:

  • Submission cluster: sub_cluster

    Master host: hostA

    Slave hosts: none

  • First execution cluster: cluster2

    Master host: hostB

    Slave hosts: none

  • Second execution cluster: cluster3

    Master host: hostC

    Slave hosts: hostD

lsclusters

Displays information and current status for the submission cluster and all configured execution clusters.

This command runs locally (in the execution cluster) when submitted by a running job.

% lsclusters
CLUSTER_NAME STATUS MASTER_HOST ADMIN    HOSTS SERVERS
sub_cluster  ok     hostA       lsfadmin 1     1
cluster2     ok     hostB       lsfadmin 1     1
cluster3     ok     hostC       lsfadmin 2     2

lshosts

Shows host information for all hosts within all clusters connected by LSF AE.

This command runs locally (in the execution cluster) when submitted by a running job.

% lshosts
HOST_NAME type   model    cpuf  ncpus maxmem maxswp server RESOURCES
hostA     X86_64 PC6000   116.1 2     1000M  1983M  Yes    (mg)
hostB     X86_64 Opteron2 47.0  1     1000M  1961M  Yes    (mg)
hostC     X86_64 Intel_EM 60.0  2     7978M  1992M  Yes    (mg)
hostD     X86_64 Intel_EM 60.0  2     3828M  2055M  Yes    (mg)
-cname

New option -cname includes the cluster name for execution cluster hosts and host groups in output for lshosts.

% lshosts -cname
HOST_NAME      type   model    cpuf  ncpus maxmem maxswp server RESOURCES
hostA          X86_64 PC6000   116.1 2     1000M  1983M  Yes    (mg)
hostB@cluster2 X86_64 Opteron2 47.0  1     1000M  1961M  Yes    (mg)
hostC@cluster3 X86_64 Intel_EM 60.0  2     7978M  1992M  Yes    (mg)
hostD@cluster3 X86_64 Intel_EM 60.0  2     3828M  2055M  Yes    (mg)
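Output that includes host_name@cluster_name tokens is easy to post-process. A minimal POSIX shell sketch (the split_host helper and the sample names are illustrative, not part of LSF):

```shell
# Split a host_name@cluster_name token from lshosts -cname output.
# Submission cluster hosts carry no @cluster suffix.
split_host() {
    h="$1"
    host="${h%%@*}"                   # text before the first '@'
    case "$h" in
        *@*) cluster="${h#*@}" ;;     # text after the '@'
        *)   cluster="local"   ;;     # no suffix: submission cluster host
    esac
    echo "$host $cluster"
}

split_host hostB@cluster2   # -> hostB cluster2
split_host hostA            # -> hostA local
```

The same split applies to lsload -cname, bhosts -cname, and bmgroup -cname output.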

lsload

Shows host load information for all hosts within all clusters connected by LSF AE.

This command runs locally (in the execution cluster) when submitted by a running job.

% lsload
HOST_NAME status r15s r1m r15m ut  pg  ls it tmp   swp   mem
hostA     ok     0.2  0.0 0.0  1%  0.0 6  4  22G   1948M 725M
hostB     ok     0.6  0.0 0.0  0%  0.0 2  0  18M   1554M 518M
hostC     ok     0.4  0.1 0.0  7%  0.0 3  0  3270M 1990M 5448M
hostD     ok     1.0  1.0 1.1  13% 0.0 8  20 21G   353M  491M
-cname

New option -cname includes the cluster name for execution cluster hosts and host groups in output for lsload.

% lsload -cname   
HOST_NAME      status r15s r1m r15m ut  pg  ls it tmp   swp   mem
hostA          ok     0.2  0.0 0.0  1%  0.0 6  4  22G   1948M 725M
hostB@cluster2 ok     0.6  0.0 0.0  0%  0.0 2  0  18M   1554M 518M
hostC@cluster3 ok     0.4  0.1 0.0  7%  0.0 3  0  3270M 1990M 5448M
hostD@cluster3 ok     1.0  1.0 1.1  13% 0.0 8  20 21G   353M  491M

bacct

Displays accounting statistics for finished jobs within all clusters connected by LSF AE.

This command runs locally (in the execution cluster) when submitted by a running job.

badmin diagnose/hopen/hclose

Forwards requests made through the submission cluster to the appropriate remote host or cluster. Not supported within the local execution cluster.

badmin mbdrestart -p

Starts mbatchd in parallel, leaving the existing mbatchd free to respond to queries and run commands while the new mbatchd restarts, reading configuration files and replaying events. Once all events have been read, the two mbatchd daemons merge, replaying any new events and leaving only the new mbatchd running.

During a parallel mbatchd restart, new badmin mbdrestart and badmin reconfig commands are not accepted. Parallel mbatchd restart using badmin mbdrestart -p does not work with duplicate event logging (LSB_LOCALDIR in lsf.conf).

The existing command badmin mbdrestart remains unchanged.

badmin showstatus

In the submission cluster, displays a summary of the current LSF runtime information about the submission cluster and all execution clusters. In the execution cluster, displays a summary of the current LSF runtime information about the local execution cluster only.

The current LSF runtime information displayed includes information about hosts, jobs, users, user groups, and mbatchd startup and reconfiguration.

bbot

Sends the job back to the submission cluster for rescheduling, at the bottom of the queue.

Run bbot commands through the submission cluster, not through individual execution clusters.

bclusters

Displays execution cluster resource provider and consumer information, resource flow information, and connection status between the submission cluster and execution cluster.

Use -app to view available application profiles in remote clusters.

Information related to LSF AE is displayed under the heading Job Forwarding Information.

  • LOCAL_QUEUE: Name of an LSF AE queue.

  • JOB_FLOW: Indicates direction of job flow.

    • send

      The local queue is a submission cluster send-jobs queue (SNDJOBS_TO is defined in the local queue).

    • recv

      The local queue is an execution cluster receive-jobs queue (RCVJOBS_FROM is defined in the local queue).

  • REMOTE: Shows the name of the remote queue, always the same as the local queue name.

    For receive-jobs queues, always “-”.

  • CLUSTER: For send-jobs queues, shows the name of the execution cluster containing the receive-jobs queue.

    For receive-jobs queues, shows the name of the submission cluster that can send jobs to the local queue.

  • STATUS: Indicates the connection status between the local queue and remote queue.

    • ok

      The submission cluster and execution cluster can exchange information and the system is properly configured.

    • disc

      Communication between the two clusters has not been established. This could occur because there are no jobs waiting to be dispatched, or because the remote master cannot be located.

    • reject

      The remote queue rejects jobs from the send-jobs queue. The local queue and remote queue are connected and the clusters communicate, but the queue-level configuration is not correct.

% bclusters
[Job Forwarding Information ]
LOCAL_QUEUE JOB_FLOW REMOTE CLUSTER  STATUS
queue1      send     queue1 cluster2 ok
queue1      send     queue1 cluster3 ok
[Resource Lease Information ]
No resources have been exported or borrowed

bconf

Submits live reconfiguration requests, updating configuration settings in active memory without restarting daemons.

action object_type=identity ["key-value_pair[;key-value_pair...]"] [-c "comments"] [-f]

The limit object_type has enhanced support for the following LSF AE key-value_pair keywords in lsb.resources for generic job forwarding limits:

  • CLUSTERS for the following action types: addmember, rmmember, update, create, delete

  • PER_CLUSTER for the following action types: addmember, rmmember, update, create, delete

  • FWD_SLOTS for the following action types: update, create, delete

Example:

bconf update limit=cluster_e1 "FWD_SLOTS=10"

bhist

Displays historical information for all jobs within all clusters connected by LSF AE.

This command runs locally (in the execution cluster) when submitted by a running job.

bhosts/bmgroup

Displays information for all hosts within all clusters connected by LSF AE. This includes the submission cluster and execution clusters. The same format is used for all hosts, both in the submission cluster and in the execution clusters. The output displayed is sorted by host or host group name.

Execution cluster hosts and host groups can be identified by name alone (host_name), or by name and cluster (host_name@cluster_name). If there are multiple hosts or host groups with the same name, all are displayed.

% bhosts
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
hostA     ok     -    2   0     0   0     0     0
hostB     ok     -    1   0     0   0     0     0
hostC     ok     -    2   0     0   0     0     0
hostD     ok     -    2   0     0   0     0     0
% bmgroup -l
GROUP_NAME    CONDENSE   HOSTS
hgroup1       Yes        hostB
hgroup2       No         hostC hostD
-cname

Includes the cluster name for execution cluster hosts and host groups in output for bhosts and bmgroup. The output displayed is sorted by cluster and then by host or host group name.

% bhosts -cname
HOST_NAME      STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
hostA          ok     -    2   0     0   0     0     0
hostB@cluster2 ok     -    1   0     0   0     0     0
hostC@cluster3 ok     -    2   0     0   0     0     0
hostD@cluster3 ok     -    2   0     0   0     0     0
% bmgroup -l -cname
GROUP_NAME           CONDENSE   HOSTS
hgroup1              Yes        hostB
hgroup2@cluster3     No         hostC hostD

bjobs

Displays job status. LSF AE adds job forwarding information to the output and supports the -fwd filter option.

% bjobs 
JOBID USER    STAT QUEUE  FROM_HOST EXEC_HOST JOB_NAME   SUBMIT_TIME
1     lsfuser RUN  queue1 hostA     hostC     sleep 1234 Nov 29 14:08
-l

Displays forwarding time and cluster name for forwarded pending and running jobs.

% bjobs -l
Job <1>, User <lsfuser>, Project <default>, Status <RUN>, Queue <queue1>, 
Command <sleep 1234>
Mon Nov 29 14:08:35: Submitted from host <hostA>, CWD </home/lsfuser >, 
                     Re-runnable;
Mon Nov 29 14:08:38: Job <1> forwarded to cluster <cluster3> as Job <1>;
Mon Nov 29 14:08:44: Started on <hostC>, Execution Home </home/lsfuser>, 
                     Execution CWD </home/lsfuser>;
Mon Nov 29 14:08:46: Resource usage collected.
 MEM: 2 Mbytes;  SWAP: 32 Mbytes;  NTHREAD: 1
 PGID: 6395;  PIDs: 6395
 
SCHEDULING PARAMETERS:
           r15s r1m  r15m ut pg io ls it tmp swp mem
 loadSched -    -    -    -  -  -  -  -   -  -   -
 loadStop  -    -    -    -  -  -  -  -   -  -   -
-cname

Includes the cluster name for execution cluster hosts in output for bjobs -l.

% bjobs -l -cname
Job <1>, User <lsfuser>, Project <default>, Status <RUN>, Queue <queue1>, 
Command <sleep 1234>
Mon Nov 29 14:08:35: Submitted from host <hostA>, CWD </home/lsfuser >, 
                     Re-runnable;
Mon Nov 29 14:08:38: Job <1> forwarded to cluster <cluster3> as Job <1>;
Mon Nov 29 14:08:44: Started on <hostC@cluster3>, Execution Home 
                     </home/lsfuser>, Execution CWD </home/lsfuser>;
Mon Nov 29 14:08:46: Resource usage collected.
 MEM: 2 Mbytes;  SWAP: 32 Mbytes;  NTHREAD: 1
 PGID: 6395;  PIDs: 6395
 
SCHEDULING PARAMETERS:
           r15s r1m  r15m ut pg io ls it tmp swp mem
 loadSched -    -    -    -  -  -  -  -   -  -   -
 loadStop  -    -    -    -  -  -  -  -   -  -   -
-sum

Displays summary information about unfinished jobs. bjobs -sum displays the count of job slots in the following states: running (RUN), system suspended (SSUSP), user suspended (USUSP), pending (PEND), forwarded to remote clusters and pending (FWD_PEND), and UNKNOWN.

bjobs -sum displays the job slot count only for the user’s own jobs.

% bjobs -sum
RUN        SSUSP       USUSP      UNKNOWN    PEND       FWD_PEND
123        456         789        5          5          3

Use -sum with other options (like -m, -P, -q, and -u) to filter the results. For example, bjobs -sum -u user1 displays job slot counts just for user user1.

% bjobs -sum -u user1
RUN        SSUSP       USUSP      UNKNOWN    PEND       FWD_PEND
20         10          10         0           5          0
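Because the -sum report is two fixed lines, totals are easy to derive with awk. A sketch that sums the data line of a captured report (the figures below copy the example output above; in practice you would pipe bjobs -sum directly into awk):

```shell
# Sum all slot counters on the data line of a `bjobs -sum` report.
total=$(awk 'NR == 2 { for (i = 1; i <= NF; i++) sum += $i; print sum }' <<'EOF'
RUN        SSUSP       USUSP      UNKNOWN    PEND       FWD_PEND
20         10          10         0          5          0
EOF
)
echo "$total"   # -> 45
```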

blimits

Displays cluster limit data across all clusters, including forward limits and aggregated execution cluster limits. All clusters are shown by default.

To specify a remote limit with the -n option, include the cluster name as well as the limit name: blimits -n limit_name@cluster_name.

% blimits
Sub-master: blimits
 
FORWARD LIMITS:
 
    NAME       USERS    QUEUES    PROJECTS    CLUSTERS    FWD_SLOTS
 NONAME000      -         -        proj1      ec1 ec2       1/5
 NONAME001      -       queue1       -        ec1 ec2       2/7
 NONAME002      -         -          -           -          2/10
 
REMOTE CLUSTER <ec1>:
 
INTERNAL RESOURCE LIMITS:
 
    NAME        USERS          QUEUES         HOSTS         PROJECTS    SLOTS   MEM    TMP    SWP    JOBS
 NONAME000         -           queue1         all             -         2/10    -      -      -      - 
 
REMOTE CLUSTER <ec2>:
 
No resource usage found.
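FWD_SLOTS is reported as used/limit, so the remaining forward slots can be computed by splitting on the slash. A sketch over two sample rows (condensed from the FORWARD LIMITS section above; the field positions are illustrative):

```shell
# Compute remaining forward slots from blimits-style used/limit fields.
remaining=$(awk '{ split($NF, s, "/"); print $1, s[2] - s[1] }' <<'EOF'
NONAME000 proj1 ec1 ec2 1/5
NONAME001 queue1 ec1 ec2 2/7
EOF
)
echo "$remaining"
# -> NONAME000 4
#    NONAME001 5
```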
-fwd

Displays forward slot allocation limits.

Use -fwd with -c to display forward slot limit configuration.

-fwd -C cluster_name...

Displays forward slot allocation limits for one or more specific clusters. -C cannot be used without -fwd.

Use -fwd -C with -c to display forward slot limit configuration for the specified cluster.

bmod

Modifies job submission options. Forwarded jobs can only be modified within one execution cluster. Changes across execution clusters are not supported.

Run bmod commands through the submission cluster, not through individual execution clusters.

bparams

Displays new LSF AE parameters.

bqueues

Displays status of queues in the cluster, including all running and pending jobs counters, and fairshare information.

Enhanced output for LSF AE displays information about forwarded pending jobs for each queue in the SHARE_INFO_FOR section under the heading FWD_PEND.

% bqueues -lr queue1
 
QUEUE: queue1
 -- No description provided.
 
PARAMETERS/STATISTICS
PRIO NICE STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN SSUSP USUSP  RSV
 40   10  Open:Active       -    -    -    -     1     0     1     0     0    0
Interval for a host to accept two jobs is 0 seconds
 
SCHEDULING PARAMETERS:
           r15s r1m  r15m ut pg io ls it tmp swp mem
 loadSched -    -    -    -  -  -  -  -   -  -   -
 loadStop  -    -    -    -  -  -  -  -   -  -   -
 
SCHEDULING POLICIES:  FAIRSHARE  ABS_CLUSTER_PREFERENCE
USER_SHARES:  [default, 1]
 
SHARE_INFO_FOR: queue1/
 USER/GROUP   SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME  RUN_TIME  FWD_PEND
lsfuser          1       0.167      1        0         0.0        0        0
 
USERS: all
HOSTS:  none
REQUEUE_EXIT_VALUES:  1
SEND_JOBS_TO:  queue1@cluster2, queue1@cluster3
RERUNNABLE :  yes
 
MAX_RSCHED_TIME: 360 10

brequeue

Requeues a job.

By default, when a job running on an execution cluster is requeued it returns to the submission cluster in the PEND state awaiting rescheduling.

Automatic job requeue using REQUEUE_EXIT_VALUES (in lsb.queues) returns a running job to the PEND state in the same execution cluster for local rescheduling.

% bhist -l 887
 
Job <887>, User <lsfadmin>, Project <default>, Command <sleep 10000>
Wed Dec  8 19:37:21: Submitted from host <hostD>, to Queue <test1>, CWD <$HOME>, 
                     Requested Resources <type==any>;
Wed Dec  8 19:37:25: Forwarded job to cluster cluster2;
Wed Dec  8 19:37:28: Dispatched to <hostB@cluster2>;
Wed Dec  8 19:37:28: Starting (Pid 22557);
Wed Dec  8 19:37:28: Running with execution home </home/lsfadmin>, Execution CWD 
                     </home/lsfadmin>, Execution Pid <22557>;
Wed Dec  8 19:37:29: Signal <REQUEUE_PEND> requested by user or administrator 
                     <lsfadmin>;
Wed Dec  8 19:37:30: Exited with exit code 130. The CPU time used is 0.1 seconds;
Wed Dec  8 19:38:31: Pending: Job has been requeued;
Wed Dec  8 19:38:31: Forwarded job to cluster cluster3;
Wed Dec  8 19:38:33: Dispatched to <hostD@cluster3>;
Wed Dec  8 19:38:33: Starting (Pid 24115);
Wed Dec  8 19:38:33: Running with execution home </home/lsfadmin>, Execution CWD 
                     </home/lsfadmin>, Execution Pid <24115>;
 
Summary of time in seconds spent in various states by  Wed Dec  8 19:39:24
  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
  70       0        53       0        0        0        123

Automatic requeue example with REQUEUE_EXIT_VALUES defined:

% bhist -l 889
 
Job <889>, User <lsfadmin>, Project <default>, Command <sleep 5; exit 13>
Wed Dec  8 20:03:07: Submitted from host <hostD>, to Queue <test1>, CWD <$HOME>, 
                     Requested Resources <type==any>;
Wed Dec  8 20:03:13: Forwarded job to cluster ecluster2;
Wed Dec  8 20:03:19: Dispatched to <host3@ecluster2>;
Wed Dec  8 20:03:19: Starting (Pid 29850);
Wed Dec  8 20:03:19: Running with execution home </home/lsfadmin>, Execution CWD 
                     </home/lsfadmin>, Execution Pid <29850>;
Wed Dec  8 20:03:24: Pending: Job is requeued on the execution cluster due to exit 
                     value;
Wed Dec  8 20:03:29: Dispatched to <host3@ecluster2>;
Wed Dec  8 20:03:31: Starting (Pid 29866);
Wed Dec  8 20:03:31: Running with execution home </home/lsfadmin>, Execution CWD 
                     </home/lsfadmin>, Execution Pid <29866>;
Wed Dec  8 20:03:34: Pending: Job is requeued on the execution cluster due to exit 
                     value;

brlainfo

Displays host topology information for hosts within all clusters connected by LSF AE.

This command runs locally (in the execution cluster) when submitted by a running job.

brun

Forces a pending or finished job to run or be forwarded to a specified cluster. The exact behavior of brun on a pending job depends on where the job is pending, and which hosts or clusters are specified in the brun command.

Important:

Only administrators can use the brun command. You can only run brun from the submission cluster.

You must specify one or more host names or a cluster name when you force a job to run.

If multiple hosts are specified, the first available host is selected and the remainder are ignored. The specified hosts cannot belong to more than one cluster.

You can only specify one cluster name. The job is forced to be forwarded to the specified cluster.

You cannot specify host names and cluster names together in the same brun command.

A job pending in one execution cluster that is forced to run in a different cluster is first returned to the submission cluster, and then forwarded again.

If a job is submitted with a cluster name and the job is forwarded to a remote cluster, you cannot use brun -m again to switch the job to another execution cluster. For example:

bsub -m cluster1 -q test1 sleep 1000

The job is pending on cluster1. Running brun again to forward the job to cluster2 is rejected:

brun -m cluster2 1803
Failed to run the job: Hosts requested do not belong to the cluster

For example:

brun -m "host12 host27"

In this example, if host12 is available the job is sent to the cluster containing host12 and tries to run. If unsuccessful, the job pends in the cluster containing host12. If host12 is not available, the job is sent to the cluster containing host27 where it runs or pends.

Force a job to run on a specific host

Local host specified

Job runs locally. For example:

brun -m hostA 246
Job <246> is being forced to run or forwarded.
 
bjobs 246
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
246     user1   RUN   normal     hostD       hostA       *eep 10000 Jan  3 12:15
 
bhist -l 246
Job <246>, User <user1>, Project <default>, Command <sleep 10000>
Mon Jan  3 12:15:22: Submitted from host <hostD>, to Queue <normal>, CWD 
                     <$HOME/envs>, Requested Resources <type == any>;
Mon Jan  3 12:16:13: Job is forced to run or forwarded by user or administrator 
                     <user1>;
Mon Jan  3 12:16:13: Dispatched to <hostA>;
Mon Jan  3 12:16:41: Starting (Pid 10467);
Mon Jan  3 12:16:59: Running with execution home </home/user1>, Execution CWD 
                     </home/user1/envs>, Execution Pid <10467>;
Host in execution cluster specified

The job is forwarded to the execution cluster containing the specified host, and runs.

For example:

brun -m hostB 244
Job <244> is being forced to run or forwarded.
 
bjobs 244
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
244     user1   RUN   normal     hostD       hostB       *eep 10000 Jan  3 12:15
 
bhist -l 244
 
Job <244>, User <user1>, Project <default>, Command <sleep 10000>
Mon Jan  3 12:15:22: Submitted from host <hostD>, to Queue <normal>, CWD 
                     <$HOME/envs>, Requested Resources <type == any>;
Mon Jan  3 12:19:18: Job is forced to run or forwarded by user or administrator 
                     <user1>;
Mon Jan  3 12:19:18: Forwarded job to cluster cluster2;
Mon Jan  3 12:19:18: Remote job control initiated;
Mon Jan  3 12:19:18: Dispatched to <hostB>;
Mon Jan  3 12:19:18: Remote job control completed;
Mon Jan  3 12:19:19: Starting (Pid 28804);
Mon Jan  3 12:19:19: Running with execution home </home/user1>, Execution CWD 
                     </home/user1/envs>, Execution Pid <28804>;
Host in same execution cluster specified

Job runs on the specified host in the same execution cluster. For example:

brun -m hostB 237
Job <237> is being forced to run or forwarded.

bjobs 237
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
237     user1   RUN   normal     hostD       hostB       *eep 10000 Jan  3 12:14
 
bhist -l 237
 
Job <237>, User <user1>, Project <default>, Command <sleep 10000>
Mon Jan  3 12:14:48: Submitted from host <hostD>, to Queue <normal>, CWD 
                     <$HOME/envs>, Requested Resources <type == any>;
Mon Jan  3 12:14:53: Forwarded job to cluster cluster2;
Mon Jan  3 12:22:08: Job is forced to run or forwarded by user or administrator 
                     <user1>;
Mon Jan  3 12:22:08: Remote job control initiated;
Mon Jan  3 12:22:08: Dispatched to <hostB>;
Mon Jan  3 12:22:09: Remote job control completed;
Mon Jan  3 12:22:09: Starting (Pid 0);
Mon Jan  3 12:22:09: Starting (Pid 29073);
Mon Jan  3 12:22:09: Running with execution home </home/user1>, Execution CWD 
                     </home/user1/envs>, Execution Pid <29073>;
Host in submission cluster specified

Job runs on the specified host in the submission cluster. For example:

brun -m hostA 238
Job <238> is being forced to run or forwarded.

bjobs 238
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
238     user1   RUN   normal     hostB       hostA       *eep 10000 Oct  5 11:00
 
bhist -l 238

Job <238>, User <user1>, Project <default>, Command <sleep 10000>
Wed Oct  5 11:00:16: Submitted from host <hostB>, to Queue <normal>, CWD 
                     </usr/local/xl/conf>, Requested Resources <type == any>;
Wed Oct  5 11:00:18: Forwarded job to cluster ec1;
Wed Oct  5 11:00:46: Job is forced to run or forwarded by user or administrator 
                     <user1>;
Wed Oct  5 11:00:46: Pending: Job has returned from remote cluster;
Wed Oct  5 11:00:46: Dispatched to <hostA>;
Wed Oct  5 11:00:46: Starting (Pid 15686);
Wed Oct  5 11:00:47: Running with execution home </home/user1>, Execution CWD 
                     </usr/local/xl/conf>, Execution Pid <15686>;

Summary of time in seconds spent in various states by  Wed Oct  5 11:01:06
  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
  30       0        20       0        0        0        50

Force a job to run in a specific cluster

Host in different execution cluster specified

The job returns to the submission cluster, is forwarded to the execution cluster containing the specified host, and runs.

brun -m ec2-hostA 3111
Job <3111> is being forced to run or forwarded.

bjobs 3111
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
3111    user1   RUN   queue1     sub-master  ec2-hostA   sleep 1000 Feb 23 11:21
 
bhist -l 3111
 
Job <3111>, User <user1>, Project <default>, Command <sleep 1000>
Wed Feb 23 11:21:00: Submitted from host <sub-master>, to Queue <queue1>, CWD 
                     </usr/local/xl/conf>;
Wed Feb 23 11:21:03: Forwarded job to cluster cluster1;
Wed Feb 23 11:21:58: Job is forced to run or forwarded by user or administrator 
                     <user1>;
Wed Feb 23 11:21:58: Pending: Job has returned from remote cluster;
Wed Feb 23 11:21:58: Forwarded job to cluster cluster2;
Wed Feb 23 11:21:58: Remote job run control initiated;
Wed Feb 23 11:21:59: Dispatched to <ec2-hostA>;
Wed Feb 23 11:21:59: Remote job run control completed;
Wed Feb 23 11:21:59: Starting (Pid 3257);
Wed Feb 23 11:21:59: Running with execution home </home/user1>, Execution CWD 
                     </usr/local/xl/conf >, Execution Pid <3257>;
 
Summary of time in seconds spent in various states by  Wed Feb 23 11:24:59
  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
  59       0        180      0        0        0        239 
Job already forwarded to an execution cluster

Job has already been forwarded to an execution cluster, and you specify a different execution cluster. The job returns to the submission cluster and is forced to be forwarded to the specified execution cluster. The job is not forced to run in the new execution cluster; after the job is forwarded, the execution cluster schedules the job according to local policies.

For example:

brun -m cluster2 244
Job <244> is being forced to run or forwarded.
 
bjobs 244
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
244     user1   RUN   normal     hostD       hostB       *eep 10000 Jan  3 12:15
 
bhist -l 244
 
Job <244>, User <user1>, Project <default>, Command <sleep 10000>
Mon Jan  3 12:15:22: Submitted from host <hostD>, to Queue <normal>, CWD 
                     <$HOME/envs>, Requested Resources <type == any>;
Mon Jan  3 12:15:25: Forwarded job to cluster cluster1;
Mon Jan  3 12:19:18: Job is forced to run or forwarded by user or administrator 
                     <user1>;
Mon Jan  3 12:19:18: Pending: Job has returned from remote cluster;
Mon Jan  3 12:19:18: Forwarded job to cluster cluster2;
Mon Jan  3 12:19:18: Dispatched to <hostB>;
Mon Jan  3 12:19:19: Starting (Pid 28804);
Mon Jan  3 12:19:19: Running with execution home </home/user1>, Execution CWD 
                     </home/user1/envs>, Execution Pid <28804>;
Job pending in execution cluster

Job is forwarded to the specified execution cluster, but the job is not forced to run. After the job is forwarded, the execution cluster schedules the job according to local policies.

For example:

brun -m cluster2 244
Job <244> is being forced to run or forwarded.
 
bhist -l 244
 
Job <244>, User <user1>, Project <default>, Command <sleep 10000>
Mon Jan  3 12:15:22: Submitted from host <hostD>, to Queue <normal>, CWD 
                     <$HOME/envs>, Requested Resources <type == any>;
Mon Jan  3 12:19:18: Job is forced to run or forwarded by user or administrator 
                     <user1>;
Mon Jan  3 12:19:18: Forwarded job to cluster cluster2;
Mon Jan  3 12:19:18: Remote job control initiated;
Mon Jan  3 12:19:18: Dispatched to <hostB>;
Mon Jan  3 12:19:18: Remote job control completed;
Mon Jan  3 12:19:19: Starting (Pid 28804);
Mon Jan  3 12:19:19: Running with execution home </home/user1>, Execution CWD 
                     </home/user1/envs>, Execution Pid <28804>;

bsub

Submits a job to LSF AE and forwards it to an execution cluster.

-m

Enhanced option allows you to specify local hosts, remote hosts, and execution clusters.

bsub -m ["[host_name|hostgroup][@cluster_name][[!]|+pref_level]]|cluster_name[+[pref_level]]..."

  • Cluster preference set at the job level (bsub -m) overrides cluster preference set at the queue level (SNDJOBS_TO).

  • Local hosts are always used before remote hosts.

  • Host names without a cluster specified must be unique.

  • When both clusters and hosts are specified, the host list is merged and filtered after the job is forwarded to an execution cluster.

  • The keyword others only applies to local hosts. To specify other remote hosts in a cluster, use the cluster name.

    For example, bsub -m "hostA@cluster1 cluster1" refers to hostA and others in cluster1 without using the keyword others for remote hosts.

Examples:

bsub -m cluster2

The submission cluster forwards the job to cluster2.

bsub -m hostB@cluster2

The submission cluster forwards the job to cluster2 to run on hostB.

bsub -m "cluster3+1 hostB@cluster2+2"

The job is forwarded to cluster2 to run on hostB if possible. If not, the job is forwarded to cluster3 to run on any host in that cluster.

bsub -m "local_host rmt_host@cluster1"

The job runs on local_host, if possible. If not, the job is forwarded to cluster1 to run on rmt_host.

bsub -m "rmt_host"

The job is forwarded to the cluster containing rmt_host; rmt_host must be a unique host name.

bsub -m "cluster2 hostE@cluster2"

The job is forwarded to cluster2. Since the entire cluster is specified with the same preference as hostE, no host preference applies within cluster2.

bsub -m "cluster3 cluster4 hostD@cluster3+1"

The job is forwarded to cluster3 if possible. Since both cluster and hosts within the cluster are specified, host preference is filtered (cluster3 hosts only) and merged to become bsub -m "others hostD+1" on cluster3.

If the job cannot be forwarded to cluster3, the submission cluster forwards the job to cluster4.
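The ordering implied by +pref_level can be pictured with a small sketch. The order_prefs helper below is purely illustrative (the real merging and filtering is done by the submission cluster scheduler); it assumes whitespace-separated tokens and treats a missing suffix as preference level 0:

```shell
# Order bsub -m tokens by preference level, highest first.
order_prefs() {
    for tok in "$@"; do
        case "$tok" in
            *+*) echo "${tok##*+} ${tok%+*}" ;;   # split off the +N level
            *)   echo "0 $tok" ;;                 # no suffix: level 0
        esac
    done | sort -rn | awk '{ print $2 }'
}

order_prefs "cluster3+1" "hostB@cluster2+2"
# -> hostB@cluster2
#    cluster3
```

This matches the bsub -m "cluster3+1 hostB@cluster2+2" example above: hostB in cluster2 is tried first.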

bswitch

Switches unfinished jobs from one queue to another. Forwarded jobs can only be switched within one LSF AE execution cluster. Changes across execution clusters are not supported.

Run bswitch commands through the submission cluster, not through individual execution clusters.

btop

Changes job order. Forwarded jobs can only be moved within one LSF AE execution cluster. Changes across execution clusters are not supported.

Run btop commands through the submission cluster, not through individual execution clusters.

bstop/bresume/bkill

Controls job status throughout LSF AE.

Run these commands through the submission cluster. These commands are not supported on individual execution clusters.

busers

Displays running and pending job counters for users in the local cluster.

Run busers queries through the submission cluster to see information for the complete LSF AE installation.

LSF AE requires the same user and user group definitions across all clusters.