The single cluster view maintained by LSF AE gives end users a single-cluster experience using existing Platform LSF commands. Additional command options allow administrators to track job progress through to the execution clusters.
For the output examples shown, consider a submission cluster with two execution clusters:
Submission cluster: sub_cluster
Master host: hostA
Slave hosts: none
First execution cluster: cluster2
Master host: hostB
Slave hosts: none
Second execution cluster: cluster3
Master host: hostC
Slave hosts: hostD
Displays information and current status for the submission cluster and all configured execution clusters.
This command runs locally (in the execution cluster) when submitted by a running job.
% lsclusters
CLUSTER_NAME  STATUS  MASTER_HOST  ADMIN     HOSTS  SERVERS
sub_cluster   ok      hostA        lsfadmin      1        1
cluster2      ok      hostB        lsfadmin      1        1
cluster3      ok      hostC        lsfadmin      2        2
Shows host information for all hosts within all clusters connected by LSF AE.
This command runs locally (in the execution cluster) when submitted by a running job.
% lshosts
HOST_NAME  type    model     cpuf   ncpus  maxmem  maxswp  server  RESOURCES
hostA      X86_64  PC6000    116.1      2   1000M   1983M  Yes     (mg)
hostB      X86_64  Opteron2   47.0      1   1000M   1961M  Yes     (mg)
hostC      X86_64  Intel_EM   60.0      2   7978M   1992M  Yes     (mg)
hostD      X86_64  Intel_EM   60.0      2   3828M   2055M  Yes     (mg)
New option -cname includes the cluster name for execution cluster hosts and host groups in output for lshosts.
% lshosts -cname
HOST_NAME       type    model     cpuf   ncpus  maxmem  maxswp  server  RESOURCES
hostA           X86_64  PC6000    116.1      2   1000M   1983M  Yes     (mg)
hostB@cluster2  X86_64  Opteron2   47.0      1   1000M   1961M  Yes     (mg)
hostC@cluster3  X86_64  Intel_EM   60.0      2   7978M   1992M  Yes     (mg)
hostD@cluster3  X86_64  Intel_EM   60.0      2   3828M   2055M  Yes     (mg)
Shows host load information for all hosts within all clusters connected by LSF AE.
This command runs locally (in the execution cluster) when submitted by a running job.
% lsload
HOST_NAME  status  r15s  r1m  r15m   ut   pg  ls  it    tmp    swp    mem
hostA      ok       0.2  0.0   0.0   1%  0.0   6   4    22G  1948M   725M
hostB      ok       0.6  0.0   0.0   0%  0.0   2   0    18M  1554M   518M
hostC      ok       0.4  0.1   0.0   7%  0.0   3   0  3270M  1990M  5448M
hostD      ok       1.0  1.0   1.1  13%  0.0   8  20    21G   353M   491M
New option -cname includes the cluster name for execution cluster hosts and host groups in output for lsload.
% lsload -cname
HOST_NAME       status  r15s  r1m  r15m   ut   pg  ls  it    tmp    swp    mem
hostA           ok       0.2  0.0   0.0   1%  0.0   6   4    22G  1948M   725M
hostB@cluster2  ok       0.6  0.0   0.0   0%  0.0   2   0    18M  1554M   518M
hostC@cluster3  ok       0.4  0.1   0.0   7%  0.0   3   0  3270M  1990M  5448M
hostD@cluster3  ok       1.0  1.0   1.1  13%  0.0   8  20    21G   353M   491M
Displays accounting statistics for finished jobs within all clusters connected by LSF AE.
This command runs locally (in the execution cluster) when submitted by a running job.
Forwards requests made through the submission cluster to the appropriate remote host or cluster. Not supported within the local execution cluster.
Starts a new mbatchd in parallel, leaving the existing mbatchd free to respond to queries and run commands while the new mbatchd restarts, reading configuration files and replaying events. Once all events have been read, the two mbatchd daemons merge, replaying new events and leaving only the new mbatchd running.
During a parallel mbatchd restart, new badmin mbdrestart and badmin reconfig commands are not accepted. Parallel mbatchd restart using badmin mbdrestart -p does not work with duplicate event logging (LSB_LOCALDIR in lsf.conf).
The existing command badmin mbdrestart remains unchanged.
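A minimal sketch of the two restart variants (run on the submission cluster master host; the environment is illustrative):

```shell
# Parallel restart: the existing mbatchd keeps serving queries while the
# new mbatchd reads configuration files and replays events, then takes over.
badmin mbdrestart -p

# Conventional restart: queries are blocked while events are replayed.
badmin mbdrestart
```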
In the submission cluster, displays a summary of the current LSF runtime information about the submission cluster and all execution clusters. In the execution cluster, displays a summary of the current LSF runtime information about the local execution cluster only.
The current LSF runtime information displayed includes information about hosts, jobs, users, user groups, and mbatchd startup and reconfiguration.
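For example (output omitted, as it depends on the installation):

```shell
# On the submission cluster master: the summary covers the submission
# cluster and all execution clusters.
# On an execution cluster master: the summary covers the local cluster only.
badmin showstatus
```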
Sends the job back to the submission cluster for rescheduling, at the bottom of the queue.
Run bbot commands through the submission cluster, not through individual execution clusters.
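For example (the job ID is illustrative):

```shell
# Move job 1234 to the bottom of its queue; a forwarded job is sent back
# to the submission cluster and rescheduled last among jobs of equal priority.
bbot 1234
```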
Displays execution cluster resource provider and consumer information, resource flow information, and connection status between the submission cluster and execution cluster.
Use -app to view available application profiles in remote clusters.
Information related to LSF AE is displayed under the heading Job Forwarding Information.
LOCAL_QUEUE: Name of an LSF AE queue.
JOB_FLOW: Indicates direction of job flow.
send
The local queue is a submission cluster send-jobs queue (SNDJOBS_TO is defined in the local queue).
recv
The local queue is an execution cluster receive-jobs queue (RCVJOBS_FROM is defined in the local queue).
REMOTE: For send-jobs queues, shows the name of the remote queue (always the same as the local queue name).
For receive-jobs queues, always “-”.
CLUSTER: For send-jobs queues, shows the name of the execution cluster containing the receive-jobs queue.
For receive-jobs queues, shows the name of the submission cluster that can send jobs to the local queue.
STATUS: Indicates the connection status between the local queue and remote queue.
ok
The submission cluster and execution cluster can exchange information and the system is properly configured.
disc
Communication between the two clusters has not been established. This could occur because there are no jobs waiting to be dispatched, or because the remote master cannot be located.
reject
The remote queue rejects jobs from the send-jobs queue. The local queue and remote queue are connected and the clusters communicate, but the queue-level configuration is not correct.
% bclusters
[Job Forwarding Information]
LOCAL_QUEUE  JOB_FLOW  REMOTE  CLUSTER   STATUS
queue1       send      queue1  cluster2  ok
queue1       send      queue1  cluster3  ok
[Resource Lease Information]
No resources have been exported or borrowed
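To see the application profiles available in the remote clusters, add the -app option (output depends on the configured application profiles):

```shell
# List application profiles in remote execution clusters.
bclusters -app
```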
Submits live reconfiguration requests, updating configuration settings in active memory without restarting daemons.
In lsb.resources, the limit object type supports the following LSF AE key-value pair keywords for generic job forwarding limits:
CLUSTERS for the following action types: addmember, rmmember, update, create, delete
PER_CLUSTER for the following action types: addmember, rmmember, update, create, delete
FWD_SLOTS for the following action types: update, create, delete
Example:
bconf update limit=cluster_e1 "FWD_SLOTS=10"
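The other action types follow the same shape; a hedged sketch reusing the limit name from the example above (the CLUSTERS value is illustrative):

```shell
bconf create limit=cluster_e1 "FWD_SLOTS=10"          # create the forward limit
bconf update limit=cluster_e1 "FWD_SLOTS=20"          # raise it in active memory
bconf addmember limit=cluster_e1 "CLUSTERS=cluster2"  # extend the CLUSTERS list
bconf rmmember limit=cluster_e1 "CLUSTERS=cluster2"   # remove the member again
bconf delete limit=cluster_e1                         # delete the limit
```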
Displays historical information for all jobs within all clusters connected by LSF AE.
This command runs locally (in the execution cluster) when submitted by a running job.
Displays information for all hosts within all clusters connected by LSF AE. This includes the submission cluster and execution clusters. The same format is used for all hosts, both in the submission cluster and in the execution clusters. The output displayed is sorted by host or host group name.
Execution cluster hosts and host groups can be identified by name alone (host_name), or by name and cluster (host_name@cluster_name). If there are multiple hosts or host groups with the same name, all are displayed.
% bhosts
HOST_NAME  STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
hostA      ok         -    2      0    0      0      0    0
hostB      ok         -    1      0    0      0      0    0
hostC      ok         -    2      0    0      0      0    0
hostD      ok         -    2      0    0      0      0    0
% bmgroup -l
GROUP_NAME  CONDENSE  HOSTS
hgroup1     Yes       hostB
hgroup2     No        hostC hostD
Includes the cluster name for execution cluster hosts and host groups in output for bhosts and bmgroup. The output displayed is sorted by cluster and then by host or host group name.
% bhosts -cname
HOST_NAME       STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
hostA           ok         -    2      0    0      0      0    0
hostB@cluster2  ok         -    1      0    0      0      0    0
hostC@cluster3  ok         -    2      0    0      0      0    0
hostD@cluster3  ok         -    2      0    0      0      0    0
% bmgroup -l -cname
GROUP_NAME        CONDENSE  HOSTS
hgroup1           Yes       hostB
hgroup2@cluster3  No        hostC hostD
Displays job status, including job forwarding information, and supports the -fwd filter option.
% bjobs
JOBID  USER     STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
1      lsfuser  RUN   queue1  hostA      hostC      sleep 1234  Nov 29 14:08
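A usage sketch of the -fwd filter (the option combination shown is illustrative):

```shell
# List only jobs forwarded to execution clusters.
bjobs -fwd

# Restrict to forwarded jobs still pending (FWD_PEND) in the execution cluster.
bjobs -fwd -p
```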
Displays forwarding time and cluster name for forwarded pending and running jobs.
% bjobs -l
Job <1>, User <lsfuser>, Project <default>, Status <RUN>, Queue <queue1>,
Command <sleep 1234>
Mon Nov 29 14:08:35: Submitted from host <hostA>, CWD </home/lsfuser >,
Re-runnable;
Mon Nov 29 14:08:38: Job <1> forwarded to cluster <cluster3> as Job <1>;
Mon Nov 29 14:08:44: Started on <hostC>, Execution Home </home/lsfuser>,
Execution CWD </home/lsfuser>;
Mon Nov 29 14:08:46: Resource usage collected.
MEM: 2 Mbytes; SWAP: 32 Mbytes; NTHREAD: 1
PGID: 6395; PIDs: 6395
SCHEDULING PARAMETERS:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
Includes the cluster name for execution cluster hosts in output for bjobs -l.
% bjobs -l -cname
Job <1>, User <lsfuser>, Project <default>, Status <RUN>, Queue <queue1>,
Command <sleep 1234>
Mon Nov 29 14:08:35: Submitted from host <hostA>, CWD </home/lsfuser >,
Re-runnable;
Mon Nov 29 14:08:38: Job <1> forwarded to cluster <cluster3> as Job <1>;
Mon Nov 29 14:08:44: Started on <hostC@cluster3>, Execution Home
</home/lsfuser>, Execution CWD </home/lsfuser>;
Mon Nov 29 14:08:46: Resource usage collected.
MEM: 2 Mbytes; SWAP: 32 Mbytes; NTHREAD: 1
PGID: 6395; PIDs: 6395
SCHEDULING PARAMETERS:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
Displays summary information about unfinished jobs. bjobs -sum displays the count of job slots in the following states: running (RUN), system suspended (SSUSP), user suspended (USUSP), pending (PEND), forwarded to remote clusters and pending (FWD_PEND), and UNKNOWN.
bjobs -sum displays the job slot count only for the user’s own jobs.
% bjobs -sum
RUN  SSUSP  USUSP  UNKNOWN  PEND  FWD_PEND
123    456    789        5     5         3
Use -sum with other options (like -m, -P, -q, and -u) to filter the results. For example, bjobs -sum -u user1 displays job slot counts just for user user1.
% bjobs -sum -u user1
RUN  SSUSP  USUSP  UNKNOWN  PEND  FWD_PEND
 20     10     10        0     5         0
Displays cluster limit data across all clusters, including forward limits and aggregated execution cluster limits. All clusters are shown by default.
To specify a remote limit with the -n option, include the cluster name as well as the limit name: blimits -n limit_name@cluster_name.
% blimits
Sub-master: blimits
FORWARD LIMITS:
NAME       USERS  QUEUES  PROJECTS  CLUSTERS  FWD_SLOTS
NONAME000  -      -       proj1     ec1 ec2   1/5
NONAME001  -      queue1  -         ec1 ec2   2/7
NONAME002  -      -       -         -         2/10
REMOTE CLUSTER <ec1>:
INTERNAL RESOURCE LIMITS:
NAME       USERS  QUEUES  HOSTS  PROJECTS  SLOTS  MEM  TMP  SWP  JOBS
NONAME000  -      queue1  all    -         2/10   -    -    -    -
REMOTE CLUSTER <ec2>:
No resource usage found.
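To query a single limit by name (the names are taken from the sample output above; remote limits need the cluster suffix):

```shell
# Submission cluster forward limit by name.
blimits -n NONAME001

# The corresponding limit as enforced in execution cluster ec1.
blimits -n NONAME000@ec1
```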
Displays forward slot allocation limits.
Use -fwd with -c to display forward slot limit configuration.
Displays forward slot allocation limits for one or more specific clusters. -C cannot be used without -fwd.
Use -fwd -C with -c to display forward slot limit configuration for the specified cluster.
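The forward-limit options combine as follows (cluster name illustrative):

```shell
# Forward slot allocation limits across all clusters.
blimits -fwd

# Forward slot limits for one execution cluster only.
blimits -fwd -C cluster2

# Configured (rather than dynamic) forward slot limits for that cluster.
blimits -fwd -C cluster2 -c
```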
Modifies job submission options. Forwarded jobs can only be modified within one execution cluster. Changes across execution clusters are not supported.
Run bmod commands through the submission cluster, not through individual execution clusters.
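For example (job ID and resource requirement are illustrative):

```shell
# Modify a forwarded job's resource requirement; issue the command on the
# submission cluster, not on the execution cluster running the job.
bmod -R "select[mem>1000]" 1234
```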
Displays new LSF AE parameters.
Displays the status of queues in the cluster, including counters for all running and pending jobs, and fairshare information.
Enhanced output for LSF AE displays information about forwarded pending jobs for each queue in the SHARE_INFO_FOR section under the heading FWD_PEND.
% bqueues -lr queue1
QUEUE: queue1
-- No description provided.
PARAMETERS/STATISTICS
PRIO  NICE  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SSUSP  USUSP  RSV
  40    10  Open:Active    -     -     -     -      1     0    1      0      0    0
Interval for a host to accept two jobs is 0 seconds
SCHEDULING PARAMETERS:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
SCHEDULING POLICIES: FAIRSHARE ABS_CLUSTER_PREFERENCE
USER_SHARES: [default, 1]
SHARE_INFO_FOR: queue1/
USER/GROUP  SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME  RUN_TIME  FWD_PEND
lsfuser     1       0.167     1        0         0.0       0         0
USERS: all
HOSTS: none
REQUEUE_EXIT_VALUES: 1
SNDJOBS_TO: queue1@cluster2, queue1@cluster3
RERUNNABLE : yes
MAX_RSCHED_TIME: 360 10
Requeues a job.
By default, when a job running on an execution cluster is requeued it returns to the submission cluster in the PEND state awaiting rescheduling.
Automatic job requeue using REQUEUE_EXIT_VALUES (in lsb.queues) returns a running job to the PEND state in the same execution cluster for local rescheduling.
% bhist -l 887
Job <887>, User <lsfadmin>, Project <default>, Command <sleep 10000>
Wed Dec 8 19:37:21: Submitted from host <hostD>, to Queue <test1>, CWD <$HOME>,
Requested Resources <type==any>;
Wed Dec 8 19:37:25: Forwarded job to cluster cluster2;
Wed Dec 8 19:37:28: Dispatched to <hostB@cluster2>;
Wed Dec 8 19:37:28: Starting (Pid 22557);
Wed Dec 8 19:37:28: Running with execution home </home/lsfadmin>, Execution CWD
</home/lsfadmin>, Execution Pid <22557>;
Wed Dec 8 19:37:29: Signal <REQUEUE_PEND> requested by user or administrator
<lsfadmin>;
Wed Dec 8 19:37:30: Exited with exit code 130. The CPU time used is 0.1 seconds;
Wed Dec 8 19:38:31: Pending: Job has been requeued;
Wed Dec 8 19:38:31: Forwarded job to cluster cluster3;
Wed Dec 8 19:38:33: Dispatched to <hostD@cluster3>;
Wed Dec 8 19:38:33: Starting (Pid 24115);
Wed Dec 8 19:38:33: Running with execution home </home/lsfadmin>, Execution CWD
</home/lsfadmin>, Execution Pid <24115>;
Summary of time in seconds spent in various states by Wed Dec 8 19:39:24
PEND  PSUSP  RUN  USUSP  SSUSP  UNKWN  TOTAL
70    0      53   0      0      0      123
Automatic requeue example with REQUEUE_EXIT_VALUES defined:
% bhist -l 889
Job <889>, User <lsfadmin>, Project <default>, Command <sleep 5; exit 13>
Wed Dec 8 20:03:07: Submitted from host <hostD>, to Queue <test1>, CWD <$HOME>,
Requested Resources <type==any>;
Wed Dec 8 20:03:13: Forwarded job to cluster ecluster2;
Wed Dec 8 20:03:19: Dispatched to <host3@ecluster2>;
Wed Dec 8 20:03:19: Starting (Pid 29850);
Wed Dec 8 20:03:19: Running with execution home </home/lsfadmin>, Execution CWD
</home/lsfadmin>, Execution Pid <29850>;
Wed Dec 8 20:03:24: Pending: Job is requeued on the execution cluster due to exit
value;
Wed Dec 8 20:03:29: Dispatched to <host3@ecluster2>;
Wed Dec 8 20:03:31: Starting (Pid 29866);
Wed Dec 8 20:03:31: Running with execution home </home/lsfadmin>, Execution CWD
</home/lsfadmin>, Execution Pid <29866>;
Wed Dec 8 20:03:34: Pending: Job is requeued on the execution cluster due to exit
value;
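The automatic requeue shown above is driven by queue configuration in lsb.queues; a minimal sketch matching the example (exit value 13, other parameters omitted):

```
Begin Queue
QUEUE_NAME          = test1
REQUEUE_EXIT_VALUES = 13
End Queue
```

With this definition, a job in test1 that exits with value 13 returns to the PEND state in the same execution cluster instead of returning to the submission cluster.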
Displays host topology information for hosts within all clusters connected by LSF AE.
This command runs locally (in the execution cluster) when submitted by a running job.
Forces a pending or finished job to run or be forwarded to a specified cluster. The exact behavior of brun on a pending job depends on where the job is pending, and which hosts or clusters are specified in the brun command.
Only administrators can use the brun command. You can only run brun from the submission cluster.
You must specify one or more host names or a cluster name when you force a job to run.
If multiple hosts are specified, the first available host is selected and the remainder are ignored. The specified hosts cannot belong to more than one cluster.
You can only specify one cluster name. The job is forced to be forwarded to the specified cluster.
You cannot specify host names and cluster names together in the same brun command.
A job pending in an execution cluster forced to run in a different cluster is returned to the submission cluster, and then forwarded once again.
If a job is submitted with a cluster name and the job is forwarded to a remote cluster, you cannot use brun -m again to switch the job to another execution cluster. For example:
bsub -m cluster1 -q test1 sleep 1000
The job is pending on cluster1. Running brun again to forward the job to cluster2 is rejected:
brun -m cluster2 1803
Failed to run the job: Hosts requested do not belong to the cluster
For example:
brun -m "host12 host27"
In this example, if host12 is available the job is sent to the cluster containing host12 and tries to run. If unsuccessful, the job pends in the cluster containing host12. If host12 is not available, the job is sent to the cluster containing host27 where it runs or pends.
Job runs locally. For example:
brun -m hostA 246
Job <246> is being forced to run or forwarded.
bjobs 246
JOBID  USER   STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
246    user1  RUN   normal  hostD      hostA      *eep 10000  Jan 3 12:15
bhist -l 246
Job <246>, User <user1>, Project <default>, Command <sleep 10000>
Mon Jan 3 12:15:22: Submitted from host <hostD>, to Queue <normal>, CWD
<$HOME/envs>, Requested Resources <type == any>;
Mon Jan 3 12:16:13: Job is forced to run or forwarded by user or administrator
<user1>;
Mon Jan 3 12:16:13: Dispatched to <hostA>;
Mon Jan 3 12:16:41: Starting (Pid 10467);
Mon Jan 3 12:16:59: Running with execution home </home/user1>, Execution CWD
</home/user1/envs>, Execution Pid <10467>;
The job is forwarded to the execution cluster containing the specified host, and runs.
For example:
brun -m hostB 244
Job <244> is being forced to run or forwarded.
bjobs 244
JOBID  USER   STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
244    user1  RUN   normal  hostD      hostB      *eep 10000  Jan 3 12:15
bhist -l 244
Job <244>, User <user1>, Project <default>, Command <sleep 10000>
Mon Jan 3 12:15:22: Submitted from host <hostD>, to Queue <normal>, CWD
<$HOME/envs>, Requested Resources <type == any>;
Mon Jan 3 12:19:18: Job is forced to run or forwarded by user or administrator
<user1>;
Mon Jan 3 12:19:18: Forwarded job to cluster cluster2;
Mon Jan 3 12:19:18: Remote job control initiated;
Mon Jan 3 12:19:18: Dispatched to <hostB>;
Mon Jan 3 12:19:18: Remote job control completed;
Mon Jan 3 12:19:19: Starting (Pid 28804);
Mon Jan 3 12:19:19: Running with execution home </home/user1>, Execution CWD
</home/user1/envs>, Execution Pid <28804>;
Job runs on the specified host in the same execution cluster. For example:
brun -m hostB 237
Job <237> is being forced to run or forwarded.
bjobs 237
JOBID  USER   STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
237    user1  RUN   normal  hostD      hostB      *eep 10000  Jan 3 12:14
bhist -l 237
Job <237>, User <user1>, Project <default>, Command <sleep 10000>
Mon Jan 3 12:14:48: Submitted from host <hostD>, to Queue <normal>, CWD
<$HOME/envs>, Requested Resources <type == any>;
Mon Jan 3 12:14:53: Forwarded job to cluster cluster2;
Mon Jan 3 12:22:08: Job is forced to run or forwarded by user or administrator
<user1>;
Mon Jan 3 12:22:08: Remote job control initiated;
Mon Jan 3 12:22:08: Dispatched to <hostB>;
Mon Jan 3 12:22:09: Remote job control completed;
Mon Jan 3 12:22:09: Starting (Pid 0);
Mon Jan 3 12:22:09: Starting (Pid 29073);
Mon Jan 3 12:22:09: Running with execution home </home/user1>, Execution CWD
</home/user1/envs>, Execution Pid <29073>;
Job runs on the specified host in the submission cluster. For example:
brun -m hostA 238
Job <238> is being forced to run or forwarded.
bjobs 238
JOBID  USER   STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
238    user1  RUN   normal  hostB      hostA      *eep 10000  Oct 5 11:00
bhist -l 238
Job <238>, User <user1>, Project <default>, Command <sleep 10000>
Wed Oct 5 11:00:16: Submitted from host <hostB>, to Queue <normal>, CWD
</usr/local/xl/conf>, Requested Resources <type == any>;
Wed Oct 5 11:00:18: Forwarded job to cluster ec1;
Wed Oct 5 11:00:46: Job is forced to run or forwarded by user or administrator
<user1>;
Wed Oct 5 11:00:46: Pending: Job has returned from remote cluster;
Wed Oct 5 11:00:46: Dispatched to <hostA>;
Wed Oct 5 11:00:46: Starting (Pid 15686);
Wed Oct 5 11:00:47: Running with execution home </home/user1>, Execution CWD
</usr/local/xl/conf>, Execution Pid <15686>;
Summary of time in seconds spent in various states by Wed Oct 5 11:01:06
PEND  PSUSP  RUN  USUSP  SSUSP  UNKWN  TOTAL
30    0      20   0      0      0      50
The job returns to the submission cluster, is forwarded to the execution cluster containing the specified host, and runs.
brun -m ec2-hostA 3111
Job <3111> is being forced to run or forwarded.
bjobs 3111
JOBID  USER   STAT  QUEUE   FROM_HOST   EXEC_HOST  JOB_NAME    SUBMIT_TIME
3111   user1  RUN   queue1  sub-master  ec2-hostA  sleep 1000  Feb 23 11:21
bhist -l 3111
Job <3111>, User <user1>, Project <default>, Command <sleep 1000>
Wed Feb 23 11:21:00: Submitted from host <sub-master>, to Queue <queue1>, CWD
</usr/local/xl/conf>;
Wed Feb 23 11:21:03: Forwarded job to cluster cluster1;
Wed Feb 23 11:21:58: Job is forced to run or forwarded by user or administrator
<user1>;
Wed Feb 23 11:21:58: Pending: Job has returned from remote cluster;
Wed Feb 23 11:21:58: Forwarded job to cluster cluster2;
Wed Feb 23 11:21:58: Remote job run control initiated;
Wed Feb 23 11:21:59: Dispatched to <ec2-hostA>;
Wed Feb 23 11:21:59: Remote job run control completed;
Wed Feb 23 11:21:59: Starting (Pid 3257);
Wed Feb 23 11:21:59: Running with execution home </home/user1>, Execution CWD
</usr/local/xl/conf >, Execution Pid <3257>;
Summary of time in seconds spent in various states by Wed Feb 23 11:24:59
PEND  PSUSP  RUN  USUSP  SSUSP  UNKWN  TOTAL
59    0      180  0      0      0      239
If the job has already been forwarded to an execution cluster and you specify a different execution cluster, the job returns to the submission cluster and is forced to be forwarded to the specified execution cluster. The job is not forced to run in the new execution cluster; after the job is forwarded, the execution cluster schedules it according to local policies.
For example:
brun -m cluster2 244
Job <244> is being forced to run or forwarded.
bjobs 244
JOBID  USER   STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
244    user1  RUN   normal  hostD      hostB      *eep 10000  Jan 3 12:15
bhist -l 244
Job <244>, User <user1>, Project <default>, Command <sleep 10000>
Mon Jan 3 12:15:22: Submitted from host <hostD>, to Queue <normal>, CWD
<$HOME/envs>, Requested Resources <type == any>;
Mon Jan 3 12:15:25: Forwarded job to cluster cluster1;
Mon Jan 3 12:19:18: Job is forced to run or forwarded by user or administrator
<user1>;
Mon Jan 3 12:19:18: Pending: Job has returned from remote cluster;
Mon Jan 3 12:19:18: Forwarded job to cluster cluster2;
Mon Jan 3 12:19:18: Dispatched to <hostB>;
Mon Jan 3 12:19:19: Starting (Pid 28804);
Mon Jan 3 12:19:19: Running with execution home </home/user1>, Execution CWD
</home/user1/envs>, Execution Pid <28804>;
Job is forwarded to the specified execution cluster, but the job is not forced to run. After the job is forwarded, the execution cluster schedules the job according to local policies.
For example:
brun -m cluster2 244
Job <244> is being forced to run or forwarded.
bhist -l 244
Job <244>, User <user1>, Project <default>, Command <sleep 10000>
Mon Jan 3 12:15:22: Submitted from host <hostD>, to Queue <normal>, CWD
<$HOME/envs>, Requested Resources <type == any>;
Mon Jan 3 12:19:18: Job is forced to run or forwarded by user or administrator
<user1>;
Mon Jan 3 12:19:18: Forwarded job to cluster cluster2;
Mon Jan 3 12:19:18: Remote job control initiated;
Mon Jan 3 12:19:18: Dispatched to <hostB>;
Mon Jan 3 12:19:18: Remote job control completed;
Mon Jan 3 12:19:19: Starting (Pid 28804);
Mon Jan 3 12:19:19: Running with execution home </home/user1>, Execution CWD
</home/user1/envs>, Execution Pid <28804>;
Submits jobs to LSF AE, which forwards them to the execution clusters.
Enhanced option allows you to specify local hosts, remote hosts, and execution clusters.
bsub -m "[host_name|host_group][@cluster_name][[!]|+pref_level] | cluster_name[+pref_level] ..."
Cluster preference set at the job level (bsub -m) overrides cluster preference set at the queue level (SNDJOBS_TO).
Local hosts are always used before remote hosts.
Host names without a cluster specified must be unique.
When both clusters and hosts are specified, the host list is merged and filtered after the job is forwarded to an execution cluster.
The keyword others only applies to local hosts. To specify other remote hosts in a cluster, use the cluster name.
For example, bsub -m "hostA@cluster1 cluster1" refers to hostA and others in cluster1 without using the keyword others for remote hosts.
Examples:
bsub -m cluster2
The submission cluster forwards the job to cluster2.
bsub -m hostB@cluster2
The submission cluster forwards the job to cluster2 to run on hostB.
bsub -m "cluster3+1 hostB@cluster2+2"
The job is forwarded to cluster2 to run on hostB if possible. If not, the job is forwarded to cluster3 to run on any host in that cluster.
bsub -m "local_host rmt_host@cluster1"
The job runs on local_host, if possible. If not, the job is forwarded to cluster1 to run on rmt_host.
bsub -m "rmt_host"
The job is forwarded to the cluster containing rmt_host; rmt_host must be a unique host name.
bsub -m "cluster2 hostE@cluster2"
The job is forwarded to cluster2. Since the entire cluster is specified with the same preference as hostE, no host preference applies within cluster2.
bsub -m "cluster3 cluster4 hostD@cluster3+1"
The job is forwarded to cluster3 if possible. Since both cluster and hosts within the cluster are specified, host preference is filtered (cluster3 hosts only) and merged to become bsub -m "others hostD+1" on cluster3.
If the job cannot be forwarded to cluster3, the submission cluster forwards the job to cluster4.
Switches jobs to another queue. Forwarded jobs can only be switched within one LSF AE execution cluster. Changes across execution clusters are not supported.
Run bswitch commands through the submission cluster, not through individual execution clusters.
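For example (queue and job ID are illustrative):

```shell
# Switch forwarded job 1234 to queue2; issue the command on the submission
# cluster. The job stays within its current execution cluster.
bswitch queue2 1234
```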
Changes job order. Forwarded jobs can only be moved within one LSF AE execution cluster. Changes across execution clusters are not supported.
Run btop commands through the submission cluster, not through individual execution clusters.
Controls job status throughout LSF AE.
Run these commands through the submission cluster. These commands are not supported on individual execution clusters.
Displays counters for all running and pending jobs for users in the local cluster.
Run busers queries through the submission cluster to see information for the complete LSF AE installation.
LSF AE requires the same user and user group definitions across all clusters.
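For example:

```shell
# Run on the submission cluster for job counters spanning the complete
# LSF AE installation; on an execution cluster, only local jobs are counted.
busers all
```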