Idea #17443 (closed)

Scoping/grooming LSF work

Added by Peter Amstutz about 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: -

Files

lsfsce10.2_quick_start.pdf (401 KB) - Nico César, 03/03/2021 08:08 PM
lsf9.1_quick_reference.pdf (700 KB) - Nico César, 03/04/2021 07:38 PM

Related issues

Related to Arvados Epics - Idea #16304: LSF support (Resolved, 04/01/2021 to 09/30/2021)
Actions #1

Updated by Nico César about 3 years ago

I started with LSF Community edition. To get the tarball:

Go to https://www-01.ibm.com/marketing/iwm/iwm/web/preLogin.do?source=swerpzsw-lsf-3 and create an account. Accept the license and you should be able to download a file named lsfsce10.2.0.6-x86_64.tar.gz.

There is also a "getting started" guide, which I attached.

https://github.com/IBMSpectrumComputing/lsf-python-api provides the Python bindings for the LSF API.

The SWIG interface file https://github.com/IBMSpectrumComputing/lsf-python-api/blob/master/pythonlsf/lsf.i could perhaps be reused to generate Go bindings, following http://www.swig.org/Doc3.0/Go.html.

Actions #3

Updated by Peter Amstutz about 3 years ago

Actions #4

Updated by Peter Amstutz about 3 years ago

They probably have a C API that's easier to consume from Go? Maybe?

For slurm, we just invoke the command line tools and parse the output. I'm fine with doing that for LSF as well, unless using the API seems like a better bet.
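
As a rough illustration of that approach for LSF, here is a minimal Go sketch (the job name is a placeholder, and the parsing assumes bsub's usual "Job <NNN> is submitted to ... queue" output line; none of this is taken from an existing dispatcher):

package main

import (
	"fmt"
	"log"
	"os/exec"
	"regexp"
	"strings"
)

var jobIDRe = regexp.MustCompile(`Job <(\d+)> is submitted`)

// bsubSubmit runs bsub with the given arguments, feeds it the job script on
// stdin, and returns the LSF job ID parsed from bsub's output.
func bsubSubmit(args []string, script string) (string, error) {
	cmd := exec.Command("bsub", args...)
	cmd.Stdin = strings.NewReader(script)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return "", fmt.Errorf("bsub failed: %v (output: %q)", err, out)
	}
	m := jobIDRe.FindSubmatch(out)
	if m == nil {
		return "", fmt.Errorf("cannot parse job ID from bsub output: %q", out)
	}
	return string(m[1]), nil
}

func main() {
	// "lsf-smoke-test" is just a placeholder job name.
	id, err := bsubSubmit([]string{"-J", "lsf-smoke-test"}, "echo hello from LSF")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("submitted job", id)
}

Monitoring and cancellation would presumably follow the same pattern with bjobs and bkill.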

Actions #5

Updated by Nico César about 3 years ago

After a good fight with my VirtualBox/Vagrant environment, I tried to spin up https://github.com/MorganRodgers/lsf-workbench.

So far I have been unable to. More news to come.

Actions #6

Updated by Nico César about 3 years ago

from: https://support.sas.com/rnd/scalability/platform/PSS8.1/lsf9.1_quick_reference.pdf

bsub I/O related flags:

-e  error_file                   Appends the standard error output to a file
-eo error_file                   Overwrites the standard error output of the job to the specified file
-i  input_file | -is input_file  Gets the standard input for the job from the specified file
-o  output_file                  Appends the standard output to a file
-oo output_file                  Overwrites the standard output of the job to the specified file
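
For example, a submission that captures per-job output files might pass flags like these (a sketch only; the job name and paths are placeholders, and %J is LSF's job ID substitution in file names):

package main

import (
	"fmt"
	"strings"
)

func main() {
	// Hypothetical bsub argument list: overwrite per-job stdout/stderr files
	// (-oo/-eo) instead of appending, and read stdin from /dev/null.
	args := []string{
		"bsub",
		"-J", "arvados-container-xyzzy", // placeholder job name
		"-oo", "/var/log/lsf/%J.out",    // overwrite standard output
		"-eo", "/var/log/lsf/%J.err",    // overwrite standard error
		"-i", "/dev/null",               // standard input for the job
	}
	fmt.Println(strings.Join(args, " "))
}
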
Actions #7

Updated by Nico César about 3 years ago

from: https://support.sas.com/rnd/scalability/platform/PSS11.1/lsf10.1_config_ref.pdf

There are some configuration knobs to run jobs in containers based on application profiles. I'll copy the important bits here. There is also a big reference of environment variables we can specify.

lsb.applications

The lsb.applications file defines application profiles. Use application profiles to
define common parameters for the same type of jobs, including the execution
requirements of the applications, the resources they require, and how they should
be run and managed.

This file is optional. Use the DEFAULT_APPLICATION parameter in the lsb.params file
to specify a default application profile for all jobs. LSF does not automatically
assign a default application profile.
This file is installed by default in the LSB_CONFDIR/cluster_name/configdir
directory.

Changing lsb.applications configuration
After you change the lsb.applications file, run the badmin reconfig command to
reconfigure the mbatchd daemon. Configuration changes apply to pending jobs
only. Running jobs are not affected.

lsb.applications structure

Each application profile definition begins with the line Begin Application and
ends with the line End Application. The application name must be specified. All
other parameters are optional.

Example

Begin Application
NAME = catia
DESCRIPTION = CATIA V5
CPULIMIT = 24:0/hostA    # 24 hours of host hostA
FILELIMIT = 20000
DATALIMIT = 20000        # jobs data segment limit
CORELIMIT = 20000
TASKLIMIT = 5            # job processor limit
REQUEUE_EXIT_VALUES = 55 34 78
End Application

CONTAINER

Syntax

CONTAINER=docker[image(image_name) options(docker_run_options)]
CONTAINER=nvidia-docker[image(image_name) options(docker_run_options)]
CONTAINER=shifter[image(image_name) options(container_options)]
CONTAINER=singularity[image(image_name) options(container_options)]

Description

Enables LSF to use a Docker, NVIDIA Docker, Shifter, or Singularity container for jobs that are submitted to this application profile.

Examples


To specify an Ubuntu image for use with container jobs without specifying any optional keywords:
Begin Application
NAME = dockerapp
CONTAINER = docker[image(repository.example.com:5000/file/path/ubuntu:latest)]
DESCRIPTION = Docker User Service
End Application
Begin Application
NAME = shifterapp
CONTAINER = shifter[image(ubuntu:latest)]
DESCRIPTION = Shifter User Service
End Application
Begin Application
NAME = singapp
CONTAINER = singularity[image(/file/path/ubuntu.img)]
DESCRIPTION = Singularity User Service
End Application

To specify a pre-execution script in the /share/usr/ directory, which generates the container startup options:
Begin Application
NAME = dockerappoptions
CONTAINER = docker[image(repository.example.com:5000/file/path/ubuntu:latest) options(@/share/usr/docker-options.sh)]
DESCRIPTION = Docker User Service with pre-execution script for options
End Application
Begin Application
NAME = shifterappoptions
CONTAINER = shifter[image(ubuntu:latest) options(@/share/usr/shifter-options.sh)]
DESCRIPTION = Shifter User Service
End Application
Begin Application
NAME = singappoptions
CONTAINER = singularity[image(/file/path/ubuntu.img) options(@/share/usr/sing-options.sh)]
DESCRIPTION = Singularity User Service
End Application

EXEC_DRIVER

Syntax
EXEC_DRIVER=context[user(user_name)] starter[/file_path/to/serverdir/docker-starter.py]
controller[/file_path/to/serverdir/docker-control.py]
monitor[/file_path/to/serverdir/docker-monitor.py]

Replace file_path/to/serverdir with the actual file path of the LSF_SERVERDIR directory.

Description

Specifies the execution driver framework for Docker container jobs in this
application profile. This parameter uses the following keyword:
user

Optional. This keyword specifies the user account for starting scripts. The
configured value is a user name instead of a user ID. For Docker jobs, this
user must be a member of the Docker user group. By default this is the
LSF primary administrator.
Note: This cannot be the root user.

LSF includes three execution driver scripts that are used to start a job
(docker-starter.py), monitor the resource of a job (docker-monitor.py), and send
a signal to a job (docker-control.py). These scripts are located in the
LSF_SERVERDIR directory. Change the owner of the script files to the context user
and change the file permissions to 700 or 500 before using them in the EXEC_DRIVER
parameter.

Interaction with the CONTAINER parameter for Docker jobs
For Docker jobs, the EXEC_DRIVER parameter interacts with the following keywords
in the CONTAINER parameter:
  • image, which specifies the image name ($LSB_CONTAINER_IMAGE environment
    variable) is supported when specifying the script names.
  • options with runtime options and the option script is supported.

Example

Begin Application
NAME = dockerapp
CONTAINER = docker[image(repository.example.com:5000/file/path/ubuntu:latest)
options(--rm --network=host --ipc=host -v /path/to/my/passwd:/etc/passwd)]
EXEC_DRIVER = context[user(user-name)] starter[/path/to/driver/docker-starter.py]
controller[/path/to/driver/docker-control.py]
monitor[/path/to/driver/docker-monitor.py]
DESCRIPTION = Docker User Service
End Application

Actions #8

Updated by Nico César about 3 years ago

TL;DR: "set the LSB_CONTAINER_IMAGE environment variable at job submission time to specify the Docker image name."

This is the configuration:
https://www.ibm.com/support/knowledgecenter/SSWRJV_10.1.0/lsf_docker/lsf_docker_config.html

This is the usage:
https://www.ibm.com/support/knowledgecenter/SSWRJV_10.1.0/lsf_docker/lsf_docker_use.html

This is the Singularity configuration:
https://www.ibm.com/support/knowledgecenter/SSWRJV_10.1.0/lsf_container/lsf_singularity_config.html
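
If we go the "shell out to bsub" route, passing the image through could look something like this sketch (the application profile name "dockerapp", the image, and the test command are placeholders; it assumes LSF's usual behavior of forwarding the submission environment to the job):

package main

import (
	"log"
	"os"
	"os/exec"
	"strings"
)

func main() {
	// Placeholder image name; in our case this would come from the Arvados
	// container record's container_image field.
	image := "repository.example.com:5000/arvados/jobs:latest"

	// Submit into a Docker-enabled application profile ("dockerapp" is a
	// placeholder) and select the image via LSB_CONTAINER_IMAGE.
	cmd := exec.Command("bsub", "-app", "dockerapp", "-J", "docker-image-test")
	cmd.Stdin = strings.NewReader("cat /etc/os-release")
	cmd.Env = append(os.Environ(), "LSB_CONTAINER_IMAGE="+image)

	out, err := cmd.CombinedOutput()
	if err != nil {
		log.Fatalf("bsub failed: %v (%s)", err, out)
	}
	log.Printf("bsub output: %s", out)
}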

Actions #9

Updated by Peter Amstutz about 3 years ago

Is there anything suggesting that we actually want to use the container support in LSF, or do you think we may be forced to use the container support in LSF?

Actions #10

Updated by Nico César about 3 years ago

from:
https://www.ibm.com/support/pages/containers-and-lsf-lsbcontainerimage-environmental-variable

Yes. For the latest LSF version 10.1.0.2, the LSF $LSB_CONTAINER_IMAGE environment variable can only be used in the Docker/LSF integration. You cannot use this environment variable with the Shifter and Singularity/LSF integration features.

It seems that for Singularity you choose an application profile or a queue to run in (bsub -a / bsub -q), and that profile or queue has the Singularity image defined via the CONTAINER option.

Actions #11

Updated by Nico César about 3 years ago

Peter Amstutz wrote:

Is there anything suggesting that we actually want to use the container support in LSF, or do you think we may be forced to use the container support in LSF?

If I understand correctly, an Arvados container has a container_image attribute that we'll have to honor, so we will be forced to use whatever container support LSF provides. I'm trying to figure out how restricted LSF is compared to SLURM.

Actions #12

Updated by Peter Amstutz about 3 years ago

Nico César wrote:

Peter Amstutz wrote:

Is there anything suggesting that we actually want to use the container support in LSF, or do you think we may be forced to use the container support in LSF?

If I understand correctly, an Arvados container has a container_image attribute that we'll have to honor, so we will be forced to use whatever container support LSF provides. I'm trying to figure out how restricted LSF is compared to SLURM.

Well that's one of the things we need to find out. We really really really prefer crunch-run be run as a regular program without any containerization, because it is going to handle the containerization itself.

Actions #14

Updated by Peter Amstutz about 3 years ago

  • Target version changed from 2021-03-17 sprint to 2021-03-31 sprint
Actions #15

Updated by Nico César about 3 years ago

  • Target version changed from 2021-03-31 sprint to 2021-04-14 sprint
Actions #16

Updated by Javier Bértoli about 3 years ago

Based on the code Nico César mentions at #17443-5, I created a modified copy on our git server, "lsf-test-bed" (:lsf-test-bed.git):

  • added Debian installation
  • added Singularity to the `head` node
  • added examples to test qsub (a simple Python script and a Singularity run)
  • removed LSF's tgz from git-annex
  • documented how to download LSF's tgz
  • updated the documentation to explain how to set up and test for Curii's needs
Actions #17

Updated by Peter Amstutz about 3 years ago

  • Target version changed from 2021-04-14 sprint to 2021-05-12 sprint
Actions #18

Updated by Peter Amstutz almost 3 years ago

  • Target version changed from 2021-05-12 sprint to 2021-05-26 sprint
Actions #19

Updated by Peter Amstutz almost 3 years ago

  • Target version changed from 2021-05-26 sprint to 2021-06-09 sprint
Actions #20

Updated by Peter Amstutz almost 3 years ago

  • Assigned To changed from Nico César to Peter Amstutz
Actions #21

Updated by Peter Amstutz almost 3 years ago

  • Status changed from New to In Progress
Actions #22

Updated by Peter Amstutz almost 3 years ago

  • Target version changed from 2021-06-09 sprint to 2021-06-23 sprint
Actions #23

Updated by Peter Amstutz almost 3 years ago

  • Status changed from In Progress to Resolved