Idea #17443
closed
Scoping/grooming LSF work
Added by Peter Amstutz over 3 years ago. Updated over 3 years ago.
Files
lsfsce10.2_quick_start.pdf (401 KB) | Nico César, 03/03/2021 08:08 PM
lsf9.1_quick_reference.pdf (700 KB) | Nico César, 03/04/2021 07:38 PM
Related issues
Updated by Nico César over 3 years ago
I started with LSF Community Edition. To get the tarball:
Go to https://www-01.ibm.com/marketing/iwm/iwm/web/preLogin.do?source=swerpzsw-lsf-3 and create an account. Accept the license and you should be able to download a file named lsfsce10.2.0.6-x86_64.tar.gz.
There is also a "getting started" guide, which I attached.
https://github.com/IBMSpectrumComputing/lsf-python-api is the Python wrapper for the LSF API.
Its SWIG interface file, https://github.com/IBMSpectrumComputing/lsf-python-api/blob/master/pythonlsf/lsf.i, could perhaps be reused to generate Go bindings with SWIG:
http://www.swig.org/Doc3.0/Go.html
Updated by Nico César over 3 years ago
Updated by Peter Amstutz over 3 years ago
- Related to Idea #16304: LSF support added
Updated by Peter Amstutz over 3 years ago
They probably have a C API that's easier to consume from Go? Maybe?
For slurm, we just invoke the command line tools and parse the output. I'm fine with doing that for LSF as well, unless using the API seems like a better bet.
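A minimal sketch of the "invoke the command line tools and parse the output" approach applied to LSF, written in Python for brevity. It assumes bsub is on PATH and prints its usual confirmation line of the form "Job <1234> is submitted to default queue <normal>."; the function names parse_bsub_output and submit are hypothetical and not part of any existing dispatcher.

```python
import re
import subprocess

# Confirmation line that bsub prints on success, for example:
#   Job <1234> is submitted to default queue <normal>.
# The word "default" only appears when no queue was requested explicitly.
_BSUB_RE = re.compile(r"Job <(\d+)> is submitted to (?:default )?queue <([^>]+)>")


def parse_bsub_output(output):
    """Return (job_id, queue_name) extracted from bsub's stdout."""
    match = _BSUB_RE.search(output)
    if match is None:
        raise ValueError("unrecognized bsub output: %r" % output)
    return int(match.group(1)), match.group(2)


def submit(command):
    """Submit `command` (a list of strings) with bsub and return (job_id, queue).

    Assumption: bsub is installed and on PATH. check=True raises
    CalledProcessError if bsub exits with a non-zero status.
    """
    result = subprocess.run(
        ["bsub"] + list(command),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_bsub_output(result.stdout)
```

Keeping the parser separate from the subprocess call means the parsing can be tested against captured output without an LSF cluster.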
Updated by Nico César over 3 years ago
After a good fight with my VirtualBox/Vagrant environment, I tried to spin up https://github.com/MorganRodgers/lsf-workbench
So far I have been unable to. More news to come.
Updated by Nico César over 3 years ago
from: https://support.sas.com/rnd/scalability/platform/PSS8.1/lsf9.1_quick_reference.pdf
bsub I/O related flags:
-e error_file: Appends the standard error output to a file
-eo error_file: Overwrites the standard error output of the job to the specified file
-i input_file | -is input_file: Gets the standard input for the job from the specified file
-o output_file: Appends the standard output to a file
-oo output_file: Overwrites the standard output of the job to the specified file
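As an illustration of how a dispatcher might choose between the append and overwrite variants of these flags, here is a small Python sketch. The helper name bsub_io_args and its parameters are hypothetical; only the flags -i, -o, -oo, -e, and -eo come from the quick reference above.

```python
def bsub_io_args(stdout_path=None, stderr_path=None, stdin_path=None, overwrite=False):
    """Build the I/O related part of a bsub command line.

    With overwrite=False the append flags (-o, -e) are used;
    with overwrite=True the overwrite flags (-oo, -eo) are used.
    Paths left as None are simply omitted.
    """
    args = []
    if stdin_path is not None:
        args += ["-i", stdin_path]
    if stdout_path is not None:
        args += ["-oo" if overwrite else "-o", stdout_path]
    if stderr_path is not None:
        args += ["-eo" if overwrite else "-e", stderr_path]
    return args


# Usage: the result is meant to be spliced into a full command line, e.g.
#   ["bsub"] + bsub_io_args("out.log", "err.log") + ["my-program"]
```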
Updated by Nico César over 3 years ago
from: https://support.sas.com/rnd/scalability/platform/PSS11.1/lsf10.1_config_ref.pdf
There are some configuration knobs for creating containers based on application profiles. I'll copy the important bits here. There is also a large reference of environment variables we can specify.
lsb.applications
The lsb.applications file defines application profiles. Use application profiles to
define common parameters for the same type of jobs, including the execution
requirements of the applications, the resources they require, and how they should
be run and managed.
This file is optional. Use the DEFAULT_APPLICATION parameter in the lsb.params file
to specify a default application profile for all jobs. LSF does not automatically
assign a default application profile.
This file is installed by default in the LSB_CONFDIR/cluster_name/configdir
directory.
Changing lsb.applications configuration
After you change the lsb.applications file, run the badmin reconfig command to
reconfigure the mbatchd daemon. Configuration changes apply to pending jobs
only. Running jobs are not affected.
lsb.applications structure
Each application profile definition begins with the line Begin Application and
ends with the line End Application. The application name must be specified. All
other parameters are optional.
Example
Begin Application
NAME = catia
DESCRIPTION = CATIA V5
CPULIMIT = 24:0/hostA # 24 hours of host hostA
FILELIMIT = 20000
DATALIMIT = 20000 # jobs data segment limit
CORELIMIT = 20000
TASKLIMIT = 5 # job processor limit
REQUEUE_EXIT_VALUES = 55 34 78
End Application
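To make the Begin Application / End Application structure concrete, here is a rough Python sketch that reads such a file into a list of dictionaries. It assumes one KEY = value pair per line and treats everything after a "#" as a comment, which matches the example above but may not cover every construct the real file format allows. The function name parse_application_profiles is hypothetical.

```python
def parse_application_profiles(text):
    """Parse lsb.applications style text into a list of dicts, one per profile.

    Assumptions (not taken from the LSF documentation): one parameter per
    line, '#' starts a comment, and values never contain a literal '#'.
    """
    profiles = []
    current = None  # dict for the profile being read, or None outside a block
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        words = line.split()
        if words == ["Begin", "Application"]:
            current = {}
        elif words == ["End", "Application"]:
            if current is None:
                raise ValueError("End Application without Begin Application")
            if "NAME" not in current:
                # The quoted documentation says the name must be specified.
                raise ValueError("application profile is missing NAME")
            profiles.append(current)
            current = None
        elif current is not None:
            # Split on the first '=' only, since values may contain '='.
            key, sep, value = line.partition("=")
            if not sep:
                raise ValueError("expected KEY = value, got: %r" % line)
            current[key.strip()] = value.strip()
    return profiles


EXAMPLE = """\
Begin Application
NAME = catia
DESCRIPTION = CATIA V5
CPULIMIT = 24:0/hostA # 24 hours of host hostA
TASKLIMIT = 5 # job processor limit
REQUEUE_EXIT_VALUES = 55 34 78
End Application
"""
```

Calling parse_application_profiles(EXAMPLE) yields a single profile whose NAME is "catia".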
CONTAINER
Syntax
CONTAINER=docker[image(image_name) options(docker_run_options)]
CONTAINER=nvidia-docker[image(image_name) options(docker_run_options)]
CONTAINER=shifter[image(image_name) options(container_options)]
CONTAINER=singularity[image(image_name) options(container_options)]
Description
Enables LSF to use a Docker, NVIDIA Docker, Shifter, or Singularity container for
jobs that are submitted to this application profile.
Examples
To specify an Ubuntu image for use with container jobs without specifying any optional keywords,

Begin Application
NAME = dockerapp
CONTAINER = docker[image(repository.example.com:5000/file/path/ubuntu:latest)]
DESCRIPTION = Docker User Service
End Application

Begin Application
NAME = shifterapp
CONTAINER = shifter[image(ubuntu:latest)]
DESCRIPTION = Shifter User Service
End Application

Begin Application
NAME = singapp
CONTAINER = singularity[image(/file/path/ubuntu.img)]
DESCRIPTION = Singularity User Service
End Application

To specify a pre-execution script in the /share/usr/ directory, which generates the container startup options,

Begin Application
NAME = dockerappoptions
CONTAINER = docker[image(repository.example.com:5000/file/path/ubuntu:latest) options(@/share/usr/doc
DESCRIPTION = Docker User Service with pre-execution script for options
End Application

Begin Application
NAME = shifterappoptions
CONTAINER = shifter[image(ubuntu:latest) options(@/share/usr/shifter-options.sh)]
DESCRIPTION = Shifter User Service
End Application

Begin Application
NAME = singappoptions
CONTAINER = singularity[image(/file/path/ubuntu.img) options(@/share/usr/sing-options.sh)]
DESCRIPTION = Singularity User Service
End Application
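The CONTAINER values in these examples follow the pattern runtime[image(...) options(...)], with options being optional. The Python sketch below splits a value into those three parts. It is based only on the syntax lines quoted here, so it assumes the image name contains no parentheses or whitespace; parse_container is a hypothetical helper name.

```python
import re

# runtime[image(image_name) options(container_options)], options optional.
_CONTAINER_RE = re.compile(
    r"^(?P<runtime>[a-z-]+)\["
    r"image\((?P<image>[^()\s]+)\)"
    r"(?:\s+options\((?P<options>.*)\))?"
    r"\]$"
)


def parse_container(value):
    """Return (runtime, image, options) for a CONTAINER parameter value.

    options is None when the keyword is absent. A value that does not
    close its brackets (for example one truncated in a copy) is rejected.
    """
    match = _CONTAINER_RE.match(value.strip())
    if match is None:
        raise ValueError("unrecognized CONTAINER value: %r" % value)
    return match.group("runtime"), match.group("image"), match.group("options")
```

For example, parse_container("shifter[image(ubuntu:latest) options(@/share/usr/shifter-options.sh)]") separates the runtime name, the image, and the options script reference.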
EXEC_DRIVER
Syntax
EXEC_DRIVER=context[user(user_name)] starter[/file_path_serverdir/docker-starter.py] controller[/file_path/to/serverdir/docker-control.py] monitor[/file_path/to/serverdir/docker-monitor.py]
Replace file_path/to/serverdir with the actual file path of the LSF_SERVERDIR directory.
Description
Specifies the execution driver framework for Docker container jobs in this
application profile. This parameter uses the following keyword:
user
Optional. This keyword specifies the user account for starting scripts. The
configured value is a user name instead of a user ID. For Docker jobs, this
user must be a member of the Docker user group. By default this is the
LSF primary administrator.
Note: This cannot be the root user.
LSF includes three execution driver scripts that are used to start a job
(docker-starter.py), monitor the resource of a job (docker-monitor.py), and send
a signal to a job (docker-control.py). These scripts are located in the
LSF_SERVERDIR directory. Change the owner of the script files to the context user
and change the file permissions to 700 or 500 before using them in the EXEC_DRIVER
parameter.
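The ownership and permission requirement above can be checked mechanically. The following Python sketch reports problems for one script file on a POSIX system; the function name driver_script_problems and the idea of passing the context user's numeric UID are assumptions made for illustration, not something LSF provides.

```python
import os
import stat

# Permission modes that the quoted documentation asks for.
ALLOWED_MODES = (0o700, 0o500)


def driver_script_problems(path, expected_uid):
    """Return a list of human-readable problems for one driver script.

    expected_uid is the numeric UID of the context user (an assumption:
    the caller has already resolved the user name to a UID).
    An empty list means the file passes both checks.
    """
    st = os.stat(path)
    problems = []
    if st.st_uid != expected_uid:
        problems.append(
            "%s is owned by uid %d, expected %d" % (path, st.st_uid, expected_uid)
        )
    mode = stat.S_IMODE(st.st_mode)  # keep only the permission bits
    if mode not in ALLOWED_MODES:
        problems.append("%s has mode %o, expected 700 or 500" % (path, mode))
    return problems
```

A setup script could run this over docker-starter.py, docker-control.py, and docker-monitor.py before enabling EXEC_DRIVER.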
For Docker jobs, the EXEC_DRIVER parameter interacts with the following keywords
in the CONTAINER parameter:
- image, which specifies the image name ($LSB_CONTAINER_IMAGE environment variable), is supported when specifying the script names.
- options, with runtime options and the option script, is supported.
Example
Begin Application
NAME = dockerapp
CONTAINER = docker[image(repository.example.com:5000/file/path/ubuntu:latest) options(--rm --network=host --ipc=host -v /path/to/my/passwd:/etc/passwd)]
EXEC_DRIVER = context[user(user-name)] starter[/path/to/driver/docker-starter.py] controller[/path/to/driver/docker-control.py] monitor[/path/to/driver/docker-monitor.py]
DESCRIPTION = Docker User Service
End Application
Updated by Nico César over 3 years ago
TL;DR: "set the LSB_CONTAINER_IMAGE environment variable at job submission time to specify the Docker image name."
This is the configuration:
https://www.ibm.com/support/knowledgecenter/SSWRJV_10.1.0/lsf_docker/lsf_docker_config.html
This is the usage
https://www.ibm.com/support/knowledgecenter/SSWRJV_10.1.0/lsf_docker/lsf_docker_use.html
This is the Singularity configuration:
https://www.ibm.com/support/knowledgecenter/SSWRJV_10.1.0/lsf_container/lsf_singularity_config.html
Updated by Peter Amstutz over 3 years ago
Is there anything suggesting that we actually want to use the container support in LSF, or do you think we may be forced to use the container support in LSF?
Updated by Nico César over 3 years ago
from:
https://www.ibm.com/support/pages/containers-and-lsf-lsbcontainerimage-environmental-variable
Yes. For latest LSF version 10.1.0.2, LSF $LSB_CONTAINER_IMAGE environmental variable only can be used in Docker/LSF integration. You can not use this environment variable in Shifter and Singularity/LSF integration feature.
It seems that for Singularity you choose an application profile or a queue (bsub -app / bsub -q), and that profile or queue has the Singularity image defined via the CONTAINER option.
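The difference described above (image chosen per job for Docker, image fixed by configuration for Shifter and Singularity) could be captured in a dispatcher roughly as follows. This Python sketch only builds the argument list and environment; it assumes jobs are directed to an application profile with bsub -app, and container_submission is a hypothetical helper name.

```python
import os


def container_submission(runtime, command, app_profile, image=None, base_env=None):
    """Return (argv, env) for submitting `command` under an application profile.

    For Docker, the image is passed per job via LSB_CONTAINER_IMAGE.
    For Shifter and Singularity, the image comes from the profile's
    CONTAINER parameter, so passing an image here is rejected.
    """
    env = dict(os.environ if base_env is None else base_env)
    if runtime == "docker":
        if image is None:
            raise ValueError("docker submissions need an image")
        env["LSB_CONTAINER_IMAGE"] = image
    elif runtime in ("shifter", "singularity"):
        if image is not None:
            raise ValueError(
                "%s image is fixed by the application profile" % runtime
            )
    else:
        raise ValueError("unknown container runtime: %r" % runtime)
    argv = ["bsub", "-app", app_profile] + list(command)
    return argv, env
```

The returned pair can be handed to subprocess.run(argv, env=env). Whether this restriction matters depends on whether LSF's container support is used at all.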
Updated by Nico César over 3 years ago
Peter Amstutz wrote:
Is there anything suggesting that we actually want to use the container support in LSF, or do you think we may be forced to use the container support in LSF?
If I understand correctly, an Arvados container has an attribute container_image that we'll have to honor, so we will be forced to use whatever container support is provided. I'm trying to figure out how restricted LSF is compared to SLURM.
Updated by Peter Amstutz over 3 years ago
Nico César wrote:
Peter Amstutz wrote:
Is there anything suggesting that we actually want to use the container support in LSF, or do you think we may be forced to use the container support in LSF?
If I understand correctly, an Arvados container has an attribute container_image that we'll have to honor, so we will be forced to use whatever container support is provided. I'm trying to figure out how restricted LSF is compared to SLURM.
Well, that's one of the things we need to find out. We would really, really prefer that crunch-run be run as a regular program without any containerization, because it is going to handle the containerization itself.
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-03-17 sprint to 2021-03-31 sprint
Updated by Nico César over 3 years ago
- Target version changed from 2021-03-31 sprint to 2021-04-14 sprint
Updated by Javier Bértoli over 3 years ago
Based on the code Nico César mentions at #17443-5, I created a modified copy in our git server, "lsf-test-bed" (git@git.curii.com:lsf-test-bed.git)
- added debian installation
- added singularity to the `head` node
- added examples to test qsub (a simple python script and a singularity run)
- removed git-annex lsf's tgz
- documented how to download lsf's tgz from
- updated documentation to explain how to setup and test for Curii's needs
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-04-14 sprint to 2021-05-12 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-05-12 sprint to 2021-05-26 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-05-26 sprint to 2021-06-09 sprint
Updated by Peter Amstutz over 3 years ago
- Assigned To changed from Nico César to Peter Amstutz
Updated by Peter Amstutz over 3 years ago
- Status changed from New to In Progress
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-06-09 sprint to 2021-06-23 sprint
Updated by Peter Amstutz over 3 years ago
- Status changed from In Progress to Resolved