Feature #21168


Use SLURM REST API

Added by Peter Amstutz about 1 year ago. Updated about 1 year ago.

Status:
New
Priority:
Normal
Assigned To:
-
Category:
Crunch
Target version:
Story points:
-

Description

Carsten Schelp writes:

https://matrix.to/#/!iCGZYZrhxQUZbnMoxM:gitter.im/$h4EY4EmLYUyejj2RfLKxGDtzUVp0W5y20x-ScDn4WFo?via=gitter.im&via=matrix.org&via=surf.nl

Hi all, I think I mentioned earlier that we (surf.nl) would like to see a SLURM REST API interface for Arvados.
This would enable us to use the Snellius supercomputer as a crunch instance.
I am taking steps to find a Go developer who can modify the existing SLURM dispatcher ( https://github.com/arvados/arvados/tree/main/services/crunch-dispatch-slurm ) so that it can call the SLURM REST API instead of issuing the s* commands (sbatch, squeue, etc.).
The documentation for the SLURM REST API is here: https://slurm.schedmd.com/rest.html
I can get some budget for this, hoping that it covers the required effort.
My estimate is that this is not a very big task, but I may be naïve.
Everybody is welcome to play some scrum poker here.
Either way, please speak up if you can support the effort with advice, personnel or budget!

Actions #1

Updated by Peter Amstutz about 1 year ago

  • Description updated (diff)
Actions #2

Updated by Peter Amstutz about 1 year ago

My thoughts:

crunch-dispatch-slurm is not a very big program, so this is not that big of a project, but learning your way around might take a little time. There's a public development channel at https://matrix.to/#/#arvados_development:gitter.im

It isn't quite as simple as just providing an alternate "slurmCLI" object, because the methods that interact with it also pass command line parameters and return bare results that are parsed out by the caller.

You probably want a thicker interface around the slurm operations that abstracts command line vs REST by taking function parameters or an options struct and then translating that into either command line flags or REST API parameters. Similarly the return values should be abstracted so the parsing happens in the slurm object instead of the caller.
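
For example, something along these lines; all type and method names below are hypothetical, not the dispatcher's current API:

```go
package slurm

// BatchJobSpec carries job parameters as structured data instead of
// pre-rendered sbatch flags; each backend renders them itself.
type BatchJobSpec struct {
	Name      string
	Partition string
	MemoryMB  int64
	VCPUs     int
	Script    string // batch script to run (e.g. the crunch-run invocation)
}

// JobState is a normalized queue state, parsed inside the backend
// rather than by the caller.
type JobState string

const (
	StatePending   JobState = "PENDING"
	StateRunning   JobState = "RUNNING"
	StateCompleted JobState = "COMPLETED"
)

// Dispatcher code would depend only on this interface; one
// implementation shells out to sbatch/squeue/scancel, the other
// calls slurmrestd.
type Slurm interface {
	Submit(spec BatchJobSpec) (jobID string, err error)
	Cancel(name string) error
	State(name string) (JobState, error)
}
```

With that shape, the CLI backend renders the spec into sbatch flags and the REST backend into a JSON body, and all output parsing stays behind the interface.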

It's useful to understand that what Arvados asks SLURM to do is run an instance of crunch-run, which is the agent that actually manages the lifecycle of the container. You'll need crunch-run installed on your HPC cluster.
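
For orientation, an illustrative sketch of what gets submitted; the actual crunch-run invocation (its flags and arguments) is defined in the crunch-dispatch-slurm source, so treat this as a simplification:

```go
package main

import "fmt"

func main() {
	// Illustrative only: the submitted batch script's job is to exec
	// crunch-run for one container; crunch-run then fetches the
	// container record from Arvados and manages its lifecycle.
	containerUUID := "zzzzz-dz642-xxxxxxxxxxxxxxx" // hypothetical UUID
	script := fmt.Sprintf("#!/bin/sh\nexec crunch-run %s\n", containerUUID)
	fmt.Print(script)
}
```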

You'll also need to add appropriate configuration parameters to the config file (presumably at least an endpoint URL and an API token). See arvados/lib/config and arvados/sdk/go/arvados/config.go

The configuration provides for "SbatchArgumentsList" and "SbatchEnvironmentVariables" which can be used for site-specific configuration. I don't know if you'll want something similar to allow for providing custom options to the REST API.
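
A sketch of what such additions might look like, with invented field names (the real schema lives in arvados/lib/config and arvados/sdk/go/arvados/config.go; only the two Sbatch* keys exist today):

```go
package arvados

// Hypothetical additions to the SLURM dispatcher configuration.
type SLURMConfig struct {
	SbatchArgumentsList        []string          // existing: extra sbatch flags
	SbatchEnvironmentVariables map[string]string // existing: extra job env vars

	// Everything below is illustrative, not a real config section.
	REST struct {
		EndpointURL string // e.g. "https://slurmctld.example:6820" (assumed)
		APIToken    string // sent as X-SLURM-USER-TOKEN (per slurmrestd docs)
		UserName    string // sent as X-SLURM-USER-NAME
	}
}
```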

Ignore the "Managed" section under "SLURM". It's related to a slurm-on-cloud configuration that we don't support any more, and is on its way out.

Actions #3

Updated by Carsten Schelp about 1 year ago

Hi @Peter,

Thank you for the added details.

So you'd suggest still having one single SLURM client with two interfacing modes: "scommands" and "REST". Which mode is used depends on (new) configuration in the "DispatchSLURM" section of the Arvados config file. If "REST" is configured, the URL of the API must also be specified there.
I still have to see whether a rather static API token in the config is feasible.

About the "thicker" client you mention: Do you mean that functionality from the caller would have to be transferred to the SLURM client to keep the two interfacing modes manageable?
Or is it just about re-arranging the given parameters in a way that will fit into a REST-API call?

Actions #4

Updated by Peter Amstutz about 1 year ago

Carsten Schelp wrote in #note-3:

> Hi @Peter,
>
> Thank you for the added details.
>
> So you'd suggest still having one single SLURM client with two interfacing modes: "scommands" and "REST". Which mode is used depends on (new) configuration in the "DispatchSLURM" section of the Arvados config file. If "REST" is configured, the URL of the API must also be specified there.
> I still have to see whether a rather static API token in the config is feasible.

Yes, that sounds right.
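
As a rough sketch of that mode switch (config fields and type names invented here, not actual Arvados config keys):

```go
package dispatchslurm

// Slurm is the shared abstraction from the earlier sketch
// (hypothetical names).
type Slurm interface {
	Cancel(name string) error
	// ... Submit, State, etc.
}

type slurmCLI struct{} // shells out to sbatch/squeue/scancel

func (slurmCLI) Cancel(name string) error { return nil } // stub

type slurmREST struct{ endpoint, token string } // talks to slurmrestd

func (s *slurmREST) Cancel(name string) error { return nil } // stub

// newSlurm picks the backend from (hypothetical) config values:
// REST when an endpoint URL is configured, otherwise the s* commands.
func newSlurm(endpointURL, apiToken string) Slurm {
	if endpointURL != "" {
		return &slurmREST{endpoint: endpointURL, token: apiToken}
	}
	return slurmCLI{}
}
```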

About the "thicker" client you mention: Do you mean that functionality from the caller would have to be transferred to the SLURM client to keep the two interfacing modes manageable?

Yes, because the current implementation passes command line parameters and parses the results. Presumably the REST API parameters are not exactly the same as the command line options, and the return values are not exactly the same as the standard output of the command line.

That said, if the reason you need this is that you can't run the persistent crunch-dispatch-slurm service on a node where the sbatch/squeue commands are available, you could instead add support for running sbatch/squeue over an SSH session. Then you would run crunch-dispatch-slurm outside the HPC environment, but it would use SSH to connect to a node in the HPC environment to run the SLURM commands. This would likely be less work than adding support for the REST API (but you may have other reasons for supporting the REST API).
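
A minimal sketch of that SSH variant using golang.org/x/crypto/ssh; the host, user, key path, and squeue format string are all placeholders:

```go
package main

import (
	"fmt"
	"log"
	"os"

	"golang.org/x/crypto/ssh"
)

func main() {
	key, err := os.ReadFile("/etc/arvados/dispatch_ssh_key") // hypothetical path
	if err != nil {
		log.Fatal(err)
	}
	signer, err := ssh.ParsePrivateKey(key)
	if err != nil {
		log.Fatal(err)
	}
	config := &ssh.ClientConfig{
		User:            "arvados", // placeholder
		Auth:            []ssh.AuthMethod{ssh.PublicKeys(signer)},
		HostKeyCallback: ssh.InsecureIgnoreHostKey(), // pin the host key in production
	}
	client, err := ssh.Dial("tcp", "login.hpc.example:22", config) // placeholder host
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	session, err := client.NewSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Run squeue on the login node exactly as the local backend would.
	out, err := session.CombinedOutput(`squeue --noheader --format="%j %T"`)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out))
}
```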

Actions #5

Updated by Carsten Schelp about 1 year ago

Well, indeed - the REST API also gives us more control over authentication and authorization towards the HPC system.
From the HPC side, too, a REST API is preferred over "something" logging in like a user.
I also just learned that we can have short- and long-lived API tokens issued, which seems rather flexible.

I am thinking about taking a first step of translating SLURM command parameters into API requests and, likewise, translating results back into the existing interface. (The REST API seems to follow the command line API very closely, so I might get away with this.)
The second step would be refactoring the two SLURM clients into one, moving some logic from the caller into the client.
This way, we can find out early whether there are any obstacles to using the SLURM REST API at all.
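
A hedged sketch of that first translation step, mapping a few sbatch-style parameters onto a slurmrestd job-submit request (the v0.0.38 path and the field names vary between SLURM releases; the deployment's OpenAPI spec is authoritative):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// jobSubmit approximates the body of POST /slurm/v0.0.38/job/submit.
// Verify field names against your slurmrestd version; e.g. newer
// versions expect "environment" as a list of KEY=VALUE strings.
type jobSubmit struct {
	Script string `json:"script"`
	Job    struct {
		Name          string            `json:"name"`
		Partition     string            `json:"partition,omitempty"`
		MemoryPerNode int64             `json:"memory_per_node,omitempty"`
		Environment   map[string]string `json:"environment"`
	} `json:"job"`
}

func main() {
	// Roughly equivalent to:
	//   sbatch --job-name=... --partition=compute --mem=4096 script.sh
	var req jobSubmit
	req.Script = "#!/bin/sh\nexec crunch-run zzzzz-dz642-xxxxxxxxxxxxxxx\n"
	req.Job.Name = "zzzzz-dz642-xxxxxxxxxxxxxxx" // hypothetical container UUID
	req.Job.Partition = "compute"
	req.Job.MemoryPerNode = 4096
	req.Job.Environment = map[string]string{"PATH": "/bin:/usr/bin"}

	body, err := json.Marshal(req)
	if err != nil {
		log.Fatal(err)
	}
	httpReq, err := http.NewRequest("POST",
		"https://slurmctld.example:6820/slurm/v0.0.38/job/submit", // placeholder
		bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	httpReq.Header.Set("Content-Type", "application/json")
	httpReq.Header.Set("X-SLURM-USER-NAME", "arvados")       // per slurmrestd docs
	httpReq.Header.Set("X-SLURM-USER-TOKEN", "…token here…") // e.g. from scontrol token

	resp, err := http.DefaultClient.Do(httpReq)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```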
