Idea #9581
closed[Crunch2] SLURM dispatcher supports arbitrary arguments to sbatch
Description
Use case¶
Administrators want additional control over how Arvados containers are dispatched to SLURM. Some options they might commonly want to specify are --account
, --clusters
, or --partition
. crunch-dispatch-slurm should provide a simple mechanism for administrators to set arbitrary additional arguments to sbatch.
Implementation¶
Add a command line flag, "-config path/to/config.json" (see source:services/keep-balance/main.go for an example). crunch-dispatch-slurm reads configuration entries from this JSON file. If no -config
switch is given, it tries to read configuration from /etc/arvados/crunch-dispatch-slurm/config.json
. If that file does not exist, crunch-dispatch-slurm continues running with default configuration. If there's any other error reading the file (permissions problem, invalid JSON), or if the user specified a config file that doesn't exist, that's a fatal error, and the program should abort with a useful error message.
Convert existing command line flags to config file entries.
type Config struct { SbatchArguments []string PollPeriod time.Duration CrunchRunCommand string }
The configured SbatchArguments should be inserted into the command passed to exec.Command()
, after the generic options sbatch --share
and before the job-specific options --job-name
, --mem-per-cpu
, and --cpus-per-task
. This is because the administrator's configuration should take precedence over default behavior switches, but not over job-specific switches which have more details about how the container should be run.
Tests¶
Just unit tests for sbatchFunc.- Handles nil SbatchArguments correctly
- Handles non-nil SbatchArguments correctly
Updated by Tom Clegg over 8 years ago
- Category set to Crunch
- Assigned To set to Tom Clegg
Updated by Brett Smith over 8 years ago
Requiring -config
to be specified means we don't get some of the potential benefits of a configuration file, like "systemd unit file is usable out of the box." I'm fine with having the flag, but can we agree on a default location for the configuration file? Ops has said before they would like to standardize on /etc/arvados/<component>/<file>
, and that sounds good to me. So maybe /etc/arvados/crunch-dispatch-slurm/config.json
?
Updated by Brett Smith over 8 years ago
- Target version set to Arvados Future Sprints
Updated by Brett Smith over 8 years ago
- Description updated (diff)
Tom signed off on the default path.
Updated by Radhika Chippada over 8 years ago
- Assigned To changed from Tom Clegg to Radhika Chippada
Updated by Radhika Chippada over 8 years ago
- Target version changed from Arvados Future Sprints to 2016-08-03 sprint
Updated by Radhika Chippada over 8 years ago
- Status changed from New to In Progress
Updated by Tom Clegg over 8 years ago
Thanks for de-duplicating code, but I do wonder whether "come up with a path that doesn't exist" really merits this amount of code and filesystem operations. Couldn't we just say const nonexistentFilePath = "/nonexistent/91Gy0nt2AJDZoFwj"
?
Suggest:
- for i := range config.SbatchArguments {
- sbatchArgs = append(sbatchArgs, config.SbatchArguments[i])
- }
+ sbatchArgs = append(sbatchArgs, config.SbatchArguments...)
Should check path == ""
and return nil before trying to read the file, not after that fails. Or better yet, since path would only be "" if someone explicitly set it that way on the command line, there doesn't seem to be a need to treat it specially: if someone wants to read an empty config file, they should say "-config /dev/null", not "-config ''".
Instead of strings.Contains(err.Error(), "no such file")
, use os.IsNotExist(err)
.
The config file should be loaded into a Config, not a []string. That way the existing command line flags will also become configurable via JSON.
doMain() has to pass a pointer to readConfig(). (Test_ReadConfig fails if you change it to call readConfig the same way doMain does.)
Ideally config should not be a global. Un-globalizing it might involve something like wrapping run() in a closure before putting it in dispatch.Dispatcher. OK to punt this since it's no worse than before.
dispatch.Dispatcher's PollInterval and StartMonitor's poll argument should be just *config.PollPeriod -- shouldn't be multiplied by time.Second.
Test functions should be called TestFoo, not Test_Foo.
Suggest using DeepEquals to test loading:
- c.Check(3, Equals, len(config.SbatchArguments))
- for i := range args {
- c.Check(args[i], Equals, config.SbatchArguments[i])
- }
+ c.Check(args, DeepEquals, config.SbatchArguments)
Updated by Radhika Chippada over 8 years ago
Thanks for de-duplicating code, but I do wonder whether "come up with a path that doesn't exist" really merits this amount of code and filesystem operations. Couldn't we just say const nonexistentFilePath = "/nonexistent/91Gy0nt2AJDZoFwj"?
Yes, I never liked this idea myself. I changed slurm dispatcher per your suggestion but left the datamanager as is for now.
Suggest: sbatchArgs = append(sbatchArgs, config.SbatchArguments...)
Updated.
Should check path == "" and return nil before trying to read the file, not after that fails. Or better yet, since path would only be "" if someone explicitly set it that way on the command line, there doesn't seem to be a need to treat it specially: if someone wants to read an empty config file, they should say "-config /dev/null", not "-config ''".
Agree. Removed this extra handling and the corresponding test.
Instead of strings.Contains(err.Error(), "no such file"), use os.IsNotExist(err).
Updated
The config file should be loaded into a Config, not a []string. That way the existing command line flags will also become configurable via JSON.
Done
doMain() has to pass a pointer to readConfig(). (Test_ReadConfig fails if you change it to call readConfig the same way doMain does.)
Good catch. Thanks.
Ideally config should not be a global. Un-globalizing it might involve something like wrapping run() in a closure before putting it in dispatch.Dispatcher. OK to punt this since it's no worse than before.
Punted
dispatch.Dispatcher's PollInterval and StartMonitor's poll argument should be just *config.PollPeriod -- shouldn't be multiplied by time.Second.
Updated
Test functions should be called TestFoo, not Test_Foo.
Updated
Suggest using DeepEquals to test loading: c.Check(args, DeepEquals, config.SbatchArguments)
Done
Updated by Tom Clegg over 8 years ago
We don't yet have the tests asked for in the issue description -- it looks like we're testing Go's JSON decoder, but we're not testing the actual feature at all. sbatchFunc looks amenable to unit-testing...
Other than that, LGTM.
Updated by Tom Clegg over 8 years ago
Suggest writing down the right answer instead of copying the code that's being tested.
- expected = append(expected, fmt.Sprintf("--job-name=%s", container.UUID))
- memPerCPU := math.Ceil(float64(container.RuntimeConstraints.RAM) / (float64(container.RuntimeConstraints.VCPUs) * 1048576))
- expected = append(expected, fmt.Sprintf("--mem-per-cpu=%d", int(memPerCPU)))
+ expected = append(expected, "--job-name=123", "--mem-per-cpu=1")
Might as well check testSbatchFuncWithArgs(c, nil) as well.
LGTM
Updated by Radhika Chippada over 8 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|commit:83a4494e66f4f7447091779f25e6f202b2379de9.