Story #9581

[Crunch2] SLURM dispatcher supports arbitrary arguments to sbatch

Added by Brett Smith almost 5 years ago. Updated almost 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Radhika Chippada
Category:
Crunch
Target version:
Start date:
07/12/2016
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
1.0
Release:
Release relationship:
Auto

Description

Use case

Administrators want additional control over how Arvados containers are dispatched to SLURM. Some options they might commonly want to specify are --account, --clusters, or --partition. crunch-dispatch-slurm should provide a simple mechanism for administrators to set arbitrary additional arguments to sbatch.

Implementation

Add a command line flag, "-config path/to/config.json" (see source:services/keep-balance/main.go for an example). crunch-dispatch-slurm reads configuration entries from this JSON file. If no -config switch is given, it tries to read configuration from /etc/arvados/crunch-dispatch-slurm/config.json. If that file does not exist, crunch-dispatch-slurm continues running with default configuration. If there's any other error reading the file (permissions problem, invalid JSON), or if the user specified a config file that doesn't exist, that's a fatal error, and the program should abort with a useful error message.

Convert existing command line flags to config file entries.

type Config struct {
        SbatchArguments  []string
        PollPeriod       time.Duration
        CrunchRunCommand string
}

The configured SbatchArguments should be inserted into the command passed to exec.Command(), after the generic options sbatch --share and before the job-specific options --job-name, --mem-per-cpu, and --cpus-per-task. This is because the administrator's configuration should take precedence over default behavior switches, but not over job-specific switches which have more details about how the container should be run.

Tests

Just unit tests for sbatchFunc.
  • Handles nil SbatchArguments correctly
  • Handles non-nil SbatchArguments correctly

Subtasks

Task #9640: Review branch 9581-slurm-dispatcher-configResolvedRadhika Chippada

Associated revisions

Revision 83a4494e
Added by Radhika Chippada almost 5 years ago

closes #9581
Merge branch '9581-slurm-dispatcher-config'

History

#1 Updated by Tom Clegg almost 5 years ago

  • Category set to Crunch
  • Assigned To set to Tom Clegg

#2 Updated by Tom Clegg almost 5 years ago

  • Description updated (diff)

#3 Updated by Brett Smith almost 5 years ago

Requiring -config to be specified means we don't get some of the potential benefits of a configuration file, like "systemd unit file is usable out of the box." I'm fine with having the flag, but can we agree on a default location for the configuration file? Ops has said before they would like to standardize on /etc/arvados/<component>/<file>, and that sounds good to me. So maybe /etc/arvados/crunch-dispatch-slurm/config.json?

#4 Updated by Brett Smith almost 5 years ago

  • Target version set to Arvados Future Sprints

#5 Updated by Brett Smith almost 5 years ago

  • Description updated (diff)

Tom signed off on the default path.

#6 Updated by Brett Smith almost 5 years ago

  • Story points set to 1.0

#7 Updated by Brett Smith almost 5 years ago

  • Description updated (diff)

#8 Updated by Brett Smith almost 5 years ago

  • Description updated (diff)

#9 Updated by Radhika Chippada almost 5 years ago

  • Assigned To changed from Tom Clegg to Radhika Chippada

#10 Updated by Radhika Chippada almost 5 years ago

  • Target version changed from Arvados Future Sprints to 2016-08-03 sprint

#11 Updated by Radhika Chippada almost 5 years ago

  • Status changed from New to In Progress

#12 Updated by Tom Clegg almost 5 years ago

Thanks for de-duplicating code, but I do wonder whether "come up with a path that doesn't exist" really merits this amount of code and filesystem operations. Couldn't we just say const nonexistentFilePath = "/nonexistent/91Gy0nt2AJDZoFwj"?

Suggest:

-       for i := range config.SbatchArguments {
-               sbatchArgs = append(sbatchArgs, config.SbatchArguments[i])
-       }
+       sbatchArgs = append(sbatchArgs, config.SbatchArguments...)

Should check path == "" and return nil before trying to read the file, not after that fails. Or better yet, since path would only be "" if someone explicitly set it that way on the command line, there doesn't seem to be a need to treat it specially: if someone wants to read an empty config file, they should say "-config /dev/null", not "-config ''".

Instead of strings.Contains(err.Error(), "no such file"), use os.IsNotExist(err).

The config file should be loaded into a Config, not a []string. That way the existing command line flags will also become configurable via JSON.

doMain() has to pass a pointer to readConfig(). (Test_ReadConfig fails if you change it to call readConfig the same way doMain does.)

Ideally config should not be a global. Un-globalizing it might involve something like wrapping run() in a closure before putting it in dispatch.Dispatcher. OK to punt this since it's no worse than before.

dispatch.Dispatcher's PollInterval and StartMonitor's poll argument should be just *config.PollPeriod -- shouldn't be multiplied by time.Second.

Test functions should be called TestFoo, not Test_Foo.

Suggest using DeepEquals to test loading:

-       c.Check(3, Equals, len(config.SbatchArguments))
-       for i := range args {
-               c.Check(args[i], Equals, config.SbatchArguments[i])
-       }
+       c.Check(args, DeepEquals, config.SbatchArguments)

#13 Updated by Radhika Chippada almost 5 years ago

Thanks for de-duplicating code, but I do wonder whether "come up with a path that doesn't exist" really merits this amount of code and filesystem operations. Couldn't we just say const nonexistentFilePath = "/nonexistent/91Gy0nt2AJDZoFwj"?

Yes, I never liked this idea myself. I changed slurm dispatcher per your suggestion but left the datamanager as is for now.

Suggest: sbatchArgs = append(sbatchArgs, config.SbatchArguments...)

Updated.

Should check path == "" and return nil before trying to read the file, not after that fails. Or better yet, since path would only be "" if someone explicitly set it that way on the command line, there doesn't seem to be a need to treat it specially: if someone wants to read an empty config file, they should say "-config /dev/null", not "-config ''".

Agree. Removed this extra handling and the corresponding test.

Instead of strings.Contains(err.Error(), "no such file"), use os.IsNotExist(err).

Updated

The config file should be loaded into a Config, not a []string. That way the existing command line flags will also become configurable via JSON.

Done

doMain() has to pass a pointer to readConfig(). (Test_ReadConfig fails if you change it to call readConfig the same way doMain does.)

Good catch. Thanks.

Ideally config should not be a global. Un-globalizing it might involve something like wrapping run() in a closure before putting it in dispatch.Dispatcher. OK to punt this since it's no worse than before.

Punted

dispatch.Dispatcher's PollInterval and StartMonitor's poll argument should be just *config.PollPeriod -- shouldn't be multiplied by time.Second.

Updated

Test functions should be called TestFoo, not Test_Foo.

Updated

Suggest using DeepEquals to test loading: c.Check(args, DeepEquals, config.SbatchArguments)

Done

#14 Updated by Tom Clegg almost 5 years ago

We don't yet have the tests asked for in the issue description -- it looks like we're testing Go's JSON decoder, but we're not testing the actual feature at all. sbatchFunc looks amenable to unit-testing...

Other than that, LGTM.

#15 Updated by Tom Clegg almost 5 years ago

Suggest writing down the right answer instead of copying the code that's being tested.

-       expected = append(expected, fmt.Sprintf("--job-name=%s", container.UUID))
-       memPerCPU := math.Ceil(float64(container.RuntimeConstraints.RAM) / (float64(container.RuntimeConstraints.VCPUs) * 1048576))
-       expected = append(expected, fmt.Sprintf("--mem-per-cpu=%d", int(memPerCPU)))
+       expected = append(expected, "--job-name=123", "--mem-per-cpu=1")

Might as well check testSbatchFuncWithArgs(c, nil) as well.

LGTM

#16 Updated by Radhika Chippada almost 5 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:83a4494e66f4f7447091779f25e6f202b2379de9.

Also available in: Atom PDF