Project

General

Profile

Actions

Bug #18290

closed

[LSF] Add "/host" to rusage strings

Added by Tom Clegg over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Crunch
Target version:
Story points:
-
Release relationship:
Auto

Description

"man bsub" does not mention this, but according to the online docs we can use "123MB/host" instead of "123MB" so the site-configurable ReservationUsage value doesn't change how the memory requirement is interpreted.

(User reports this solves their issue where rusage is otherwise interpreted as "per slot".)

Actions #1

Updated by Tom Clegg over 2 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Tom Clegg over 2 years ago

  • Description updated (diff)
Actions #3

Updated by Ward Vandewege over 2 years ago

  • Release set to 42
Actions #4

Updated by Ward Vandewege over 2 years ago

The naive approach is at b7b0aeebe1dc768f79229b3854be2a5fa3f39eaa on branch 18290-lsf-rusage-update

tests passed at developer-run-tests: #2740

Actions #5

Updated by Tom Clegg over 2 years ago

Unfortunately, depending on configuration (or something?) adding "/host" can cause bsub to fail instead of doing what we want.

9tee4$ bsub -n 8 -D '8MB' -R 'rusage[mem=8MB/host] span[hosts=1]' echo ok
Error in rusage section. Job not submitted.

So it seems this does need to be configurable after all.

One approach would be to support interpolation/template vars in the existing BsubArgumentsList config, and get rid of all the other arguments that we currently add automatically.

So instead of

bsub [BsubArgumentsList] -J {uuid} -n {vcpus} -D {mem}MB -R "rusage[mem={mem}MB:tmp={tmp}MB] span[hosts=1]" 

we would run

bsub [BsubArgumentsList]

and the default value for the BsubArgumentsList config would be

["-J", "{uuid}", "-n", "{vcpus}", "-D", "{mem}MB", "-R", "rusage[mem={mem}MB:tmp={tmp}MB] span[hosts=1]"]

Then, the operator can insert the "/host" part where needed.

Actions #6

Updated by Ward Vandewege over 2 years ago

After discussion in chat we decided on

["-o", "/tmp/crunch-run.%%J.out", "-e", "/tmp/crunch-run.%%J.err","-J", "%U", "-n", "%C", "-D", "%MMB", "-R", "rusage[mem=%MMB:tmp=%TMB] span[hosts=1]"]

with %% as an escaped % (because bsub interprets %J) and

%U uuid
%C # cpus
%M memory in MB
%T tmp space in MB
Actions #7

Updated by Ward Vandewege over 2 years ago

Ready for review at 39464fc833e3ee2fb771f83dce9f94e3856c1075 on branch 18290-lsf-bsub-args-template

Tests passed at developer-run-tests: #2742

Actions #8

Updated by Tom Clegg over 2 years ago

In config.default.yml, "number of cpus" should probably be "number of VCPUs" to agree with the key in runtime constraints and instance types.

In bsubArgs(), tmpArgs is superfluous -- could just as well use for _, a := range disp.Cluster.Containers.LSF.BsubArgumentsList {...}

The substitution code looks like it will work for realistic inputs, but I don't think cases like "%%%C" are handled correctly. Could simplify & fix the edge cases with something like

repl := map[string]string{
  "%%": "%",
  "%C": fmt.Sprintf("%d", vcpus),
  // ...
}
re := regexp.MustCompile("%.")
for _, a := range ... {
  args = append(args, re.ReplaceAllStringFunc(a, func(s string) string { return repl[s] }))
}

This version also replaces "%X" with "", which is (I think?) slightly better than accepting "foo%bar" as "foo%bar" in this version but then causing unintended results during an upgrade to a future version where we give "%b" a meaning. Perhaps worth returning an error instead?

Actions #9

Updated by Ward Vandewege over 2 years ago

Tom Clegg wrote:

In config.default.yml, "number of cpus" should probably be "number of VCPUs" to agree with the key in runtime constraints and instance types.

In bsubArgs(), tmpArgs is superfluous -- could just as well use for _, a := range disp.Cluster.Containers.LSF.BsubArgumentsList {...}

The substitution code looks like it will work for realistic inputs, but I don't think cases like "%%%C" are handled correctly. Could simplify & fix the edge cases with something like

[...]

This version also replaces "%X" with "", which is (I think?) slightly better than accepting "foo%bar" as "foo%bar" in this version but then causing unintended results during an upgrade to a future version where we give "%b" a meaning. Perhaps worth returning an error instead?

Thanks, I implemented all that, and I did add returning an error when an unknown substitution parameter is encountered.

Ready for another look at 22516d3663a3c11384824dad0e052dc0630f08f0 on branch 18290-lsf-bsub-args-template, tests are running at developer-run-tests: #2745

Actions #10

Updated by Tom Clegg over 2 years ago

LGTM, thanks!

Actions #11

Updated by Ward Vandewege over 2 years ago

  • Assigned To changed from Tom Clegg to Ward Vandewege

Tom Clegg wrote:

LGTM, thanks!

Thanks, tests passed, merged!

Actions #12

Updated by Ward Vandewege over 2 years ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF