Bug #18290

[LSF] Add "/host" to rusage strings

Added by Tom Clegg about 1 month ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Crunch
Target version:
Start date:
Due date:
% Done:

100%

Estimated time:
Story points:
-
Release relationship:
Auto

Description

"man bsub" does not mention this, but according to the online docs we can use "123MB/host" instead of "123MB" so the site-configurable ReservationUsage value doesn't change how the memory requirement is interpreted.

(User reports this solves their issue where rusage is otherwise interpreted as "per slot".)

Associated revisions

Revision 1f23032c
Added by Ward Vandewege about 1 month ago

Merge branch '18290-lsf-bsub-args-template'

closes #18290

Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <>

History

#1 Updated by Tom Clegg about 1 month ago

  • Status changed from New to In Progress

#2 Updated by Tom Clegg about 1 month ago

  • Description updated (diff)

#3 Updated by Ward Vandewege about 1 month ago

  • Release set to 42

#4 Updated by Ward Vandewege about 1 month ago

The naive approach is at b7b0aeebe1dc768f79229b3854be2a5fa3f39eaa on branch 18290-lsf-rusage-update

tests passed at https://ci.arvados.org/view/Developer/job/developer-run-tests/2740/

#5 Updated by Tom Clegg about 1 month ago

Unfortunately, depending on configuration (or something?) adding "/host" can cause bsub to fail instead of doing what we want.

9tee4$ bsub -n 8 -D '8MB' -R 'rusage[mem=8MB/host] span[hosts=1]' echo ok
Error in rusage section. Job not submitted.

So it seems this does need to be configurable after all.

One approach would be to support interpolation/template vars in the existing BsubArgumentsList config, and get rid of all the other arguments that we currently add automatically.

So instead of

bsub [BsubArgumentsList] -J {uuid} -n {vcpus} -D {mem}MB -R "rusage[mem={mem}MB:tmp={tmp}MB] span[hosts=1]" 

we would run

bsub [BsubArgumentsList]

and the default value for the BsubArgumentsList config would be

["-J", "{uuid}", "-n", "{vcpus}", "-D", "{mem}MB", "-R", "rusage[mem={mem}MB:tmp={tmp}MB] span[hosts=1]"]

Then, the operator can insert the "/host" part where needed.

#6 Updated by Ward Vandewege about 1 month ago

After discussion in chat we decided on

["-o", "/tmp/crunch-run.%%J.out", "-e", "/tmp/crunch-run.%%J.err","-J", "%U", "-n", "%C", "-D", "%MMB", "-R", "rusage[mem=%MMB:tmp=%TMB] span[hosts=1]"]

with %% as an escaped % (because bsub interprets %J) and

%U uuid
%C # cpus
%M memory in MB
%T tmp space in MB

#7 Updated by Ward Vandewege about 1 month ago

Ready for review at 39464fc833e3ee2fb771f83dce9f94e3856c1075 on branch 18290-lsf-bsub-args-template

Tests passed at https://ci.arvados.org/view/Developer/job/developer-run-tests/2742/

#8 Updated by Tom Clegg about 1 month ago

In config.default.yml, "number of cpus" should probably be "number of VCPUs" to agree with the key in runtime constraints and instance types.

In bsubArgs(), tmpArgs is superfluous -- could just as well use for _, a := range disp.Cluster.Containers.LSF.BsubArgumentsList {...}

The substitution code looks like it will work for realistic inputs, but I don't think cases like "%%%C" are handled correctly. Could simplify & fix the edge cases with something like

repl := map[string]string{
  "%%": "%",
  "%C": fmt.Sprintf("%d", vcpus),
  // ...
}
re := regexp.MustCompile("%.")
for _, a := range ... {
  args = append(args, re.ReplaceAllStringFunc(a, func(s string) string { return repl[s] }))
}

This version also replaces "%X" with "", which is (I think?) slightly better than accepting "foo%bar" as "foo%bar" in this version but then causing unintended results during an upgrade to a future version where we give "%b" a meaning. Perhaps worth returning an error instead?

#9 Updated by Ward Vandewege about 1 month ago

Tom Clegg wrote:

In config.default.yml, "number of cpus" should probably be "number of VCPUs" to agree with the key in runtime constraints and instance types.

In bsubArgs(), tmpArgs is superfluous -- could just as well use for _, a := range disp.Cluster.Containers.LSF.BsubArgumentsList {...}

The substitution code looks like it will work for realistic inputs, but I don't think cases like "%%%C" are handled correctly. Could simplify & fix the edge cases with something like

[...]

This version also replaces "%X" with "", which is (I think?) slightly better than accepting "foo%bar" as "foo%bar" in this version but then causing unintended results during an upgrade to a future version where we give "%b" a meaning. Perhaps worth returning an error instead?

Thanks, I implemented all that, and I did add returning an error when an unknown substitution parameter is encountered.

Ready for another look at 22516d3663a3c11384824dad0e052dc0630f08f0 on branch 18290-lsf-bsub-args-template, tests are running at https://ci.arvados.org/view/Developer/job/developer-run-tests/2745/

#10 Updated by Tom Clegg about 1 month ago

LGTM, thanks!

#11 Updated by Ward Vandewege about 1 month ago

  • Assigned To changed from Tom Clegg to Ward Vandewege

Tom Clegg wrote:

LGTM, thanks!

Thanks, tests passed, merged!

#12 Updated by Ward Vandewege about 1 month ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved

Also available in: Atom PDF