Bug #18290
closed[LSF] Add "/host" to rusage strings
Description
"man bsub" does not mention this, but according to the online docs we can use "123MB/host" instead of "123MB" so the site-configurable ReservationUsage value doesn't change how the memory requirement is interpreted.
(User reports this solves their issue where rusage is otherwise interpreted as "per slot".)
Updated by Ward Vandewege almost 3 years ago
The naive approach is at b7b0aeebe1dc768f79229b3854be2a5fa3f39eaa on branch 18290-lsf-rusage-update
tests passed at developer-run-tests: #2740
Updated by Tom Clegg almost 3 years ago
Unfortunately, depending on configuration (or something?) adding "/host" can cause bsub to fail instead of doing what we want.
9tee4$ bsub -n 8 -D '8MB' -R 'rusage[mem=8MB/host] span[hosts=1]' echo ok Error in rusage section. Job not submitted.
So it seems this does need to be configurable after all.
One approach would be to support interpolation/template vars in the existing BsubArgumentsList config, and get rid of all the other arguments that we currently add automatically.
So instead of
bsub [BsubArgumentsList] -J {uuid} -n {vcpus} -D {mem}MB -R "rusage[mem={mem}MB:tmp={tmp}MB] span[hosts=1]"
we would run
bsub [BsubArgumentsList]
and the default value for the BsubArgumentsList config would be
["-J", "{uuid}", "-n", "{vcpus}", "-D", "{mem}MB", "-R", "rusage[mem={mem}MB:tmp={tmp}MB] span[hosts=1]"]
Then, the operator can insert the "/host" part where needed.
Updated by Ward Vandewege almost 3 years ago
After discussion in chat we decided on
["-o", "/tmp/crunch-run.%%J.out", "-e", "/tmp/crunch-run.%%J.err","-J", "%U", "-n", "%C", "-D", "%MMB", "-R", "rusage[mem=%MMB:tmp=%TMB] span[hosts=1]"]
with %% as an escaped % (because bsub interprets %J) and
%U | uuid |
%C | # cpus |
%M | memory in MB |
%T | tmp space in MB |
Updated by Ward Vandewege almost 3 years ago
Ready for review at 39464fc833e3ee2fb771f83dce9f94e3856c1075 on branch 18290-lsf-bsub-args-template
Tests passed at developer-run-tests: #2742
Updated by Tom Clegg almost 3 years ago
In config.default.yml, "number of cpus" should probably be "number of VCPUs" to agree with the key in runtime constraints and instance types.
In bsubArgs(), tmpArgs
is superfluous -- could just as well use for _, a := range disp.Cluster.Containers.LSF.BsubArgumentsList {...}
The substitution code looks like it will work for realistic inputs, but I don't think cases like "%%%C" are handled correctly. Could simplify & fix the edge cases with something like
repl := map[string]string{ "%%": "%", "%C": fmt.Sprintf("%d", vcpus), // ... } re := regexp.MustCompile("%.") for _, a := range ... { args = append(args, re.ReplaceAllStringFunc(a, func(s string) string { return repl[s] })) }
This version also replaces "%X" with "", which is (I think?) slightly better than accepting "foo%bar" as "foo%bar" in this version but then causing unintended results during an upgrade to a future version where we give "%b" a meaning. Perhaps worth returning an error instead?
Updated by Ward Vandewege almost 3 years ago
Tom Clegg wrote:
In config.default.yml, "number of cpus" should probably be "number of VCPUs" to agree with the key in runtime constraints and instance types.
In bsubArgs(),
tmpArgs
is superfluous -- could just as well usefor _, a := range disp.Cluster.Containers.LSF.BsubArgumentsList {...}
The substitution code looks like it will work for realistic inputs, but I don't think cases like "%%%C" are handled correctly. Could simplify & fix the edge cases with something like
[...]
This version also replaces "%X" with "", which is (I think?) slightly better than accepting "foo%bar" as "foo%bar" in this version but then causing unintended results during an upgrade to a future version where we give "%b" a meaning. Perhaps worth returning an error instead?
Thanks, I implemented all that, and I did add returning an error when an unknown substitution parameter is encountered.
Ready for another look at 22516d3663a3c11384824dad0e052dc0630f08f0 on branch 18290-lsf-bsub-args-template, tests are running at developer-run-tests: #2745
Updated by Ward Vandewege almost 3 years ago
- Assigned To changed from Tom Clegg to Ward Vandewege
Tom Clegg wrote:
LGTM, thanks!
Thanks, tests passed, merged!
Updated by Ward Vandewege almost 3 years ago
- % Done changed from 0 to 100
- Status changed from In Progress to Resolved
Applied in changeset arvados|1f23032c8a10c03e9f4ff00b45576ad5c5e5afd9.