Bug #16391

improve container logging when c-d-s has a problem with sbatch execution

Added by Ward Vandewege 2 months ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
05/13/2020
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-
Release relationship:
Auto

Description

When crunch-dispatch-slurm runs into a problem with sbatch, e.g.

"sbatch: error: Batch job submission failed: Requested node configuration is not available"

that error is visible in c-d-s' logs. But in the container logs we print:

error submitting container to slurm: "PANIC=runtime error: invalid memory address or nil pointer dereference"

We need to bubble up the real error message.

This was observed on 1.4.3, presumably the error still exists on 2.0.2, but I haven't checked that.


Subtasks

Task #16411: Review 16391-sbatch-error-loggingResolvedTom Clegg


Related issues

Related to Arvados - Bug #13581: Report node not satisfiable users to the userResolved06/07/2018

Associated revisions

Revision 0e802921
Added by Tom Clegg about 2 months ago

Merge branch '16391-sbatch-error-logging'

fixes #16391

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

History

#1 Updated by Ward Vandewege 2 months ago

  • Description updated (diff)

#2 Updated by Ward Vandewege 2 months ago

  • Subject changed from improve container logging when c-d-s has a problem with execution sbatch to improve container logging when c-d-s has a problem with sbatch execution

#3 Updated by Ward Vandewege 2 months ago

  • Description updated (diff)

#4 Updated by Ward Vandewege 2 months ago

  • Target version changed from To Be Groomed to 2020-05-20 Sprint

#5 Updated by Tom Clegg about 2 months ago

  • Assigned To set to Tom Clegg

#6 Updated by Tom Clegg about 2 months ago

  • Related to Bug #13581: Report node not satisfiable users to the user added

#8 Updated by Lucas Di Pentima about 2 months ago

Isn't this update also shadowing the variable? Shouldn't be just "switch err.(type) {"?

diff --git a/services/crunch-dispatch-slurm/crunch-dispatch-slurm.go b/services/crunch-dispatch-slurm/crunch-dispatch-slurm.go
index 36bcef4f2..4115482d8 100644
--- a/services/crunch-dispatch-slurm/crunch-dispatch-slurm.go
+++ b/services/crunch-dispatch-slurm/crunch-dispatch-slurm.go
@@ -280,7 +280,8 @@ func (disp *Dispatcher) runContainer(_ *dispatch.Dispatcher, ctr arvados.Contain
                cmd = append(cmd, disp.cluster.Containers.CrunchRunArgumentsList...)
                if err := disp.submit(ctr, cmd); err != nil {
                        var text string
-                       if err, ok := err.(dispatchcloud.ConstraintsNotSatisfiableError); ok {
+                       switch err := err.(type) {
+                       case dispatchcloud.ConstraintsNotSatisfiableError:
                                var logBuf bytes.Buffer
                                fmt.Fprintf(&logBuf, "cannot run container %s: %s\n", ctr.UUID, err)
                                if len(err.AvailableTypes) == 0 {
@@ -296,7 +297,7 @@ func (disp *Dispatcher) runContainer(_ *dispatch.Dispatcher, ctr arvados.Contain
                                }
                                text = logBuf.String()
                                disp.UpdateState(ctr.UUID, dispatch.Cancelled)
-                       } else {
+                       default:
                                text = fmt.Sprintf("Error submitting container %s to slurm: %s", ctr.UUID, err)
                        }
                        log.Print(text)

#9 Updated by Tom Clegg about 2 months ago

Isn't this update also shadowing the variable? Shouldn't be just "switch err.(type) {"?

Yes and no: err is shadowed by a copy of the original err. switch err.(type) wouldn't shadow err with the specific-type value, so err.AvailableTypes wouldn't compile.

(In the previous code, the type assertion would return either err,true or nil,false, so the "else" case would always have err==nil instead of [a copy of] the original err.)

https://tour.golang.org/methods/16 https://play.golang.org/p/P88oNG4IP9u

#10 Updated by Lucas Di Pentima about 2 months ago

Thanks for the example, this LGTM!

#11 Updated by Anonymous about 2 months ago

  • % Done changed from 0 to 100
  • Status changed from New to Resolved

#12 Updated by Peter Amstutz about 1 month ago

  • Release set to 33

Also available in: Atom PDF