Project

General

Profile

Actions

Bug #16391

closed

improve container logging when c-d-s has a problem with sbatch execution

Added by Ward Vandewege over 2 years ago. Updated about 2 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
05/13/2020
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-
Release relationship:
Auto

Description

When crunch-dispatch-slurm runs into a problem with sbatch, e.g.

"sbatch: error: Batch job submission failed: Requested node configuration is not available"

that error is visible in c-d-s' logs. But in the container logs we print:

error submitting container to slurm: "PANIC=runtime error: invalid memory address or nil pointer dereference"

We need to bubble up the real error message.

This was observed on 1.4.3, presumably the error still exists on 2.0.2, but I haven't checked that.


Subtasks 1 (0 open1 closed)

Task #16411: Review 16391-sbatch-error-loggingResolvedTom Clegg05/13/2020

Actions

Related issues

Related to Arvados - Bug #13581: Report node not satisfiable users to the userResolvedLucas Di Pentima06/07/2018

Actions
Actions #1

Updated by Ward Vandewege over 2 years ago

  • Description updated (diff)
Actions #2

Updated by Ward Vandewege over 2 years ago

  • Subject changed from improve container logging when c-d-s has a problem with execution sbatch to improve container logging when c-d-s has a problem with sbatch execution
Actions #3

Updated by Ward Vandewege over 2 years ago

  • Description updated (diff)
Actions #4

Updated by Ward Vandewege over 2 years ago

  • Target version changed from To Be Groomed to 2020-05-20 Sprint
Actions #5

Updated by Tom Clegg over 2 years ago

  • Assigned To set to Tom Clegg
Actions #6

Updated by Tom Clegg over 2 years ago

  • Related to Bug #13581: Report node not satisfiable users to the user added
Actions #7

Updated by Tom Clegg over 2 years ago

Shadowed variable bug.

16391-sbatch-error-logging @ 79d0b2480c207c12fd229403384881cd53a234e8 -- developer-run-tests: #1854

Actions #8

Updated by Lucas Di Pentima over 2 years ago

Isn't this update also shadowing the variable? Shouldn't be just "switch err.(type) {"?

diff --git a/services/crunch-dispatch-slurm/crunch-dispatch-slurm.go b/services/crunch-dispatch-slurm/crunch-dispatch-slurm.go
index 36bcef4f2..4115482d8 100644
--- a/services/crunch-dispatch-slurm/crunch-dispatch-slurm.go
+++ b/services/crunch-dispatch-slurm/crunch-dispatch-slurm.go
@@ -280,7 +280,8 @@ func (disp *Dispatcher) runContainer(_ *dispatch.Dispatcher, ctr arvados.Contain
                cmd = append(cmd, disp.cluster.Containers.CrunchRunArgumentsList...)
                if err := disp.submit(ctr, cmd); err != nil {
                        var text string
-                       if err, ok := err.(dispatchcloud.ConstraintsNotSatisfiableError); ok {
+                       switch err := err.(type) {
+                       case dispatchcloud.ConstraintsNotSatisfiableError:
                                var logBuf bytes.Buffer
                                fmt.Fprintf(&logBuf, "cannot run container %s: %s\n", ctr.UUID, err)
                                if len(err.AvailableTypes) == 0 {
@@ -296,7 +297,7 @@ func (disp *Dispatcher) runContainer(_ *dispatch.Dispatcher, ctr arvados.Contain
                                }
                                text = logBuf.String()
                                disp.UpdateState(ctr.UUID, dispatch.Cancelled)
-                       } else {
+                       default:
                                text = fmt.Sprintf("Error submitting container %s to slurm: %s", ctr.UUID, err)
                        }
                        log.Print(text)
Actions #9

Updated by Tom Clegg over 2 years ago

Isn't this update also shadowing the variable? Shouldn't be just "switch err.(type) {"?

Yes and no: err is shadowed by a copy of the original err. switch err.(type) wouldn't shadow err with the specific-type value, so err.AvailableTypes wouldn't compile.

(In the previous code, the type assertion would return either err,true or nil,false, so the "else" case would always have err==nil instead of [a copy of] the original err.)

https://tour.golang.org/methods/16 https://play.golang.org/p/P88oNG4IP9u

Actions #10

Updated by Lucas Di Pentima over 2 years ago

Thanks for the example, this LGTM!

Actions #11

Updated by Anonymous over 2 years ago

  • % Done changed from 0 to 100
  • Status changed from New to Resolved
Actions #12

Updated by Peter Amstutz about 2 years ago

  • Release set to 33
Actions

Also available in: Atom PDF