Project

General

Profile

Actions

Bug #20531

closed

When getting logs from child job to print error, distinguish between job couldn't start and non-zero exit

Added by Peter Amstutz 9 months ago. Updated 9 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
CWL
Target version:
Start date:
05/30/2023
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
0.5
Release relationship:
Auto

Description

  • When the job failed with a non-zero exit code, just print stdout/stderr, no crunch-run
  • When the job failed to start, only print crunch-run (we won't have stdout/stderr anyway)
  • When the job failed from a signal (exit code > 127), print both

Subtasks 1 (0 open1 closed)

Task #20563: Review 20531-cwl-log-tailResolvedBrett Smith05/30/2023

Actions

Related issues

Related to Arvados - Story #20521: Python tools emit logs from googleapiclientNew

Actions
Actions #1

Updated by Peter Amstutz 9 months ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz 9 months ago

  • Story points set to 0.5
  • Target version changed from To be groomed to Development 2023-06-07
  • Assigned To set to Peter Amstutz
Actions #5

Updated by Peter Amstutz 9 months ago

  • Status changed from New to In Progress
  • Subject changed from When getting logs from child job to print error, distinguish between job couldn't start and non-zero exit to When getting logs from child job to print error, distinguish between job couldn't start and non-zero exit
Actions #7

Updated by Peter Amstutz 9 months ago

  • Release set to 64
Actions #8

Updated by Brett Smith 9 months ago

Peter Amstutz wrote in #note-6:

20531-cwl-log-tail @ 3c0b0c7d34d015fc6d4838cd7fa00a729c1898bb

The code is fine. Can you please add a lot more comments explaining the issue that's being fixed by the urllib3 requirement? This kind of stuff gets lost to the sands of time really quickly, so it would help future us a lot if we could look at this in a year or five and know whether the issue is still relevant. I took a quick skim of the requirements for both cwl-utils and cwltool and didn't see a direct urllib3 requirement, so while I don't doubt there's a real issue it's not even clear to me right now what it is. Thanks.

Actions #9

Updated by Peter Amstutz 9 months ago

Brett Smith wrote in #note-8:

Peter Amstutz wrote in #note-6:

20531-cwl-log-tail @ 3c0b0c7d34d015fc6d4838cd7fa00a729c1898bb

The code is fine. Can you please add a lot more comments explaining the issue that's being fixed by the urllib3 requirement? This kind of stuff gets lost to the sands of time really quickly, so it would help future us a lot if we could look at this in a year or five and know whether the issue is still relevant. I took a quick skim of the requirements for both cwl-utils and cwltool and didn't see a direct urllib3 requirement, so while I don't doubt there's a real issue it's not even clear to me right now what it is. Thanks.

The basic problem is that installing with a version of pip with the old resolver, common situation is that package X has an open ended dependency on Z, and package Y has a closed dependency on Z, but if package X is installed first, it won't downgrade to the version required by Y, so at runtime you get a dependency conflict between Y and Z.

pip's newer resolver is smarter, but lots of end users are on older distros with older version of pip.

So this was attempting to solve one of those situations by lifting the version constraint to the top-level package.

I think the specific situation involved cwl-utils having a constraint 'urllib3 < 2' but something else installed 'requests' with an open-ended version spec and as a result a version a later version of 'urllib3' was installed first.

Turns out as of this writing this was just fixed in upstream in the last 8 hours:

https://github.com/common-workflow-language/cwl-utils/commit/7af859de43d3983d45d8bb432d7ae6b715265efe

So I think the original change is no longer needed and can be reverted.

Meanwhile, a new version of cwltest was released which breaks compatibility.

https://github.com/common-workflow-language/cwltest/releases/tag/2.3.20230527113600

So I had to put in a version pin to hopefully fix testing, see note.

Actions #11

Updated by Brett Smith 9 months ago

Peter Amstutz wrote in #note-10:

20531-cwl-log-tail @ 72b8fdfb21636030f2e110eb5850beb5f08bb930

Thanks for explaining, that "fixed earlier today" thing helps me feel better about why I couldn't find the connection. What's in the branch looks good to me, but it looks like something might've changed recently to cause that test failure in Jenkins?

Actions #12

Updated by Peter Amstutz 9 months ago

  • Related to Story #20521: Python tools emit logs from googleapiclient added
Actions #13

Updated by Peter Amstutz 9 months ago

20531-cwl-log-tail @ 499aaaa774fc2b52057032456db5256c39b5b8ec

Sorry to pile in more things, but these all fall under the theme of improving the communication of what's an error the user needs to know about, and what isn't.

  • bump cwltool and schema salad versions
  • change googleapiclient log level to ERROR so we don't log retries by default
  • tweak error messages related to file download

developer-run-tests: #3686

arvados-cwl-conformance-tests: #1448

Actions #14

Updated by Peter Amstutz 9 months ago

  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF