Story #14870

[API] Access logs from previous attempts after auto-retrying a container request

Added by Tom Clegg 4 months ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: API
Target version:
Start date: 03/01/2019
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 2.0
Release relationship: Auto

Description

Preserve all relevant logs in the container request's log collection, even if they span multiple containers.

Instead of just replacing the CR's entire log collection when the container's log is updated:
  • Copy the container's log files into a "container ${uuid}" subdir in the container request's log collection.
  • Leave any existing "container ${uuid}" subdirs alone.
  • Also put a copy of the latest container's logs in the root dir of the container request's log collection. This way, existing scripts continue to work on new logs.
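
The three rules above can be sketched as a merge over file maps. This is only an illustration, not the API server's implementation: a collection is modeled as a plain dict of path to content, and the function name `merge_container_logs` and its `prefix` parameter are hypothetical. It also assumes that an update for the same container replaces that container's own subdir, and that root-level files left by an earlier attempt are dropped in favor of the latest container's files.

```python
def merge_container_logs(cr_log, container_uuid, container_log, prefix="container "):
    """Return a new file map for the container request's log collection.

    cr_log        -- current CR log collection, modeled as {path: content}
    container_log -- the container's log collection, modeled as {path: content}
    (Hypothetical model: real collections are manifests, not dicts.)
    """
    subdir = f"{prefix}{container_uuid}/"
    merged = {}

    # Leave "container <uuid>/" subdirs of other (earlier) containers alone.
    # Root-level files and this container's own subdir are rebuilt below.
    for path, data in cr_log.items():
        in_attempt_subdir = path.startswith(prefix) and "/" in path
        if in_attempt_subdir and not path.startswith(subdir):
            merged[path] = data

    for path, data in container_log.items():
        # Copy the container's log files into its own subdir...
        merged[subdir + path] = data
        # ...and also into the root, so existing scripts keep working.
        merged[path] = data

    return merged


# Usage: two attempts for the same container request (made-up short uuids).
cr = merge_container_logs({}, "aaaaa", {"stderr.txt": "first", "only-first.txt": "x"})
cr = merge_container_logs(cr, "bbbbb", {"stderr.txt": "second"})
```

After the second call, the root holds only the second attempt's files, while both "container aaaaa/" and "container bbbbb/" subdirs are present.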

(Aside: This also helps in the case where the container record itself is really what's wanted, since that is included in the container's log collection. There are currently some exceptions -- e.g., a log collection isn't created at all when a container doesn't fit any instance type -- but those could be fixed.)


Subtasks

Task #14894: Review 14870-retry-logs (Resolved, Peter Amstutz)

Task #14908: Review 14870-ruby-sdk-cp-r (Resolved, Lucas Di Pentima)


Related issues

Related to Arvados - Feature #14706: [Crunch2] Retain references + permissions to earlier containers when retrying a container request (New)

Associated revisions

Revision fd86f7f4
Added by Peter Amstutz 4 months ago

Merge branch '14870-ruby-sdk-cp-r' refs #14870

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <>

Revision 2944dd6d
Added by Peter Amstutz 3 months ago

Merge branch '14870-retry-logs' refs #14870

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <>

Revision 233e184e
Added by Peter Amstutz 3 months ago

Fix a-c-r printing logs on error refs #14870

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <>

History

#1 Updated by Tom Clegg 4 months ago

  • Related to Feature #14706: [Crunch2] Retain references + permissions to earlier containers when retrying a container request added

#2 Updated by Tom Morris 4 months ago

  • Target version changed from To Be Groomed to 2019-02-27 Sprint
  • Story points set to 2.0

#3 Updated by Tom Morris 4 months ago

  • Target version changed from 2019-02-27 Sprint to Arvados Future Sprints

#4 Updated by Tom Morris 4 months ago

  • Target version changed from Arvados Future Sprints to 2019-03-13 Sprint

#5 Updated by Peter Amstutz 4 months ago

  • Assigned To set to Peter Amstutz

#6 Updated by Peter Amstutz 4 months ago

  • Status changed from New to In Progress

#7 Updated by Tom Morris 4 months ago

  • Release set to 15

#8 Updated by Peter Amstutz 4 months ago

14870-ruby-sdk-cp-r @ 338ab239adbc259d5cd070158b4e571925b9f81b

The gist is that the Ruby SDK seems to have a long-standing bug where you can't copy into "." of an empty collection. That case follows a different code path from copying into a collection that already has something in it, so the existing test case "test_copy_root_contents_across_collections" didn't catch it.

#9 Updated by Lucas Di Pentima 4 months ago

As previously said on chat, 14870-ruby-sdk-cp-r LGTM. Thanks!

#10 Updated by Peter Amstutz 3 months ago

14870-retry-logs @ 6a240180171525077bc9e64e903b0122d5d5f1b4

https://ci.curoverse.com/view/Developer/job/developer-run-tests/1097/

  • Logs for each container are copied into a subdirectory "container log for [uuid]". When the container is finalized, the most recent logs are also copied into the root of the collection, to minimize breaking existing code.
  • Update tests
I'm open to changing the exact name of the subdirectory. Some other possibilities are:
  • just the uuid (no extra text)
  • "log for attempt [uuid]"
  • "failed attempt [uuid]"

#11 Updated by Peter Amstutz 3 months ago

Updating the Arvados Ruby SDK dependency creates an incidental problem: #14482 tightened up manifest handling but did not update the API server dependency. That's now causing problems:

ERROR: runTest (tests.test_mount.FuseMountTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ci-jenkins/.jenkins-slave/workspace/developer-run-tests-services-fuse/services/fuse/tests/test_mount.py", line 91, in setUp
    self.api.collections().create(body={"manifest_text":cw.manifest_text()}).execute()
  File "/tmp/tmp.HDfRwOWIb9/VENVDIR/local/lib/python2.7/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/tmp/tmp.HDfRwOWIb9/VENVDIR/local/lib/python2.7/site-packages/googleapiclient/http.py", line 840, in execute
    raise HttpError(resp, content, uri=self.uri)
ApiError: <HttpError 422 when requesting https://0.0.0.0:43523/arvados/v1/collections?alt=json returned "Manifest text Manifest invalid for stream 5: invalid file token "4:1:\u0001\\"">

#14 Updated by Lucas Di Pentima 3 months ago

The changes LGTM. However, there's the pending FUSE issue. Just in case, I did a complete test run: https://ci.curoverse.com/job/developer-run-tests/1101/

#17 Updated by Peter Amstutz 3 months ago

  • Status changed from In Progress to Resolved
