Feature #14706

closed

[Crunch2] Retain references + permissions to earlier containers when retrying a container request

Added by Peter Amstutz over 5 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assigned To: -
Category: -
Target version: -
Story points: -

Description

The container request record lists only the most recent container attempted to fulfill the request. As a result, when a cancelled container is retried, the earlier cancelled containers are no longer visible to the user. Their UUIDs are no longer mentioned in the container request record, so even if the client remembers a UUID, the user no longer has permission to retrieve that container record.

(See #14870 for the related problem that the logs from previous attempts are not preserved in the container request's log collection.)

Proposal:

We need a column that holds the UUIDs of all containers attempted for the request. This could be an array column (e.g., https://www.postgresql.org/docs/9.6/arrays.html) or a JSONB column.

The current data model has "container_uuid" as a single value. Changing it to an array would break backwards compatibility, so the API should report past attempts in a separate field, such as "past_container_uuids".

It is unclear whether the underlying database should have a single array column (where the first or last item is always the most recent attempt), or retain the container_uuid column and add a past_container_uuids column.

We need to be able to join on the array column to grant read permission to container records. Section 8.15.5 of the PostgreSQL docs suggests this would look something like:

container.uuid = ANY (container_request.past_container_uuids)
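
To make the proposal concrete, here is a minimal Python sketch of the two-field option (keep container_uuid, add past_container_uuids) together with an in-memory analogue of the ANY check above. The class, the assign_container and can_read_container helpers, and the owner-based permission rule are all assumptions made for illustration; they are not the Arvados API server's actual code or permission model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContainerRequest:
    uuid: str
    owner: str  # simplification: real permissions are richer than a single owner
    container_uuid: Optional[str] = None  # most recent attempt (existing field)
    past_container_uuids: List[str] = field(default_factory=list)  # proposed field

    def assign_container(self, new_uuid: str) -> None:
        """Record a retry, keeping the previous container UUID instead of dropping it."""
        if self.container_uuid is not None:
            self.past_container_uuids.append(self.container_uuid)
        self.container_uuid = new_uuid


def can_read_container(user: str, container_uuid: str,
                       requests: List[ContainerRequest]) -> bool:
    """In-memory analogue of: container.uuid = ANY (container_request.past_container_uuids)."""
    return any(
        cr.owner == user
        and (container_uuid == cr.container_uuid
             or container_uuid in cr.past_container_uuids)
        for cr in requests
    )
```

With this shape, existing clients that read container_uuid keep seeing a single value, while a client that remembers an earlier UUID can still be granted read access through past_container_uuids.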


Related issues

Related to Arvados - Feature #8018: [Crunch2] Identify container failure and retry | Resolved | Peter Amstutz | 09/23/2016
Related to Arvados - Idea #14870: [API] Access logs from previous attempts after auto-retrying a container request | Resolved | Peter Amstutz | 03/01/2019
Actions #1

Updated by Peter Amstutz over 5 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Peter Amstutz over 5 years ago

  • Description updated (diff)
  • Status changed from In Progress to New
Actions #3

Updated by Peter Amstutz over 5 years ago

  • Tracker changed from Bug to Feature
Actions #4

Updated by Peter Amstutz about 5 years ago

  • Description updated (diff)
Actions #5

Updated by Tom Clegg about 5 years ago

  • Related to Feature #8018: [Crunch2] Identify container failure and retry added
Actions #6

Updated by Tom Clegg about 5 years ago

I'm not sure adding an array of container UUIDs to the container_requests table would solve this problem. Often the most valuable troubleshooting information is in the log files, which would still be inaccessible.

It might be more useful to focus on preserving all relevant logs in the container request's log collection, even if they span multiple containers. Perhaps the API server could merge the logs: e.g., instead of replacing the CR's entire log collection when the container's log is updated, just copy the container's log files into a "container ${uuid}" subdir in the container request's log collection. This would disturb existing scripts/users who expect the log files to be at the top level, but it would be compatible with multiple concurrent containers (e.g., speculative retry without killing, and replication>1 service containers).

This also helps in the case where the container record itself is really what's wanted, since that is included in the container's log collection. (There are currently some exceptions -- e.g., a log collection isn't created at all when a container doesn't fit any instance type -- but those could be fixed.)

It's also worth addressing the permission issue, at least for admins (currently even the dispatcher isn't allowed to see that a container has state=Cancelled if all matching CRs have had different containers assigned!). If we need to do it for users, we should consider the performance implications of an array vs. a separate table to express the many-to-many relationship.
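
For comparison with the array-column idea, the separate-table option could be sketched roughly as follows. This uses Python's built-in sqlite3 module purely so the example is runnable; Arvados uses PostgreSQL, and the container_request_attempts table, its columns, the owner column, and both helper functions are hypothetical names invented for this illustration.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a join table of request/container attempts."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE containers (uuid TEXT PRIMARY KEY, state TEXT);
        CREATE TABLE container_requests (uuid TEXT PRIMARY KEY, owner TEXT);
        -- Hypothetical join table: one row per (request, attempted container).
        CREATE TABLE container_request_attempts (
            container_request_uuid TEXT REFERENCES container_requests(uuid),
            container_uuid TEXT REFERENCES containers(uuid),
            attempt INTEGER,
            PRIMARY KEY (container_request_uuid, container_uuid)
        );
    """)
    db.executemany("INSERT INTO containers VALUES (?, ?)",
                   [("c-1", "Cancelled"), ("c-2", "Complete"), ("c-3", "Complete")])
    db.executemany("INSERT INTO container_requests VALUES (?, ?)",
                   [("cr-1", "alice"), ("cr-2", "bob")])
    db.executemany("INSERT INTO container_request_attempts VALUES (?, ?, ?)",
                   [("cr-1", "c-1", 1), ("cr-1", "c-2", 2), ("cr-2", "c-3", 1)])
    return db


def readable_containers(db: sqlite3.Connection, owner: str) -> List[str]:
    """Return UUIDs of every container ever attempted for the owner's requests."""
    rows = db.execute("""
        SELECT DISTINCT c.uuid
        FROM containers c
        JOIN container_request_attempts a ON a.container_uuid = c.uuid
        JOIN container_requests cr ON cr.uuid = a.container_request_uuid
        WHERE cr.owner = ?
        ORDER BY c.uuid
    """, (owner,))
    return [row[0] for row in rows]
```

A join table like this can be indexed on container_uuid in the usual way, which is one reason it may behave differently from an array column in permission queries; the sketch does not measure that difference.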

Actions #7

Updated by Tom Clegg about 5 years ago

One more refinement: Put a copy of the latest container's logs in the root dir of the container request's log collection, in addition to a subdir named after the container UUID. This way, existing scripts continue to work on new logs.
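
The merge described in the last two comments can be sketched in Python as below. A log collection is modelled as a plain dict from path to file content, which is only a stand-in for a real collection; the merge_container_logs function and the "container " prefix test for per-container subdirectories are assumptions for illustration, not the API server's implementation.

```python
from typing import Dict

LogCollection = Dict[str, str]  # path -> file content (stand-in for a real collection)

SUBDIR_PREFIX = "container "  # per-container subdirs are named "container <uuid>"


def merge_container_logs(cr_logs: LogCollection, container_uuid: str,
                         container_logs: LogCollection,
                         is_latest: bool = True) -> LogCollection:
    """Return a new request log collection that includes this container's logs."""
    if is_latest:
        # Keep only the per-container subdirs, dropping stale top-level files
        # left by an earlier attempt, then copy the latest logs to the root.
        merged = {path: content for path, content in cr_logs.items()
                  if path.startswith(SUBDIR_PREFIX)}
        merged.update(container_logs)
    else:
        merged = dict(cr_logs)
    # Always keep a copy under a subdir named after the container.
    for path, content in container_logs.items():
        merged[f"{SUBDIR_PREFIX}{container_uuid}/{path}"] = content
    return merged
```

Calling it once per attempt leaves the newest logs at the top level, so existing scripts keep working, while every attempt remains reachable under its own subdirectory.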

Actions #8

Updated by Tom Clegg about 5 years ago

  • Related to Idea #14870: [API] Access logs from previous attempts after auto-retrying a container request added
Actions #9

Updated by Tom Clegg about 5 years ago

  • Subject changed from [Crunch2] Retain record of container retries to [Crunch2] Retain references + permissions to earlier containers when retrying a container request
  • Description updated (diff)
Actions #10

Updated by Peter Amstutz over 4 years ago

  • Status changed from New to Resolved
Actions #11

Updated by Peter Amstutz over 4 years ago

  • Target version deleted (To Be Groomed)