Bug #21611

closed

preemption notices do not appear in crunch-run.txt

Added by Peter Amstutz about 2 months ago. Updated 4 days ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Crunch
Story points: -

Description

I've now looked at a number of preempted containers, and none of them has a crunch-run.txt entry saying that crunch-run received a preemption notice, even though one is supposed to be logged.


Subtasks 1 (0 open, 1 closed)

Task #21734: Review 21611-log-chunk-delay (Resolved, Tom Clegg, 05/15/2024)
Actions #1

Updated by Peter Amstutz about 2 months ago

  • Target version changed from Future to Development 2024-04-24 sprint
Actions #2

Updated by Peter Amstutz about 2 months ago

  • Description updated (diff)
Actions #3

Updated by Peter Amstutz about 1 month ago

  • Target version changed from Development 2024-04-24 sprint to Development 2024-05-08 sprint
Actions #4

Updated by Peter Amstutz about 1 month ago

  • Target version changed from Development 2024-05-08 sprint to Development 2024-04-24 sprint
Actions #5

Updated by Peter Amstutz about 1 month ago

  • Target version changed from Development 2024-04-24 sprint to Development 2024-05-08 sprint
Actions #6

Updated by Peter Amstutz 25 days ago

  • Description updated (diff)
  • Subject changed from "crunch-run updates copy of container.json in log collection when a container ends and/or runtime_status is updated" to "preemption notices do not appear in crunch-run.txt"
  • Tracker changed from Feature to Bug
Actions #7

Updated by Peter Amstutz 25 days ago

  • Assigned To set to Tom Clegg
Actions #8

Updated by Tom Clegg 25 days ago

  • Status changed from New to In Progress

I suspect the "write log, then save log collection" sequence is doing the opposite of what we want, because
  • "write log entry" just writes the message to the throttled-logging buffer, not yet to the log collection
  • "save log collection" saves the log collection and resets the auto-flush timer, minimizing the chance auto-flush will happen before the preempted instance shuts down

It's possible something else is going on too, but either way, we should rearrange the logging pipeline so the log collection gets updated immediately instead of after the "group logs into chunks" step. If nothing else, that will reduce latency for showing logs in workbench.
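
To make the suspected failure mode concrete, here is a minimal toy model in Python. It is not Arvados code: the class, method names, and timings (5 s flush interval, notice at t=4 s, last timer check at t=6 s) are all hypothetical, and the timer is simplified so that one periodic tick both drains the buffer and saves the collection. It only illustrates why "write, then save" can leave the notice out of the saved log when the write lands in a separate buffer.

```python
class ToyLogPipeline:
    """Toy model of the suspected behavior; all names are hypothetical."""

    def __init__(self, flush_interval=5.0, write_through=False):
        self.flush_interval = flush_interval
        self.write_through = write_through
        self.buffer = []      # throttled-logging buffer (not yet in the collection)
        self.collection = []  # working copy of the log collection
        self.saved = []       # last saved snapshot of the log collection
        self.next_flush = flush_interval

    def write_log(self, msg):
        # Buffered mode: the message only reaches the buffer.
        # Write-through mode: the message goes straight to the collection.
        if self.write_through:
            self.collection.append(msg)
        else:
            self.buffer.append(msg)

    def save_collection(self, now):
        # Saves whatever is in the collection and resets the auto-flush timer.
        self.saved = list(self.collection)
        self.next_flush = now + self.flush_interval

    def tick(self, now):
        # Simplified auto-flush: drain the buffer, then save.
        if now >= self.next_flush:
            self.collection.extend(self.buffer)
            self.buffer.clear()
            self.save_collection(now)


def notice_saved_before_shutdown(explicit_save, write_through=False):
    p = ToyLogPipeline(flush_interval=5.0, write_through=write_through)
    p.write_log("preemption notice")  # notice arrives at t=4s (assumed)
    if explicit_save:
        p.save_collection(now=4.0)
    p.tick(now=6.0)                   # last timer check before shutdown (assumed)
    return "preemption notice" in p.saved
```

In this model the explicit save makes things worse in buffered mode (the snapshot misses the notice and the timer is pushed past shutdown), while writing straight to the collection before saving captures the notice immediately.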

Actions #9

Updated by Tom Clegg 16 days ago

21611-log-chunk-delay @ 8a5db7b48c1fb11423110490267fea17161f7674 -- developer-run-tests: #4207 (flaky fuse test, see #21660)

21611-log-chunk-delay @ 8a5db7b48c1fb11423110490267fea17161f7674 -- developer-run-tests: #4208 (Something is already running on port 38402.)

21611-log-chunk-delay @ 8a5db7b48c1fb11423110490267fea17161f7674 -- developer-run-tests: #4209

Removes all the "buffer logs into chunks and send them to POST /arvados/v1/logs" code that was preventing the existing "flush logs immediately" code from working as intended (see #note-8 above).

  • All agreed upon points are implemented / addressed.
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • N/A
  • Code is tested and passing, both automated and manual; what manual testing was done is described.
    • ✅ updated preemption-warning test case to check that the container record is promptly updated with a log PDH that mentions the preemption warning message
  • Documentation has been updated.
    • N/A
  • Behaves appropriately at the intended scale (describe intended scale).
    • N/A
  • Considered backwards and forwards compatibility issues between client and server.
    • N/A
  • Follows our coding standards and GUI style guidelines.

This will also have the side effect of reducing logging latency in workbench. Previously LogBytesPerEvent/LogSecondsBetweenEvents (default 4K/5s) were introducing a store/wait/forward delay even when LimitLogBytesPerJob was zero.
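
The store/wait/forward delay can be sketched with a small simulation. This is an illustrative model only, not the removed Arvados code: the function name is hypothetical, "4K" is assumed to mean 4096 bytes, and the rule "forward a chunk when it reaches the byte limit or when the time limit has passed since its first message" is an assumed simplification of how LogBytesPerEvent and LogSecondsBetweenEvents interacted.

```python
def forward_times(events, max_bytes=4096, max_seconds=5.0):
    """Return (arrival_time, forwarded_at) for each message.

    events: list of (arrival_time, size_in_bytes), sorted by arrival time.
    Hypothetical model: a chunk is forwarded when it reaches max_bytes,
    or max_seconds after its first message arrived, whichever comes first.
    """
    results = []
    pending = []        # arrival times of messages waiting in the current chunk
    pending_bytes = 0
    deadline = None     # time at which the current chunk is forwarded regardless of size
    for arrival, size in events:
        # The timer expired before this message arrived: forward the old chunk.
        if pending and arrival >= deadline:
            results.extend((t, deadline) for t in pending)
            pending, pending_bytes = [], 0
        if not pending:
            deadline = arrival + max_seconds
        pending.append(arrival)
        pending_bytes += size
        # The chunk is full: forward it right away.
        if pending_bytes >= max_bytes:
            results.extend((t, arrival) for t in pending)
            pending, pending_bytes = [], 0
    # Anything left over waits for its timer.
    if pending:
        results.extend((t, deadline) for t in pending)
    return results
```

Under these assumptions, a single short message such as a preemption notice (say 100 bytes at t=0) is not forwarded until t=5, which may be later than the instance survives.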

Actions #10

Updated by Peter Amstutz 11 days ago

  • Target version changed from Development 2024-05-08 sprint to Development 2024-05-22 sprint
Actions #11

Updated by Brett Smith 4 days ago

Tom Clegg wrote in #note-9:

21611-log-chunk-delay @ 8a5db7b48c1fb11423110490267fea17161f7674 -- developer-run-tests: #4209

Removes all the "buffer logs into chunks and send them to POST /arvados/v1/logs" code that was preventing the existing "flush logs immediately" code from working as intended (see #note-8 above).

LGTM. My one nit is I think the configuration keys in the upgrade notes would look better in monospace. (I wish we had a documentation style guide to help us keep consistent on stuff like this.)

Actions #12

Updated by Tom Clegg 4 days ago

  • Status changed from In Progress to Resolved