Bug #21611

open

preemption notices do not appear in crunch-run.txt

Added by Peter Amstutz about 2 months ago. Updated 2 days ago.

Status: In Progress
Priority: Normal
Assigned To: Tom Clegg
Category: Crunch
Story points: -

Description

I've now looked at a number of preempted containers, and in none of them was crunch-run.txt updated to say that crunch-run received a preemption notice, even though it is supposed to be.


Subtasks 1 (1 open, 0 closed)

Task #21734: Review 21611-log-chunk-delay | In Progress | 05/10/2024
Actions #1

Updated by Peter Amstutz about 1 month ago

  • Target version changed from Future to Development 2024-04-24 sprint
Actions #2

Updated by Peter Amstutz about 1 month ago

  • Description updated (diff)
Actions #3

Updated by Peter Amstutz about 1 month ago

  • Target version changed from Development 2024-04-24 sprint to Development 2024-05-08 sprint
Actions #4

Updated by Peter Amstutz about 1 month ago

  • Target version changed from Development 2024-05-08 sprint to Development 2024-04-24 sprint
Actions #5

Updated by Peter Amstutz about 1 month ago

  • Target version changed from Development 2024-04-24 sprint to Development 2024-05-08 sprint
Actions #6

Updated by Peter Amstutz 18 days ago

  • Description updated (diff)
  • Subject changed from crunch-run updates copy of container.json in log collection when a container ends and/or runtime_status is updated to preemption notices do not appear in crunch-run.txt
  • Tracker changed from Feature to Bug
Actions #7

Updated by Peter Amstutz 18 days ago

  • Assigned To set to Tom Clegg
Actions #8

Updated by Tom Clegg 18 days ago

  • Status changed from New to In Progress
I suspect the "write log, then save log collection" sequence is doing the opposite of what we want, because:
  • "write log entry" just writes the message to the throttled-logging buffer, not yet to the log collection
  • "save log collection" saves the log collection and resets the auto-flush timer, minimizing the chance auto-flush will happen before the preempted instance shuts down

It's possible something else is going on too, but either way, we should rearrange the logging pipeline so the log collection gets updated immediately instead of after the "group logs into chunks" step. If nothing else, that will reduce latency for showing logs in workbench.
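
To make the suspected ordering problem concrete, here is a toy model in Python. It is not crunch-run code: the class names, the single flush timer, and the integer "seconds" are all assumptions made up for illustration. It only shows why "write, then save" can leave the notice stuck in the buffer, and why writing straight through to the collection avoids that.

```python
class BufferedLogPipeline:
    """Hypothetical model: messages wait in a buffer until a flush timer expires."""

    def __init__(self, flush_interval):
        self.flush_interval = flush_interval
        self.buffer = []        # stand-in for the throttled-logging buffer
        self.collection = []    # stand-in for the saved log collection
        self.next_flush = flush_interval

    def write(self, msg):
        # The message only reaches the buffer, not the collection.
        self.buffer.append(msg)

    def save_collection(self, now):
        # Saves what is already in the collection and restarts the timer.
        # The buffer is left untouched.
        self.next_flush = now + self.flush_interval

    def tick(self, now):
        # Auto-flush: move buffered messages into the collection.
        if now >= self.next_flush:
            self.collection.extend(self.buffer)
            self.buffer.clear()
            self.next_flush = now + self.flush_interval


class ImmediateLogPipeline:
    """Hypothetical model: every write lands in the collection right away."""

    def __init__(self):
        self.collection = []

    def write(self, msg):
        self.collection.append(msg)

    def save_collection(self, now):
        pass

    def tick(self, now):
        pass


def visible_at_shutdown(pipeline, notice_time, shutdown_time, save_after_write=True):
    """Return True if the notice is in the collection when the instance shuts down."""
    for now in range(notice_time):
        pipeline.tick(now)
    pipeline.write("preemption notice")
    if save_after_write:
        pipeline.save_collection(notice_time)
    for now in range(notice_time, shutdown_time):
        pipeline.tick(now)
    return "preemption notice" in pipeline.collection
```

With a 10-second flush interval, a notice at t=9 and shutdown at t=15, the buffered model loses the notice when the save resets the timer, but would have flushed it at t=10 without the save. The immediate model always has it.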

Actions #9

Updated by Tom Clegg 9 days ago

21611-log-chunk-delay @ 8a5db7b48c1fb11423110490267fea17161f7674 -- developer-run-tests: #4207 (flaky fuse test, see #21660)

21611-log-chunk-delay @ 8a5db7b48c1fb11423110490267fea17161f7674 -- developer-run-tests: #4208 (Something is already running on port 38402.)

21611-log-chunk-delay @ 8a5db7b48c1fb11423110490267fea17161f7674 -- developer-run-tests: #4209

Removes all the "buffer logs into chunks and send them to POST /arvados/v1/logs" code that was preventing the existing "flush logs immediately" code from working as intended (see #note-8 above).

  • All agreed upon points are implemented / addressed.
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • N/A
  • Code is tested and passing, both automated and manual; what manual testing was done is described.
    • ✅ updated preemption-warning test case to check that the container record is promptly updated with a log PDH that mentions the preemption warning message
  • Documentation has been updated.
    • N/A
  • Behaves appropriately at the intended scale (describe intended scale).
    • N/A
  • Considered backwards and forwards compatibility issues between client and server.
    • N/A
  • Follows our coding standards and GUI style guidelines.

This will also have the side effect of reducing logging latency in workbench. Previously LogBytesPerEvent/LogSecondsBetweenEvents (default 4K/5s) were introducing a store/wait/forward delay even when LimitLogBytesPerJob was zero.
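
As a rough illustration of that store/wait/forward delay, the sketch below estimates how long a freshly written message waits in a simplified chunking model. The function name and the exact flush rule (forward when the chunk reaches the byte threshold, otherwise when the interval since the last event has elapsed) are assumptions for illustration, not the removed code. Only the 4K/5s defaults come from the note above.

```python
def forward_delay(message_bytes, seconds_since_last_event,
                  bytes_per_event=4096, seconds_between_events=5.0):
    """Seconds a new message waits before being forwarded (hypothetical model).

    Assumed rule: a chunk is forwarded as soon as it reaches bytes_per_event,
    otherwise once seconds_between_events have passed since the last event.
    """
    if message_bytes >= bytes_per_event:
        return 0.0
    return max(0.0, seconds_between_events - seconds_since_last_event)
```

In this model a short line such as a preemption warning, written just after the previous event, waits the full 5 seconds, which may be longer than a preempted instance has left.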

Actions #10

Updated by Peter Amstutz 4 days ago

  • Target version changed from Development 2024-05-08 sprint to Development 2024-05-22 sprint