Bug #22434
closed"Error checking spot interruptions:" is actually a warning, text should reflect that
Description
User saw "Error checking spot interruptions" and thought this was an error that was causing them problems, but this is a warning. The text should be softened a bit. E.g. "Warning: possible problem checking spot interruptions"
Updated by Peter Amstutz 3 months ago
- Category set to Crunch
- Description updated (diff)
Updated by Tom Clegg 3 months ago
- Related to Bug #22454: Spot instance termination not being detected? added
Updated by Tom Clegg 3 months ago
- Status changed from New to In Progress
22434-spot-check-warning @ c22b3ee97ef626e800ff0f72f3af65ea47838542 -- developer-run-tests: #4597
retry wb2 units run-tests-remainder: #4871
retry wb2 units run-tests-remainder: #4873
- All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
- ✅ new log message is:
Temporarily unable to check spot instance interruptions: 503 Service Unavailable (will retry in 5s)
- ✅ new log message is:
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- N/A
- Code is tested and passing, both automated and manual, what manual testing was done is described.
- ✅ existing tests are updated
- New or changed UX/UX and has gotten feedback from stakeholders.
- N/A
- Documentation has been updated.
- N/A
- Behaves appropriately at the intended scale (describe intended scale).
- N/A
- Considered backwards and forwards compatibility issues between client and server.
- N/A - messages in runtime_status have not changed
- Follows our coding standards and GUI style guidelines.
- ✅
Updated by Brett Smith 3 months ago
Tom Clegg wrote in #note-6:
22434-spot-check-warning @ c22b3ee97ef626e800ff0f72f3af65ea47838542 -- developer-run-tests: #4597
This is all good. Since we're thinking about the usability of these log messages, I'm looking at this:
if err != nil {
runner.CrunchLog.Printf("Temporarily unable to check spot instance interruptions: %s (will retry in %v)", err, spotInterruptionCheckInterval)
failures++
if failures > 5 {
runner.CrunchLog.Printf("Giving up on checking spot instance interruptions after too many consecutive errors")
return
}
continue
}
It's a little unfortunate that the logs can say "(will retry)" and then immediately after that "nm I lied I'm giving up." What if this flowed like failures++; if failures <= 5 { log the first message; continue; } else { log the give up message; return; }
?
Updated by Tom Clegg 3 months ago
Right, "will retry" followed immediately by "giving up" is unfortunate.
Rearranged messages, now it looks like this:
zzzzz-zzzzz-zzzzzzzzzzzzzzz 2025-01-10T20:09:26.721233648Z Unable to check spot instance interruptions: 503 Service Unavailable -- will retry in 125ms zzzzz-zzzzz-zzzzzzzzzzzzzzz 2025-01-10T20:09:26.845810061Z Unable to check spot instance interruptions: 503 Service Unavailable -- will retry in 125ms zzzzz-zzzzz-zzzzzzzzzzzzzzz 2025-01-10T20:09:26.971103911Z Unable to check spot instance interruptions: 503 Service Unavailable -- will retry in 125ms zzzzz-zzzzz-zzzzzzzzzzzzzzz 2025-01-10T20:09:27.096125533Z Unable to check spot instance interruptions: 503 Service Unavailable -- will retry in 125ms zzzzz-zzzzz-zzzzzzzzzzzzzzz 2025-01-10T20:09:27.220723738Z Unable to check spot instance interruptions: 503 Service Unavailable -- now giving up after too many consecutive errors
22434-spot-check-warning @ 6ac713a1fc7805ffe47d733fca1495e856be54be -- developer-run-tests: #4599
Updated by Brett Smith 3 months ago
Tom Clegg wrote in #note-8:
22434-spot-check-warning @ 6ac713a1fc7805ffe47d733fca1495e856be54be -- developer-run-tests: #4599
This flows even better, thanks. My one last nit here is now with this change, we lost the word "Temporarily" so now "Unable" sounds permanent and a little bit spooky. I get that the later retry text should clarify that, but I wonder if we can do better. What about "Spot instance interruption check did not succeed (%s)"? I think putting the subject before the verb can help contextualize this.
If that's too big or too off-style, maybe "Could not check for spot instance interruption: %s"? I feel like this is just vaguer but I feel like that kind of is what we're after.
Happy to workshop it in chat, or not. I don't feel super strongly about this, just my personal take given the tone we're trying to strike in this ticket. Thanks.
Updated by Tom Clegg 3 months ago
22434-spot-check-warning @ fd55bde2aef8288b63c7f084982041060006d711 -- developer-run-tests: #4602
"Spot instance interruption check was inconclusive: %s"
Updated by Tom Clegg 3 months ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|dda08e5e2a0f6aca8342278a66772119b0df2db1.