Bug #7123
Closed: [Crunch] Should not save any log record when log writing fails
Description
We've seen a few instances recently where crunch-job can't save the job's log because writing to Keep fails. At least in some cases, crunch-job ends up updating the job record with log=[the empty collection]. For an example, see su92l-8i9sb-bdaoy3zppoxju1i.
When this happens, it would be better if crunch-job either left the log field alone or explicitly set it to null. Other tools, like the API server's log cleaner rake task and Workbench, recognize a null log as a sign that there's no log in Keep, whereas they don't recognize the empty collection content address.
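To illustrate why the empty collection misleads consumers, here is a minimal Python sketch. It is not code from Workbench or the API server: the function name has_keep_log and the dict-shaped job record are hypothetical, and the empty collection's content address is assumed to be the MD5 of empty content followed by "+0".

```python
import hashlib

# Assumption: the empty collection's content address is the MD5 digest of
# empty content followed by "+0" (its size in bytes).
EMPTY_COLLECTION = hashlib.md5(b"").hexdigest() + "+0"


def has_keep_log(job):
    """Hypothetical consumer-side check: only a null log means 'no log in Keep'."""
    return job.get("log") is not None


# What crunch-job saved after a failed write, per this report.
failed_job = {"log": EMPTY_COLLECTION}
# What this report asks for instead.
fixed_job = {"log": None}
```

Under these assumptions, has_keep_log(failed_job) is true even though no usable log exists, while has_keep_log(fixed_job) is false, which is the signal the other tools rely on.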
Updated by Brett Smith about 9 years ago
- Target version changed from Arvados Future Sprints to 2015-11-11 sprint
- Status changed from New to In Progress
- Assigned To set to Brett Smith
Updated by Brett Smith about 9 years ago
- Target version changed from 2015-11-11 sprint to 2015-12-02 sprint
Updated by Peter Amstutz about 9 years ago
Let me see if I understand:
- $logger_failed = -1; means that nothing was read at all, and we timed out.
- $logger_failed = -2; means that log_writer_read_output read some characters, and then timed out.
- If $? is nonzero we set $logger_failed = $?; it is okay if $logger_failed is already set, because ultimately we only care whether $logger_failed is zero or not.
- We only set $arv_put_output if there were no errors; otherwise we leave it null, which means Workbench & other tools will access the logs in the API server log table.
- Don't update the "log" field unless log_writer_finish returned a valid string.
Looks good to me.
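The flow summarized in this comment can be sketched roughly as follows. This is a Python illustration, not the crunch-job Perl from the changeset: finish_log_writer, its parameters, save_log, and the dict job record are hypothetical stand-ins for log_writer_finish and the job update.

```python
def finish_log_writer(timed_out, bytes_read, exit_status, output):
    """Hypothetical stand-in for log_writer_finish.

    Returns the log locator string on success, or None on any failure.
    """
    logger_failed = 0
    if timed_out:
        # -1: nothing was read before the timeout; -2: a partial read.
        logger_failed = -1 if bytes_read == 0 else -2
    if exit_status != 0:
        # Overwrites any earlier code; only zero versus nonzero matters here.
        logger_failed = exit_status
    if logger_failed != 0 or not output:
        return None
    return output.strip()


def save_log(job, locator):
    """Update the job's log field only when a valid locator came back."""
    if locator:
        job["log"] = locator
    return job
```

With this shape, a failed write leaves the job's log field untouched, so it stays null rather than pointing at the empty collection.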
Updated by Brett Smith about 9 years ago
You understand it all correctly.
Peter Amstutz wrote:
- If $? is nonzero we set $logger_failed = $?; it is okay if $logger_failed is already set, because ultimately we only care whether $logger_failed is zero or not.
This part does seem a little overly confusing, doesn't it? I changed it to $logger_failed ||= $? to preserve the original error, which is probably more helpful. Thanks.
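Perl's ||= assigns only when the left-hand side is false, so the first recorded failure code survives later nonzero exit statuses. A rough Python analogue, using a hypothetical helper name record_failure:

```python
def record_failure(logger_failed, exit_status):
    """Python analogue of Perl's `$logger_failed ||= $?`."""
    # `or` returns the left operand when it is truthy (nonzero), so an
    # earlier failure code is kept; otherwise the new status is used.
    return logger_failed or exit_status
```

For example, record_failure(-2, 256) keeps the earlier timeout code -2, while record_failure(0, 256) records 256.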
Updated by Brett Smith about 9 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:6de60c7db7a98405ef7ae4ac5eb20498f095416c.