Bug #22466
closedbuggy output glob interaction with expressions and secondary files
Updated by Peter Amstutz 3 months ago
- Position changed from -934160 to -934149
- Status changed from New to In Progress
Updated by Peter Amstutz 3 months ago
- Subject changed from possible cwl-utils substitution scanner bug affecting a-c-r 3.0 to buggy output glob interaction with expressions and secondary files
Updated by Peter Amstutz 3 months ago
Getting there.
Two test failures.
Test 56 failed: arvados-cwl-runner --disable-reuse --compute-checksum --outdir=/tmp/tmpnc72ibxe --quiet 'tests/search.cwl#main' tests/search-job.json
Test InitialWorkDirRequirement linking input files and capturing secondaryFiles on input and output. Also tests the use of a variety of parameter references and expressions in the secondaryFiles field.
Returned non-zero
Test 311 failed: arvados-cwl-runner --disable-reuse --compute-checksum --outdir=/tmp/tmpahgxhx24 --quiet tests/secondaryfiles/rename-outputs.cwl
Confirm CommandOutputParameter expression can receive a File object
Returned non-zero
Updated by Peter Amstutz 2 months ago
22466-output-glob-fix @ commit:aae450265c1753a3664c00e96105b1c0ab9fca2f
Updated by Peter Amstutz 2 months ago
22466-output-glob-fix @ b7632a9397f8583e6104aa279b9891e6cac0de2c
test re-try: developer-run-tests: #4619
Updated by Peter Amstutz 2 months ago
22466-output-glob-fix @ b7632a9397f8583e6104aa279b9891e6cac0de2c
- All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
- Workflow that was reported crashing by user should now pass with correct behavior
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- n/a
- Code is tested and passing, both automated and manual, what manual testing was done is described.
- added new integration test with our test case. passes CWL v1.2.1 conformance tests.
- New or changed UX/UX and has gotten feedback from stakeholders.
- n/a
- Documentation has been updated.
- n/a
- Behaves appropriately at the intended scale (describe intended scale).
- enables use of
output_glob
feature in more cases, which benefits scale by not capturing output files that the user doesn't want.
- enables use of
- Considered backwards and forwards compatibility issues between client and server.
- does not require any changes to
output_glob
feature.
- does not require any changes to
- Follows our coding standards and GUI style guidelines.
- yes
Users use the 'glob' feature of CWL to identify which files they want to associate with output parameters. This gets somewhat complicated with secondary files (where the name of the secondary file may be based on the primary file) and the use of javascript expressions. The bug involved using a javascript expression to provide the name of a secondary file, but Arvados did not provide sufficient context to evaluate the expression successfully, and the workflow runner crashed as a result.
This branch moves the evaluation of expressions to happen later where adequate context is available. It also expands the number of cases where output_glob
can be used even when the file pattern has a wildcard in it.
Updated by Brett Smith 2 months ago
Peter Amstutz wrote in #note-8:
22466-output-glob-fix @ b7632a9397f8583e6104aa279b9891e6cac0de2c
When we're evaluating pattern expressions, can we move the start of the try
block just before pattern = self.builder.do_eval(...)
? This would help make clearer what we expect to fail, and reduce the risk of bugs from handling exceptions we didn't mean to.
When gp
is a bad type, can we raise a more specific type of exception? Maybe at the very least TypeError
?
- Follows our coding standards and GUI style guidelines.
- yes
Per PEP 8:
In arvcontainer.py
, if len(pattern) > 0:
→ if pattern:
Only a single blank line after the def _collect_globs()
function body.
Thanks.
Updated by Peter Amstutz 2 months ago
- Target version changed from Development 2025-01-29 to Development 2025-02-12
Updated by Peter Amstutz 2 months ago
22466-output-glob-fix @ 58a27389b992ec8e1b586e9a76740411c0106bad
- Reworked exception handling
Updated by Brett Smith 2 months ago
Peter Amstutz wrote in #note-12:
22466-output-glob-fix @ 58a27389b992ec8e1b586e9a76740411c0106bad
I feel like we're having some communication disconnect. I asked we limit the scope of a try
block to prevent bugs related from handling unintended exceptions. a233cfe9d5020f2ad346351a75c5391c9cdba018 makes the try
block even bigger, exacerbating that problem. Now if there are any bugs inside the try block (present or future), any changes to the data structure it's dealing with, anything like that will cause a-c-r to just assume it can't set an output glob, and it will be more difficult for us to find and address that bug.
IMO the branch would be better with this change reverted. If you strongly prefer this approach, you should at least define a custom exception type, use that for your raised exceptions, and then catch only that type in the except
block. That would also eliminate the risk of handling unintended exceptions.
In arvcontainer.py,
if len(pattern) > 0:
→if pattern:
This is still outstanding too.
Updated by Peter Amstutz about 2 months ago
Brett Smith wrote in #note-13:
Peter Amstutz wrote in #note-12:
I feel like we're having some communication disconnect. I asked we limit the scope of a
try
block to prevent bugs related from handling unintended exceptions.
Yes, I misunderstood your suggestion. The intention is still to catch expression handling errors, but it is now more narrowly scoped to the places where errors are anticipated.
IMO the branch would be better with this change reverted. If you strongly prefer this approach, you should at least define a custom exception type, use that for your raised exceptions, and then catch only that type in the
except
block. That would also eliminate the risk of handling unintended exceptions.
Yes, that is where I went with this. Now the output try/except only catches OutputGlobError
and the inner error handling converts errors into OutputGlobError
.
In arvcontainer.py,
if len(pattern) > 0:
→if pattern:
This is still outstanding too.
Fixed.
22466-output-glob-fix @ 3d5bfb2cba03fed661421658d9018560497faf3e
Updated by Brett Smith about 2 months ago
Peter Amstutz wrote in #note-14:
Yes, that is where I went with this. Now the output try/except only catches
OutputGlobError
and the inner error handling converts errors intoOutputGlobError
.
This flows much better, thank you. I was also glad to see that we log information about what got us here. I wonder if we could make this log more usable to both our users and our future selves. What about:
logger.info(
"could not set output glob because %s",
e.args[0],
exc_info=self.arvrunner.debug,
)
(I'm assuming info is the default "lowest" log level a-c-r shows, if not adjust to taste.) This way if a user goes down this path unexpectedly, the logs will have a trace, and we can explain to them how the message relates to their workflow. And the full exception information is still available at the debug level if we need it.
In order for this to be effective, all the instances of raise OutputGlobError()
should have a description added to them.
Smaller things:
exc_info=(sys.exc_info()[1] if self.debug else False)
- This can be boiled down to just
exc_info=self.debug
, the logging module will callsys.exc_info()
for you.
- Consider having
OutputGlobError
inherit fromRuntimeError
,ValueError
, or some other base class that's more specific than justException
.
- Per PEP 8, two blank lines after the body of
class OutputGlobError
(or any class definition).
Thanks.
Updated by Peter Amstutz about 2 months ago
Brett Smith wrote in #note-15:
(I'm assuming info is the default "lowest" log level a-c-r shows, if not adjust to taste.) This way if a user goes down this path unexpectedly, the logs will have a trace, and we can explain to them how the message relates to their workflow. And the full exception information is still available at the debug level if we need it.
It is logged at debug level because there may be perfectly valid workflows where OutputGlobError is raised internally, in which case we would essentially be logging a non-actionable warning. The feedback from users is that they tend to get confused and think they are doing something wrong when we log things like that, even when it is harmless.
I made a few more tweaks based on the other comments.
22466-output-glob-fix @ acd29e3b39fb93c24aa414ef252153539c2094dd
Updated by Brett Smith about 2 months ago
Peter Amstutz wrote in #note-16:
22466-output-glob-fix @ acd29e3b39fb93c24aa414ef252153539c2094dd
LGTM, thanks.
Updated by Peter Amstutz about 2 months ago
- Status changed from In Progress to Resolved