Bug #9865

Updated by Tom Clegg about 5 years ago

h2. Example from the wild 

 <pre> 
 2016-08-26 13:28:37 arvados.cwl-runner[20416] ERROR: While getting final output object: global name 'adjustFiles' is not defined 
 2016-08-26 13:28:37 arvados.cwl-runner[20416] INFO: Overall process status is success 
 </pre> 

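 There's a section here where any exception is logged, but "outputs" is then left in a half-baked state: it may already be loaded from cwl.output.json without its file and directory locations having been rewritten. 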

 https://github.com/curoverse/arvados/blob/master/sdk/cwl/arvados_cwl/runner.py#L131 

 <pre><code class="python"> 
             try: 
                 outc = arvados.collection.Collection(record["output"]) 
                 with outc.open("cwl.output.json") as f: 
                     outputs = json.load(f) 
                 def keepify(fileobj): 
                     path = fileobj["location"] 
                     if not path.startswith("keep:"): 
                         fileobj["location"] = "keep:%s/%s" % (record["output"], path) 
                 adjustFileObjs(outputs, keepify) 
                 adjustDirObjs(outputs, keepify) 
             except Exception as e: 
                 logger.error("While getting final output object: %s", e) 
 </code></pre> 

 This code should either: 
 * Reset outputs to None in the "except" block; or 
 * Make the "try" scope smaller, so once "outputs" isn't None, unexpected exceptions get propagated up. 

 It should also log the full backtrace for the caught exception. 
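A minimal sketch of what that could look like, combining both options with full traceback logging. This is not the actual runner.py code: @load_final_outputs@, @open_output_json@ and @rewrite_locations@ are hypothetical names that stand in for the collection-reading and adjustFileObjs/adjustDirObjs steps shown above.

```python
import json
import logging

logger = logging.getLogger("arvados.cwl-runner")


def load_final_outputs(open_output_json, rewrite_locations):
    """Return the final output object, or None if it could not be read.

    open_output_json: hypothetical callable returning a file-like object
        for cwl.output.json (stands in for outc.open("cwl.output.json")).
    rewrite_locations: hypothetical callable standing in for the
        adjustFileObjs/adjustDirObjs post-processing.
    """
    try:
        # Guard only the reading and parsing step.
        with open_output_json() as f:
            outputs = json.load(f)
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("While getting final output object")
        # Never hand back a partially built object.
        return None

    # Outside the "try": a programming error here (e.g. a NameError)
    # propagates to the caller instead of being logged and ignored.
    rewrite_locations(outputs)
    return outputs
```

With this shape, a failure to read the output yields None plus a traceback in the log, while a bug in the post-processing step surfaces as an exception rather than being followed by "Overall process status is success".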

 The "try" block looks like it started accidentally including too much code in commit:c8d9a898cde654b53200bda0b0ef8b406dd71739 

h2. Another example from the wild 

 <pre> 
 2019-01-30T04:31:12.159992495Z Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in <bound method _fileobject.__del__ of <socket._fileobject object at 0x7f1093013f50>> ignored 
 2019-01-30T04:31:12.161843551Z arvados.cwl-runner WARNING: Error checking states on API server: maximum recursion depth exceeded while calling a Python object 
 </pre> 

 After this, arvados-cwl-runner stopped producing logs every 3 seconds, and appeared to be deadlocked. 

 In another instance the following day, arvados-cwl-runner again reported a "maximum recursion depth" error, but this time it kept logging "cwltool DEBUG: [workflow workflow.json#main] job step [...] not ready". 
