Project

General

Profile

Actions

Bug #21770

open

Evaluation expressions that timeout can raise JSONDecodeError

Added by Brett Smith 8 months ago.

Status:
New
Priority:
Normal
Assigned To:
-
Category:
CWL
Target version:
-
Story points:
-

Description

This is a cwltool bug that arvados-cwl-runner inherits. I have filed it upstream, but I'm duplicating the information here for awareness and tracking.

Expected Behavior

When an evaluation expression exceeds the timeout, the resulting error should be clear that is the cause.

Actual Behavior

At least some of the time, this situation raises a JSONDecodeError exception. If the user doesn't realize the timeout might be the issue, this can cause them to lose a lot of time trying to debug an issue in the expression itself where none exists.

Full Traceback

This issue was noticed with a "real" expression that similarly built a large array of an array of files, and hit the default evaluation timeout. However, you can reproduce the issue using the attached minimal reproduction and a small evaluation timeout:

% venv/bin/cwltool --debug --eval-timeout=.1 wf.cwl array-in.yml
INFO venv/bin/cwltool 3.1.20240404144621
INFO Resolved 'wf.cwl' to 'file:///home/brett/Scratch/wf.cwl'
DEBUG [workflow ] initialized from file:///home/brett/Scratch/wf.cwl
INFO [workflow ] start
DEBUG [workflow ] inputs {}
INFO [workflow ] starting step build_array
DEBUG [step build_array] job input {}
DEBUG [step build_array] evaluated job input to {}
INFO [step build_array] start
Expecting value: line 1 column 1 (char 0)
script was:
01 "use strict";
02 var inputs = {};
03 var self = null;
04 var runtime = {
05     "cores": 1,
06     "ram": 256,
07     "tmpdirSize": 1024,
08     "outdirSize": 1024,
09     "tmpdir": null,
10     "outdir": null
11 };
12 (function(){
13  var result = [];
14  for (var i=0; i<100000; i++) {
15    var arr = [];
16    result.push(arr);
17    for (var j=0; j<100000; j++) {
18      var num = i * 100000 + j;
19      arr.push({
20        class: "File",
21        location: "file:///tmp/array.cwl/file" + num.toString() + ".fq",
22        format: "http://edamontology.org/format_1930" 
23      });
24    }
25  }
26  return {"out_arr": result};
27 })()
stdout was: ''
stderr was: ''
Traceback (most recent call last):
  File "/home/brett/Scratch/venv/lib/python3.12/site-packages/cwl_utils/sandboxjs.py", line 512, in eval
    return cast(CWLOutputType, json.loads(stdout))
                               ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

[Note: Everything from here on is basically propagation of the same root cause exception, but here it is for completeness:]

Traceback (most recent call last):
  File "/home/brett/Scratch/venv/lib/python3.12/site-packages/cwl_utils/expression.py", line 315, in do_eval
    return interpolate(
           ^^^^^^^^^^^^
  File "/home/brett/Scratch/venv/lib/python3.12/site-packages/cwl_utils/expression.py", line 215, in interpolate
    e = evaluator(
        ^^^^^^^^^^
  File "/home/brett/Scratch/venv/lib/python3.12/site-packages/cwl_utils/expression.py", line 161, in evaluator
    return cast(CWLOutputType, js_engine.eval(ex, jslib, **kwargs))
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/brett/Scratch/venv/lib/python3.12/site-packages/cwl_utils/sandboxjs.py", line 514, in eval
    raise JavascriptException(
cwl_utils.errors.JavascriptException: Expecting value: line 1 column 1 (char 0)
script was:
01 "use strict";
02 var inputs = {};
03 var self = null;
04 var runtime = {
05     "cores": 1,
06     "ram": 256,
07     "tmpdirSize": 1024,
08     "outdirSize": 1024,
09     "tmpdir": null,
10     "outdir": null
11 };
12 (function(){
13  var result = [];
14  for (var i=0; i<100000; i++) {
15    var arr = [];
16    result.push(arr);
17    for (var j=0; j<100000; j++) {
18      var num = i * 100000 + j;
19      arr.push({
20        class: "File",
21        location: "file:///tmp/array.cwl/file" + num.toString() + ".fq",
22        format: "http://edamontology.org/format_1930" 
23      });
24    }
25  }
26  return {"out_arr": result};
27 })()
stdout was: ''
stderr was: ''

ERROR Got workflow error
[error messages and tracebacks repeat from here]

Diagnosis

This occurs because of a few related issues in the way that the expression evaluation timeout is implemented:

In general, there's a lot of code in exec_js_process that seems like it could be handled by the subprocess module directly, following the example code documented with !Popen.communicate. A rough sketch would be something like:

def exec_js_process(
    self,
    js_text: str,
    timeout: float = default_timeout,
    ...,
) -> Tuple[int, str, str]:
    ...
    with subprocess.Popen(
            ["nodejs", "--eval", js_text],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            encoding='utf-8',
    ) as nodejs:
        try:
            stdoutdata, stderrdata = nodejs.communicate(timeout=timeout)
        except subprocess.TimeoutExpired:
            nodejs.kill()
            stdoutdata, stderrdata = nodejs.communicate()
            returncode = -1
        else:
            returncode = nodejs.returncode
    return returncode, stdoutdata, stderrdata

This would still need code at the top to find the name of the Node command, handle other arguments, etc. But I think this could replace all the process and timeout management code that currently lives in the method, while addressing the bugs above.

No data to display

Actions

Also available in: Atom PDF