Bug #16482

[crunch] bump a-c-r's cwltool dependency to pass CWL v1.2.0-dev3 tests

Added by Ward Vandewege about 1 month ago. Updated 19 days ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 06/17/2020
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -

Subtasks

Task #16531: Review 16482-bump-cwltool-version (Resolved, Ward Vandewege)


Related issues

Related to Arvados - Bug #16382: arvados-cwl-conformance-tests failing in jenkins (Resolved, 06/25/2020)

Associated revisions

Revision 0aec9ab0
Added by Ward Vandewege about 1 month ago

Merge branch '16482-bump-cwltool-version'

closes #16482

Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <>

Revision a5a6111e
Added by Ward Vandewege 23 days ago

Merge branch '16482-bump-cwltool-version'

closes #16482

Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <>

History

#1 Updated by Ward Vandewege about 1 month ago

  • Status changed from New to In Progress

#2 Updated by Ward Vandewege about 1 month ago

  • Subject changed from [crunch] bump cwltool dependency version on a-c-r to pass CWL v1.2.0-dev3 tests to [crunch] bump a-c-r's cwltool dependency to pass CWL v1.2.0-dev3 tests

#3 Updated by Ward Vandewege about 1 month ago

Ran the developer tests at https://ci.arvados.org/view/Developer/job/developer-run-tests/1887/, and they passed.

Ready for review 0f97ce28deb04faf2d6b19c7312ef233f28665ad on branch 16482-bump-cwltool-version

#4 Updated by Peter Amstutz about 1 month ago

Ward Vandewege wrote:

Ran the developer tests at https://ci.arvados.org/view/Developer/job/developer-run-tests/1887/, and they passed.

Ready for review 0f97ce28deb04faf2d6b19c7312ef233f28665ad on branch 16482-bump-cwltool-version

LGTM.

#5 Updated by Anonymous about 1 month ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved

#6 Updated by Ward Vandewege about 1 month ago

  • Related to Bug #16382: arvados-cwl-conformance-tests failing in jenkins added

#7 Updated by Ward Vandewege about 1 month ago

  • Target version changed from 2020-06-03 Sprint to 2020-06-17 Sprint
  • Status changed from Resolved to In Progress

It turns out we needed an even newer version of cwltool, which was released today. This also required bumping the version of schema-salad. I pushed an updated commit, 8e9f21692e6a815b4aac226f8fb87ec3d716f781, to the 16482-bump-cwltool-version branch and ran the developer tests. The sdk/cwl tests (see https://ci.arvados.org/view/Developer/job/developer-run-tests-remainder/1964/consoleFull) are now failing with:

======================================================================
16:12:54 ERROR: test_submit (unittest.loader._FailedTest)
16:12:54 ----------------------------------------------------------------------
16:12:54 ImportError: Failed to import test module: test_submit
16:12:54 Traceback (most recent call last):
16:12:54 File "/usr/lib/python3.7/unittest/loader.py", line 154, in loadTestsFromName
16:12:54 module = __import__(module_name)
16:12:54 File "/tmp/workspace/developer-run-tests-remainder/sdk/cwl/tests/test_submit.py", line 36, in <module>
16:12:54 import arvados_cwl
16:12:54 File "/tmp/workspace/developer-run-tests-remainder/sdk/cwl/arvados_cwl/__init__.py", line 26, in <module>
16:12:54 from cwltool.pathmapper import adjustFileObjs, adjustDirObjs, get_listing
16:12:54 ImportError: cannot import name 'get_listing' from 'cwltool.pathmapper' (/home/jenkins/tmp/VENV3DIR/lib/python3.7/site-packages/cwltool/pathmapper.py)
16:12:54

#8 Updated by Ward Vandewege about 1 month ago

I tracked the above down to a change in cwltool, and made the corresponding change in a-c-r at fbc4a41fab79220108602f1cadd30f34cdbcea11 on branch 16482-bump-cwltool-version.
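For readers hitting the same kind of ImportError after a dependency bump, one generic way to tolerate a name that moved between modules is sketched below. This is not the change made in the commit above: the helper `import_first_available` is hypothetical, and the module names in the demonstration are standard-library stand-ins, since this issue does not say where `get_listing` ended up.

```python
import importlib


def import_first_available(name, module_names):
    """Return attribute `name` from the first listed module that provides it.

    Hypothetical helper for illustration; not part of cwltool or a-c-r.
    """
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # this module does not exist in the installed version
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"cannot import name {name!r} from any of {module_names!r}")


# Demonstration with standard-library modules only: the first candidate
# does not exist, so the lookup falls through to os.path.
join = import_first_available("join", ["no_such_module_xyz", "os.path"])
print(join("a", "b"))
```

When the new location is known and only one cwltool version has to be supported, simply updating the import statement is the simpler option.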

The tests now fail like this:


test_tq_error (tests.test_tq.TestTaskQueue) ... 2020-06-04 00:27:52 arvados.cwl-runner ERROR: Unhandled exception running task
Traceback (most recent call last):
  File "/root/arvados/sdk/cwl/arvados_cwl/task_queue.py", line 36, in task_queue_func
    task()
  File "/root/arvados/sdk/cwl/tests/test_tq.py", line 20, in fail_task
    raise Exception("Testing error handling")
Exception: Testing error handling
ok
test_create (tests.test_submit.TestCreateWorkflow) ... INFO setup.py 2.1.0.dev20200603211541, arvados-python-client 2.1.0.dev20200521142235, cwltool 3.0.20200530110633
INFO Resolved 'tests/wf/submit_wf.cwl' to 'file:///root/arvados/sdk/cwl/tests/wf/submit_wf.cwl'
ERROR I'm sorry, I couldn't load this CWL file.
The error was: 
Traceback (most recent call last):
  File "/root/arvados/sdk/cwl/.eggs/cwltool-3.0.20200530110633-py3.7.egg/cwltool/main.py", line 940, in main
    skip_schemas=args.skip_schemas,
  File "/root/arvados/sdk/cwl/.eggs/cwltool-3.0.20200530110633-py3.7.egg/cwltool/load_tool.py", line 360, in resolve_and_validate_document
    (sch_document_loader, avsc_names) = process.get_schema(cwlVersion)[:2]
  File "/root/arvados/sdk/cwl/.eggs/cwltool-3.0.20200530110633-py3.7.egg/cwltool/process.py", line 221, in get_schema
    SCHEMA_CACHE[version] = load_schema(custom_schemas[version][0], cache=cache)
  File "/root/arvados/sdk/cwl/.eggs/schema_salad-6.0.20200601095207-py3.7.egg/schema_salad/schema.py", line 242, in load_schema
    schema_doc, schema_metadata = metaschema_loader.resolve_ref(schema_ref, "")
  File "/root/arvados/sdk/cwl/.eggs/schema_salad-6.0.20200601095207-py3.7.egg/schema_salad/ref_resolver.py", line 718, in resolve_ref
    doc = self.fetch(doc_url, inject_ids=(not mixin))
  File "/root/arvados/sdk/cwl/.eggs/schema_salad-6.0.20200601095207-py3.7.egg/schema_salad/ref_resolver.py", line 1155, in fetch
    text = self.fetch_text(url)
  File "/root/arvados/sdk/cwl/.eggs/schema_salad-6.0.20200601095207-py3.7.egg/schema_salad/ref_resolver.py", line 178, in fetch_text
    assert isinstance(result, str)
AssertionError
FAIL
...

The assertion fails because 'result' is not of type `str`, but of type `bytes`. The latter is for binary data, so this is a bit mysterious. Why does it think that data is binary? Here's what it looks like when printed out:

test_create (tests.test_submit.TestCreateWorkflow) ... INFO setup.py 2.1.0.dev20200603211541, arvados-python-client 2.1.0.dev20200521142235, cwltool 3.0.20200530110633
INFO Resolved 'tests/wf/submit_wf.cwl' to 'file:///root/arvados/sdk/cwl/tests/wf/submit_wf.cwl'
b'# Copyright (C) The Arvados Authors. All rights reserved.\n#\n# SPDX-License-Identifier: Apache-2.0\n\n$base: "http://arvados.org/cwl#"\n$namespaces:\n  cwl: "https://w3id.org/cwl/cwl#"\n  cwltool: "http://commonwl.org/cwltool#"\n$graph:\n- $import: https://w3id.org/cwl/CommonWorkflowLanguage.yml\n\n- name: cwltool:LoadListingRequirement\n  type: record\n  extends: cwl:ProcessRequirement\n  inVocab: false\n  fields:\n    class:\n      type: string\n      doc: "Always \'LoadListingRequirement\'"\n      jsonldPredicate:\n        "_id": "@type"\n        "_type": "@vocab"\n    loadListing:\n      type:\n        - "null"\n        - type: enum\n          name: LoadListingEnum\n          symbols: [no_listing, shallow_listing, deep_listing]\n\n- name: cwltool:Secrets\n  type: record\n  inVocab: false\n  extends: cwl:ProcessRequirement\n  fields:\n    class:\n      type: string\n      doc: "Always \'Secrets\'"\n      jsonldPredicate:\n        "_id": "@type"\n        "_type": "@vocab"\n    secrets:\n      type: string[]\n      doc: |\n        List one or more input parameters that are sensitive (such as passwords)\n        which will be deliberately obscured from logging.\n      jsonldPredicate:\n        "_type": "@id"\n        refScope: 0\n\n- name: cwltool:TimeLimit\n  type: record\n  inVocab: false\n  extends: cwl:ProcessRequirement\n  doc: |\n    Set an upper limit on the execution time of a CommandLineTool or\n    ExpressionTool.  A tool execution which exceeds the time limit may\n    be preemptively terminated and considered failed.  May also be\n    used by batch systems to make scheduling decisions.\n  fields:\n    - name: class\n      type: string\n      doc: "Always \'TimeLimit\'"\n      jsonldPredicate:\n        "_id": "@type"\n        "_type": "@vocab"\n    - name: timelimit\n      type: [long, string]\n      doc: |\n        The time limit, in seconds.  A time limit of zero means no\n        time limit.  
Negative time limits are an error.\n\n- name: RunInSingleContainer\n  type: record\n  extends: cwl:ProcessRequirement\n  inVocab: false\n  doc: |\n    Indicates that a subworkflow should run in a single container\n    and not be scheduled as separate steps.\n  fields:\n    - name: class\n      type: string\n      doc: "Always \'arv:RunInSingleContainer\'"\n      jsonldPredicate:\n        _id: "@type"\n        _type: "@vocab"\n\n- name: OutputDirType\n  type: enum\n  symbols:\n    - local_output_dir\n    - keep_output_dir\n  doc:\n    - |\n      local_output_dir: Use regular file system local to the compute node.\n      There must be sufficient local scratch space to store entire output;\n      specify this with `outdirMin` of `ResourceRequirement`.  Files are\n      batch uploaded to Keep when the process completes.  Most compatible, but\n      upload step can be time consuming for very large files.\n    - |\n      keep_output_dir: Use writable Keep mount.  Files are streamed to Keep as\n      they are written.  Does not consume local scratch space, but does consume\n      RAM for output buffers (up to 192 MiB per file simultaneously open for\n      writing.)  Best suited to processes which produce sequential output of\n      large files (non-sequential writes may produced fragmented file\n      manifests).  Supports regular files and directories, does not support\n      special files such as symlinks, hard links, named pipes, named sockets,\n      or device nodes.\n\n\n- name: RuntimeConstraints\n  type: record\n  extends: cwl:ProcessRequirement\n  inVocab: false\n  doc: |\n    Set Arvados-specific runtime hints.\n  fields:\n    - name: class\n      type: string\n      doc: "Always \'arv:RuntimeConstraints\'"\n      jsonldPredicate:\n        _id: "@type"\n        _type: "@vocab"\n    - name: keep_cache\n      type: int?\n      doc: |\n        Size of file data buffer for Keep mount in MiB. Default is 256\n        MiB. 
Increase this to reduce cache thrashing in situations such as\n        accessing multiple large (64+ MiB) files at the same time, or\n        performing random access on a large file.\n    - name: outputDirType\n      type: OutputDirType?\n      doc: |\n        Preferred backing store for output staging.  If not specified, the\n        system may choose which one to use.\n\n- name: PartitionRequirement\n  type: record\n  extends: cwl:ProcessRequirement\n  inVocab: false\n  doc: |\n    Select preferred compute partitions on which to run jobs.\n  fields:\n    - name: class\n      type: string\n      doc: "Always \'arv:PartitionRequirement\'"\n      jsonldPredicate:\n        _id: "@type"\n        _type: "@vocab"\n    - name: partition\n      type:\n        - string\n        - string[]\n\n- name: APIRequirement\n  type: record\n  extends: cwl:ProcessRequirement\n  inVocab: false\n  doc: |\n    Indicates that process wants to access to the Arvados API.  Will be granted\n    limited network access and have ARVADOS_API_HOST and ARVADOS_API_TOKEN set\n    in the environment.\n  fields:\n    - name: class\n      type: string\n      doc: "Always \'arv:APIRequirement\'"\n      jsonldPredicate:\n        _id: "@type"\n        _type: "@vocab"\n\n- name: IntermediateOutput\n  type: record\n  extends: cwl:ProcessRequirement\n  inVocab: false\n  doc: |\n    Specify desired handling of intermediate output collections.\n  fields:\n    class:\n      type: string\n      doc: "Always \'arv:IntermediateOutput\'"\n      jsonldPredicate:\n        _id: "@type"\n        _type: "@vocab"\n    outputTTL:\n      type: int\n      doc: |\n        If the value is greater than zero, consider intermediate output\n        collections to be temporary and should be automatically\n        trashed. Temporary collections will be trashed `outputTTL` seconds\n        after creation.  
A value of zero means intermediate output should be\n        retained indefinitely (this is the default behavior).\n\n        Note: arvados-cwl-runner currently does not take workflow dependencies\n        into account when setting the TTL on an intermediate output\n        collection. If the TTL is too short, it is possible for a collection to\n        be trashed before downstream steps that consume it are started.  The\n        recommended minimum value for TTL is the expected duration of the\n        entire the workflow.\n\n- name: ReuseRequirement\n  type: record\n  extends: cwl:ProcessRequirement\n  inVocab: false\n  doc: |\n    Enable/disable work reuse for current process.  Default true (work reuse enabled).\n  fields:\n    - name: class\n      type: string\n      doc: "Always \'arv:ReuseRequirement\'"\n      jsonldPredicate:\n        _id: "@type"\n        _type: "@vocab"\n    - name: enableReuse\n      type: boolean\n\n- name: WorkflowRunnerResources\n  type: record\n  extends: cwl:ProcessRequirement\n  inVocab: false\n  doc: |\n    Specify memory or cores resource request for the CWL runner process itself.\n  fields:\n    class:\n      type: string\n      doc: "Always \'arv:WorkflowRunnerResources\'"\n      jsonldPredicate:\n        _id: "@type"\n        _type: "@vocab"\n    ramMin:\n      type: int?\n      doc: Minimum RAM, in mebibytes (2**20)\n      jsonldPredicate: "https://w3id.org/cwl/cwl#ResourceRequirement/ramMin"\n    coresMin:\n      type: int?\n      doc: Minimum cores allocated to cwl-runner\n      jsonldPredicate: "https://w3id.org/cwl/cwl#ResourceRequirement/coresMin"\n    keep_cache:\n      type: int?\n      doc: |\n        Size of collection metadata cache for the workflow runner, in\n        MiB.  Default 256 MiB.  
Will be added on to the RAM request\n        when determining node size to request.\n      jsonldPredicate: "http://arvados.org/cwl#RuntimeConstraints/keep_cache"\n\n- name: ClusterTarget\n  type: record\n  extends: cwl:ProcessRequirement\n  inVocab: false\n  doc: |\n    Specify where a workflow step should run\n  fields:\n    class:\n      type: string\n      doc: "Always \'arv:ClusterTarget\'"\n      jsonldPredicate:\n        _id: "@type"\n        _type: "@vocab"\n    cluster_id:\n      type: string?\n      doc: The cluster to run the container\n    project_uuid:\n      type: string?\n      doc: The project that will own the container requests and intermediate collections\n'
ERROR I'm sorry, I couldn't load this CWL file.
The error was: 
Traceback (most recent call last):
  File "/root/arvados/sdk/cwl/.eggs/cwltool-3.0.20200530110633-py3.7.egg/cwltool/main.py", line 940, in main
    skip_schemas=args.skip_schemas,
  File "/root/arvados/sdk/cwl/.eggs/cwltool-3.0.20200530110633-py3.7.egg/cwltool/load_tool.py", line 360, in resolve_and_validate_document
    (sch_document_loader, avsc_names) = process.get_schema(cwlVersion)[:2]
  File "/root/arvados/sdk/cwl/.eggs/cwltool-3.0.20200530110633-py3.7.egg/cwltool/process.py", line 221, in get_schema
    SCHEMA_CACHE[version] = load_schema(custom_schemas[version][0], cache=cache)
  File "/root/arvados/sdk/cwl/.eggs/schema_salad-6.0.20200601095207-py3.7.egg/schema_salad/schema.py", line 242, in load_schema
    schema_doc, schema_metadata = metaschema_loader.resolve_ref(schema_ref, "")
  File "/root/arvados/sdk/cwl/.eggs/schema_salad-6.0.20200601095207-py3.7.egg/schema_salad/ref_resolver.py", line 723, in resolve_ref
    doc = self.fetch(doc_url, inject_ids=(not mixin))
  File "/root/arvados/sdk/cwl/.eggs/schema_salad-6.0.20200601095207-py3.7.egg/schema_salad/ref_resolver.py", line 1160, in fetch
    text = self.fetch_text(url)
  File "/root/arvados/sdk/cwl/.eggs/schema_salad-6.0.20200601095207-py3.7.egg/schema_salad/ref_resolver.py", line 183, in fetch_text
    assert isinstance(result, str)
AssertionError
FAIL

#9 Updated by Michael Crusoe about 1 month ago

Ward Vandewege wrote:

The assertion fails because 'result' is not of type `str`, but of type `bytes`. The latter is for binary data, so this is a bit mysterious. Why does it think that data is binary? Here's what it looks like when printed out:

Note the `b` in `b'# Copyright (C)...'`, you are passing in binary.

#10 Updated by Ward Vandewege about 1 month ago

Michael Crusoe wrote:

Ward Vandewege wrote:

The assertion fails because 'result' is not of type `str`, but of type `bytes`. The latter is for binary data, so this is a bit mysterious. Why does it think that data is binary? Here's what it looks like when printed out:

Note the `b` in `b'# Copyright (C)...'`, you are passing in binary.

Indeed, and thanks for your help identifying where this happens (the resource_stream calls). It's fixed in f423aff73c1927a74e39c738e08bd6f1100a94c5 on branch 16482-bump-cwltool-version; the tests ran at https://ci.arvados.org/view/Developer/job/developer-run-tests-remainder/1966/console and passed.
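The bytes/str mismatch discussed above can be reproduced in isolation. The sketch below is not the code from the fix commit: it uses `io.BytesIO` as a stand-in for a binary resource stream (`pkg_resources.resource_stream` returns a binary file-like object), and `read_schema_text` is a hypothetical helper name. It assumes the schema text is UTF-8.

```python
import io


def read_schema_text(stream, encoding="utf-8"):
    """Read a file-like object and always return str, decoding bytes if needed.

    Hypothetical helper for illustration; the encoding default is an assumption.
    """
    data = stream.read()
    if isinstance(data, bytes):
        data = data.decode(encoding)
    return data


# Stand-in for a resource opened in binary mode.
binary_stream = io.BytesIO(b'$base: "http://arvados.org/cwl#"\n')

raw = binary_stream.read()
# A binary stream yields bytes, which is what fails a check such as
# `assert isinstance(result, str)`.
assert isinstance(raw, bytes)

binary_stream.seek(0)
text = read_schema_text(binary_stream)
# After decoding, the value is str and would pass the same check.
assert isinstance(text, str)
```

The `b` prefix in the printed output is only how Python displays a `bytes` object; the content is ordinary text that has not been decoded yet.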

#12 Updated by Ward Vandewege 24 days ago

  • Target version changed from 2020-06-17 Sprint to 2020-07-01 Sprint

#13 Updated by Peter Amstutz 23 days ago

Ward Vandewege wrote:

f423aff73c1927a74e39c738e08bd6f1100a94c5 on branch 16482-bump-cwltool-version is ready for review

tests passed at https://ci.arvados.org/view/Developer/job/developer-run-tests-remainder/1966/console

LGTM.

#14 Updated by Anonymous 23 days ago

  • Status changed from In Progress to Resolved
