Bug #18260
closed[CWL] integration test 35 failed (singularity/LSF)
Added by Ward Vandewege about 3 years ago. Updated about 3 years ago.
Description
On 9tee4 (configured to use LSF + singularity), test 35 from the Arvados CWL test suite fails consistently (e.g. run-cwl-test-9tee4: #1251 /console):
15:02:51 Test 35 failed: arvados-cwl-runner --disable-reuse --compute-checksum --api=containers --outdir=/tmp/tmp36moghbh --quiet 16169-no-listing-hint.cwl listing-job.yml 15:02:51 Test cwltool:LoadListingRequirement propagation 15:02:51 Returned non-zero 15:02:51 ERROR [container 16169-no-listing-hint.cwl] (9tee4-dz642-rw8f8gnf0bbtbf9) error log: 15:02:51 15:02:51 2021-10-08T19:02:31.198703294Z crunch-run Not starting a gateway server (GatewayAuthSecret was not provided by dispatcher) 15:02:51 2021-10-08T19:02:31.198978468Z crunch-run crunch-run 2.3.0~dev20211008165008 (go1.17.1) started 15:02:51 2021-10-08T19:02:31.199000348Z crunch-run Executing container '9tee4-dz642-rw8f8gnf0bbtbf9' using singularity runtime 15:02:51 2021-10-08T19:02:31.199034119Z crunch-run Executing on host 'compute0.9tee4.arvadosapi.com' 15:02:51 2021-10-08T19:02:31.316882478Z crunch-run container token "v2/9tee4-gj3su-nzdweqs16uwdd1c/17n5aocpl1h49w3fdpt6itws6m3jzt2iacgk3hed3kl676czhj/9tee4-dz642-rw8f8gnf0bbtbf9" 15:02:51 2021-10-08T19:02:31.317415885Z crunch-run Running [arv-mount --foreground --allow-other --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-tmp tmp0 --mount-by-pdh by_id --mount-by-id by_uuid /tmp/crunch-run.9tee4-dz642-rw8f8gnf0bbtbf9.754694554/keep3163623830] 15:02:51 2021-10-08T19:02:31.956248212Z crunch-run Fetching Docker image from collection 'a2498feb8afadffbde268ab00b07fa4b+261' 15:02:51 2021-10-08T19:02:32.048897698Z crunch-run Using Docker image id "sha256:e520e478c5777b695294faa6cb442edeb63112ae3f66b38ea3520c8e755c03c8" 15:02:51 2021-10-08T19:02:32.048970278Z crunch-run Loading Docker image from keep 15:02:51 2021-10-08T19:02:32.978977970Z crunch-run Starting container 15:02:51 2021-10-08T19:02:32.980308098Z crunch-run Waiting for container to finish 15:02:51 2021-10-08T19:02:41.930650770Z stderr INFO /usr/bin/arvados-cwl-runner 2.3.0.dev20211006204117, arvados-python-client 2.3.0.dev20211006204117, cwltool 3.1.20210922203925 15:02:51 2021-10-08T19:02:41.949220287Z stderr INFO Resolved '/var/lib/cwl/workflow.json#main' to 'file:///var/lib/cwl/workflow.json#main' 15:02:51 2021-10-08T19:02:46.251818792Z stderr ERROR Tool definition failed initialization: 15:02:51 2021-10-08T19:02:46.251818792Z stderr Tool definition file:///var/lib/cwl/workflow.json#16169-step.cwl failed validation: 15:02:51 2021-10-08T19:02:46.251818792Z stderr ../../lib/cwl/workflow.json:1:12: checking object `../../lib/cwl/workflow.json#main` 15:02:51 2021-10-08T19:02:46.251818792Z stderr ../../lib/cwl/workflow.json:1:170: checking field `requirements` 15:02:51 2021-10-08T19:02:46.251818792Z stderr ../../lib/cwl/workflow.json:1:186: checking item 15:02:51 2021-10-08T19:02:46.251818792Z stderr ../../lib/cwl/workflow.json:1:187: Field `class` contains undefined reference to 15:02:51 2021-10-08T19:02:46.251818792Z stderr `http://commonwl.org/cwltool#LoadListingRequirement` 15:02:51 2021-10-08T19:02:46.832474311Z crunch-run Complete 15:02:51 ERROR Overall process status is permanentFail 15:02:51 WARNING Final process status is permanentFail
On tordo (a-d-c + singularity) this test passes without problems.
Updated by Ward Vandewege about 3 years ago
- Subject changed from [CWL] test 35 failed (singularity/LSF) to [CWL] integration test 35 failed (singularity/LSF)
Updated by Ward Vandewege about 3 years ago
I re-ran the failing test with --debug:
ci@shell:~/arvados/sdk/cwl/tests$ arvados-cwl-runner --disable-reuse --compute-checksum --api=containers --outdir=/tmp/tmpwn7jhrb1 --debug 16169-no-listing-hint.cwl listing-job.yml INFO /usr/bin/arvados-cwl-runner 2.3.0.dev20211008201118, arvados-python-client 2.3.0.dev20211006204117, cwltool 3.1.20210922203925 INFO Resolved '16169-no-listing-hint.cwl' to 'file:///home/ci/arvados/sdk/cwl/tests/16169-no-listing-hint.cwl' INFO Using cluster 9tee4 (https://workbench2.9tee4.arvadosapi.com/) INFO Upload local files: "tmp1" INFO Using collection 1c874bd4fe26056034be8a06e4c3133b+65 (9tee4-4zz18-8ro3shrbetz8wvc) INFO Using collection cache size 256 MiB DEBUG ENTER jobiter 1634066162.1472414 DEBUG EXIT jobiter 1634066162.1485233 0.0012819766998291016 DEBUG ENTER run 1634066162.1486175 INFO [container 16169-no-listing-hint.cwl] submitted container_request 9tee4-xvhdp-rfad19xav4j3qau INFO Monitor workflow progress at https://workbench2.9tee4.arvadosapi.com/processes/9tee4-xvhdp-rfad19xav4j3qau DEBUG EXIT run 1634066162.5738823 0.425264835357666 DEBUG ENTER jobiter 1634066162.57399 DEBUG EXIT jobiter 1634066162.5740707 8.058547973632812e-05 INFO [container 16169-no-listing-hint.cwl] 9tee4-xvhdp-rfad19xav4j3qau is Final DEBUG {'9tee4-bi6l4-uwgvqzzw4z97qca': OrderedDict([('href', '/keep_services/9tee4-bi6l4-uwgvqzzw4z97qca'), ('kind', 'arvados#keepService'), ('etag', 'c65xiys117q01qz9s7zvjt66i'), ('uuid', '9tee4-bi6l4-uwgvqzzw4z97qca'), ('owner_uuid', '9tee4-tpzed-ruo0j78b8s7jk57'), ('created_at', '2014-08-27T02:19:29.655913000Z'), ('modified_by_client_uuid', '9tee4-ozdt8-yft7aixdt2dx5ni'), ('modified_by_user_uuid', '9tee4-tpzed-ruo0j78b8s7jk57'), ('modified_at', '2014-08-27T02:20:21.044273000Z'), ('service_host', 'keep0.9tee4.arvadosapi.com'), ('service_port', 25107), ('service_ssl_flag', False), ('service_type', 'disk'), ('read_only', False), ('_service_root', 'http://keep0.9tee4.arvadosapi.com:25107/')]), '9tee4-bi6l4-1fum8ggyzc88g87': OrderedDict([('href', '/keep_services/9tee4-bi6l4-1fum8ggyzc88g87'), ('kind', 'arvados#keepService'), ('etag', 'dn6g7rkwx5gpwq0h053wyquax'), ('uuid', '9tee4-bi6l4-1fum8ggyzc88g87'), ('owner_uuid', '9tee4-tpzed-ruo0j78b8s7jk57'), ('created_at', '2014-08-27T02:20:27.578073000Z'), ('modified_by_client_uuid', '9tee4-ozdt8-yft7aixdt2dx5ni'), ('modified_by_user_uuid', '9tee4-tpzed-ruo0j78b8s7jk57'), ('modified_at', '2014-08-27T02:20:43.978890000Z'), ('service_host', 'keep1.9tee4.arvadosapi.com'), ('service_port', 25107), ('service_ssl_flag', False), ('service_type', 'disk'), ('read_only', False), ('_service_root', 'http://keep1.9tee4.arvadosapi.com:25107/')])} DEBUG 196c3dbc8725c9b49c1286d182922388+37950+A350e222fe194db5094cac41517e9ec170aa230da@6178540a: ['http://keep0.9tee4.arvadosapi.com:25107/', 'http://keep1.9tee4.arvadosapi.com:25107/'] DEBUG Request: GET http://keep0.9tee4.arvadosapi.com:25107/196c3dbc8725c9b49c1286d182922388+37950+A350e222fe194db5094cac41517e9ec170aa230da@6178540a INFO GET 200: 37950 bytes in 93.83320808410645 msec (0.386 MiB/sec) ERROR [container 16169-no-listing-hint.cwl] (9tee4-dz642-h60jcvdr3ru19xe) error log: 2021-10-12T19:16:20.481902684Z stderr File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/workflow.py", line 234, in __init__ 2021-10-12T19:16:20.481902684Z stderr self.embedded_tool = load_tool(toolpath_object["run"], loadingContext) 2021-10-12T19:16:20.481902684Z stderr File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/load_tool.py", line 481, in load_tool 2021-10-12T19:16:20.481902684Z stderr loadingContext, workflowobj, uri 2021-10-12T19:16:20.481902684Z stderr File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/load_tool.py", line 381, in resolve_and_validate_document 2021-10-12T19:16:20.481902684Z stderr document_loader.resolve_all(workflowobj, fileuri) 2021-10-12T19:16:20.481902684Z stderr File "schema_salad/ref_resolver.py", line 958, in resolve_all 2021-10-12T19:16:20.481902684Z stderr File "schema_salad/ref_resolver.py", line 1190, in validate_links 2021-10-12T19:16:20.481902684Z stderr schema_salad.exceptions.ValidationException: ../../lib/cwl/workflow.json:1:12: checking object `../../lib/cwl/workflow.json#main` 2021-10-12T19:16:20.481902684Z stderr ../../lib/cwl/workflow.json:1:170: checking field `requirements` 2021-10-12T19:16:20.481902684Z stderr ../../lib/cwl/workflow.json:1:186: checking item 2021-10-12T19:16:20.481902684Z stderr ../../lib/cwl/workflow.json:1:187: Field `class` contains undefined reference to 2021-10-12T19:16:20.481902684Z stderr `http://commonwl.org/cwltool#LoadListingRequirement` 2021-10-12T19:16:20.481902684Z stderr 2021-10-12T19:16:20.481902684Z stderr The above exception was the direct cause of the following exception: 2021-10-12T19:16:20.481902684Z stderr 2021-10-12T19:16:20.481902684Z stderr Traceback (most recent call last): 2021-10-12T19:16:20.481902684Z stderr File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/main.py", line 1067, in main 2021-10-12T19:16:20.481902684Z stderr tool = make_tool(uri, loadingContext) 2021-10-12T19:16:20.481902684Z stderr File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/load_tool.py", line 462, in make_tool 2021-10-12T19:16:20.481902684Z stderr tool = loadingContext.construct_tool_object(processobj, loadingContext) 2021-10-12T19:16:20.481902684Z stderr File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/arvados_cwl/executor.py", line 210, in arv_make_tool 2021-10-12T19:16:20.481902684Z stderr return ArvadosWorkflow(self, toolpath_object, loadingContext) 2021-10-12T19:16:20.481902684Z stderr File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/arvados_cwl/arvworkflow.py", line 169, in __init__ 2021-10-12T19:16:20.481902684Z stderr super(ArvadosWorkflow, self).__init__(toolpath_object, loadingContext) 2021-10-12T19:16:20.481902684Z stderr File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/workflow.py", line 106, in __init__ 2021-10-12T19:16:20.481902684Z stderr step, index, loadingContext, loadingContext.prov_obj 2021-10-12T19:16:20.481902684Z stderr File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/arvados_cwl/arvworkflow.py", line 352, in make_workflow_step 2021-10-12T19:16:20.481902684Z stderr return ArvadosWorkflowStep(toolpath_object, pos, loadingContext, self.arvrunner, *argc, **argv) 2021-10-12T19:16:20.481902684Z stderr File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/arvados_cwl/arvworkflow.py", line 145, in __init__ 2021-10-12T19:16:20.481902684Z stderr super(ArvadosWorkflowStep, self).__init__(toolpath_object, pos, loadingContext, *argc, **argv) 2021-10-12T19:16:20.481902684Z stderr File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/workflow.py", line 241, in __init__ 2021-10-12T19:16:20.481902684Z stderr ) from vexc 2021-10-12T19:16:20.481902684Z stderr cwltool.errors.WorkflowException: Tool definition file:///var/lib/cwl/workflow.json#16169-step.cwl failed validation: 2021-10-12T19:16:20.481902684Z stderr ../../lib/cwl/workflow.json:1:12: checking object `../../lib/cwl/workflow.json#main` 2021-10-12T19:16:20.481902684Z stderr ../../lib/cwl/workflow.json:1:170: checking field `requirements` 2021-10-12T19:16:20.481902684Z stderr ../../lib/cwl/workflow.json:1:186: checking item 2021-10-12T19:16:20.481902684Z stderr ../../lib/cwl/workflow.json:1:187: Field `class` contains undefined reference to 2021-10-12T19:16:20.481902684Z stderr `http://commonwl.org/cwltool#LoadListingRequirement` 2021-10-12T19:16:20.903520059Z crunch-run Complete ERROR Overall process status is permanentFail INFO Final output collection 5551f5d7ec57a8aa9758d14a976e87e9+57 INFO Output at https://workbench2.9tee4.arvadosapi.com/collections/5551f5d7ec57a8aa9758d14a976e87e9+57 {} WARNING Final process status is permanentFail
Updated by Peter Amstutz about 3 years ago
- Target version changed from 2021-10-13 sprint to 2021-10-27 sprint
Updated by Ward Vandewege about 3 years ago
Curiously, this works fine when run with --local, which is going to make this a bit harder to debug:
(venv) ci@shell:~/arvados/sdk/cwl/tests$ arvados-cwl-runner --disable-reuse --compute-checksum --api=containers --outdir=/tmp/tmpwn7jhrb1 --debug --submit-runner-image arvados/jobs:2.3.0.dev20211006204117 --local 16169-no-listing-hint.cwl listing-job.yml INFO /home/ci/arvados/sdk/cwl/tests/venv/bin/arvados-cwl-runner 2.3.0.dev20211013202728, arvados-python-client 2.2.2, cwltool 3.1.20210922203925 INFO Resolved '16169-no-listing-hint.cwl' to 'file:///home/ci/arvados/sdk/cwl/tests/16169-no-listing-hint.cwl' INFO Using cluster 9tee4 (https://workbench2.9tee4.arvadosapi.com/) INFO Upload local files: "tmp1" INFO Using collection 1c874bd4fe26056034be8a06e4c3133b+65 (9tee4-4zz18-8ro3shrbetz8wvc) INFO Using collection cache size 256 MiB DEBUG ENTER jobiter 1634158594.4664106 DEBUG [workflow 16169-no-listing-hint.cwl] initialized from file:///home/ci/arvados/sdk/cwl/tests/16169-no-listing-hint.cwl DEBUG EXIT jobiter 1634158594.4677873 0.0013766288757324219 DEBUG ENTER run 1634158594.4678693 INFO [workflow 16169-no-listing-hint.cwl] start DEBUG EXIT run 1634158594.4680102 0.00014090538024902344 DEBUG ENTER jobiter 1634158594.4680834 DEBUG [workflow 16169-no-listing-hint.cwl] inputs { "d": { "class": "Directory", "location": "keep:1c874bd4fe26056034be8a06e4c3133b+65/tmp1", "basename": "tmp1" } } INFO [workflow 16169-no-listing-hint.cwl] starting step step1 DEBUG [step step1] job input { "file:///home/ci/arvados/sdk/cwl/tests/16169-no-listing-hint.cwl#step1/d": { "class": "Directory", "location": "keep:1c874bd4fe26056034be8a06e4c3133b+65/tmp1", "basename": "tmp1" } } DEBUG [step step1] evaluated job input to { "file:///home/ci/arvados/sdk/cwl/tests/16169-no-listing-hint.cwl#step1/d": { "class": "Directory", "location": "keep:1c874bd4fe26056034be8a06e4c3133b+65/tmp1", "basename": "tmp1" } } INFO [step step1] start DEBUG [job step1] initializing from file:///home/ci/arvados/sdk/cwl/tests/wf/16169-step.cwl as part of step step1 DEBUG [job step1] { "d": { "class": "Directory", "location": "keep:1c874bd4fe26056034be8a06e4c3133b+65/tmp1", "basename": "tmp1" } } DEBUG [job step1] path mappings is { "keep:1c874bd4fe26056034be8a06e4c3133b+65/tmp1": [ "keep:1c874bd4fe26056034be8a06e4c3133b+65/tmp1", "/keep/1c874bd4fe26056034be8a06e4c3133b+65/tmp1", "Directory", true ] } DEBUG [job step1] command line bindings is [ { "position": [ 0, 0 ], "datum": "echo" }, { "position": [ 0, 1 ], "valueFrom": "${if(inputs.d.listing === undefined) {return 'true';} else {return 'false';}}" } ] DEBUG EXIT jobiter 1634158596.1322594 1.6641759872436523 DEBUG ENTER run 1634158596.1329393 DEBUG ENTER generatefiles step1 1634158596.133355 DEBUG EXIT generatefiles step1 1634158596.1334329 7.796287536621094e-05 INFO [container step1] 9tee4-xvhdp-w6eyxafpak1vhm6 state is Committed DEBUG EXIT run 1634158596.2429066 0.10996723175048828 DEBUG ENTER jobiter 1634158596.2429976 DEBUG EXIT jobiter 1634158596.243226 0.00022840499877929688 DEBUG ENTER jobiter 1634158599.2435837 DEBUG EXIT jobiter 1634158599.2438827 0.0002989768981933594 DEBUG ENTER jobiter 1634158602.2441397 DEBUG EXIT jobiter 1634158602.2444339 0.0002942085266113281 DEBUG ENTER jobiter 1634158605.2447248 DEBUG EXIT jobiter 1634158605.245025 0.0003001689910888672 INFO [container step1] 9tee4-xvhdp-w6eyxafpak1vhm6 is Final DEBUG Creating collection reader for ae755cd1b3cff63152ff4200f4dea7e9+52 DEBUG {'9tee4-bi6l4-uwgvqzzw4z97qca': OrderedDict([('href', '/keep_services/9tee4-bi6l4-uwgvqzzw4z97qca'), ('kind', 'arvados#keepService'), ('etag', 'c65xiys117q01qz9s7zvjt66i'), ('uuid', '9tee4-bi6l4-uwgvqzzw4z97qca'), ('owner_uuid', '9tee4-tpzed-ruo0j78b8s7jk57'), ('created_at', '2014-08-27T02:19:29.655913000Z'), ('modified_by_client_uuid', '9tee4-ozdt8-yft7aixdt2dx5ni'), ('modified_by_user_uuid', '9tee4-tpzed-ruo0j78b8s7jk57'), ('modified_at', '2014-08-27T02:20:21.044273000Z'), ('service_host', 'keep0.9tee4.arvadosapi.com'), ('service_port', 25107), ('service_ssl_flag', False), ('service_type', 'disk'), ('read_only', False), ('_service_root', 'http://keep0.9tee4.arvadosapi.com:25107/')]), '9tee4-bi6l4-1fum8ggyzc88g87': OrderedDict([('href', '/keep_services/9tee4-bi6l4-1fum8ggyzc88g87'), ('kind', 'arvados#keepService'), ('etag', 'dn6g7rkwx5gpwq0h053wyquax'), ('uuid', '9tee4-bi6l4-1fum8ggyzc88g87'), ('owner_uuid', '9tee4-tpzed-ruo0j78b8s7jk57'), ('created_at', '2014-08-27T02:20:27.578073000Z'), ('modified_by_client_uuid', '9tee4-ozdt8-yft7aixdt2dx5ni'), ('modified_by_user_uuid', '9tee4-tpzed-ruo0j78b8s7jk57'), ('modified_at', '2014-08-27T02:20:43.978890000Z'), ('service_host', 'keep1.9tee4.arvadosapi.com'), ('service_port', 25107), ('service_ssl_flag', False), ('service_type', 'disk'), ('read_only', False), ('_service_root', 'http://keep1.9tee4.arvadosapi.com:25107/')])} DEBUG 74d9a83219cabaab06a69fd318873f33+5+A05437ebe0dadd0b1a34018fde4bd5110bd6d1072@6179bd0e: ['http://keep1.9tee4.arvadosapi.com:25107/', 'http://keep0.9tee4.arvadosapi.com:25107/'] DEBUG Request: GET http://keep1.9tee4.arvadosapi.com:25107/74d9a83219cabaab06a69fd318873f33+5+A05437ebe0dadd0b1a34018fde4bd5110bd6d1072@6179bd0e INFO GET 200: 5 bytes in 3.016233444213867 msec (0.002 MiB/sec) DEBUG [step step1] produced output { "file:///home/ci/arvados/sdk/cwl/tests/16169-no-listing-hint.cwl#step1/out": { "location": "keep:ae755cd1b3cff63152ff4200f4dea7e9+52/output.txt", "basename": "output.txt", "nameroot": "output", "nameext": ".txt", "class": "File", "checksum": "sha1$724ba28f4a9a1b472057ff99511ed393a45552e1", "size": 5 } } INFO [step step1] completed success INFO [workflow 16169-no-listing-hint.cwl] completed success DEBUG [workflow 16169-no-listing-hint.cwl] outputs { "out": { "location": "keep:ae755cd1b3cff63152ff4200f4dea7e9+52/output.txt", "basename": "output.txt", "nameroot": "output", "nameext": ".txt", "class": "File", "checksum": "sha1$724ba28f4a9a1b472057ff99511ed393a45552e1", "size": 5 } } INFO Overall process status is success DEBUG ENTER jobiter 1634158606.5850677 DEBUG EXIT jobiter 1634158606.5854146 0.00034689903259277344 DEBUG f906f5732e2d841ebd85c2e898b7fd85+202: ['http://keep1.9tee4.arvadosapi.com:25107/', 'http://keep0.9tee4.arvadosapi.com:25107/'] DEBUG Pool max threads is 2 DEBUG Request: PUT http://keep1.9tee4.arvadosapi.com:25107/f906f5732e2d841ebd85c2e898b7fd85 DEBUG Request: PUT http://keep0.9tee4.arvadosapi.com:25107/f906f5732e2d841ebd85c2e898b7fd85 INFO PUT 200: 202 bytes in 21.41737937927246 msec (0.009 MiB/sec) DEBUG KeepWriterThread <KeepWriterThread(Thread-6, started daemon 139671234782976)> succeeded f906f5732e2d841ebd85c2e898b7fd85+202 http://keep0.9tee4.arvadosapi.com:25107/ INFO PUT 200: 202 bytes in 27.50682830810547 msec (0.007 MiB/sec) DEBUG KeepWriterThread <KeepWriterThread(Thread-5, started daemon 139671246845696)> succeeded f906f5732e2d841ebd85c2e898b7fd85+202 http://keep1.9tee4.arvadosapi.com:25107/ INFO Final output collection b82af5190efb732119bb6014967d961b+113 "Output of 16169-no-listing-hint.cwl (2021-10-13T20:56:46.637Z)" (9tee4-4zz18-uaywkwyo6anj8r9) { "out": { "location": "keep:b82af5190efb732119bb6014967d961b+113/output.txt", "basename": "output.txt", "class": "File", "checksum": "sha1$724ba28f4a9a1b472057ff99511ed393a45552e1", "size": 5 } } INFO Final process status is success
Updated by Ward Vandewege about 3 years ago
The problem appears to be that the generated /var/lib/workflow.json
specifies cwlVersion: 1.0
, but LoadListingRequirement
was only added in CWL 1.1 (cf https://www.commonwl.org/v1.1/CommandLineTool.html).
Why is this happening, though? And why only on 9tee4?
This is workflow.json, as seen from the perspective of cwltool/load_tool.py
in the resolve_and_validate_document
function:
{ "$graph": [ { "class": "Workflow", "hints": [ { "class": "DockerRequirement", "dockerPull": "arvados/jobs:2.2.2", "http://arvados.org/cwl#dockerCollectionPDH": "cb1f184dc5600118c2d2cb873b6771f6+261" } ], "id": "#main", "inputs": [ { "id": "#main/d", "type": "Directory" } ], "outputs": [ { "id": "#main/out", "outputSource": "#main/step1/out", "type": "File" } ], "requirements": [ { "class": "http://commonwl.org/cwltool#LoadListingRequirement", "loadListing": "no_listing" } ], "steps": [ { "id": "#main/step1", "in": [ { "id": "#main/step1/d", "source": "#main/d" } ], "out": [ "#main/step1/out" ], "run": "#16169-step.cwl" } ] }, { "arguments": [ "echo", "${if(inputs.d.listing === undefined) {return 'true';} else {return 'false';}}" ], "class": "CommandLineTool", "id": "#16169-step.cwl", "inputs": [ { "id": "#16169-step.cwl/d", "type": "Directory" } ], "outputs": [ { "id": "#16169-step.cwl/out", "outputBinding": { "glob": "output.txt" }, "type": "File" } ], "requirements": [ { "class": "DockerRequirement", "dockerPull": "debian:buster-slim", "http://arvados.org/cwl#dockerCollectionPDH": "d2903f206785156adcec982c5091d44a+175" }, { "class": "InlineJavascriptRequirement" } ], "stdout": "output.txt" } ], "$namespaces": { "arv": "http://arvados.org/cwl#", "cwltool": "http://commonwl.org/cwltool#" }, "cwlVersion": "v1.0" }
Updated by Peter Amstutz about 3 years ago
cwltool:LoadListingRequirement expands to http://commonwl.org/cwltool#LoadListingRequirement
It's an process requirement extension that was added to be able to use the feature in v1.0. It was added to the spec in v1.1. The purpose of this test is to check that the extension version of feature is still available for use in v1.0 workflows.
Arvados-cwl-runner implements this by loading a custom schema that includes the extension. That is also supposed to enable validation.
The validation is checking that the class
field refers to a known entity (e.g. cwltool:LoadListingRequirement)
The specific function that does the checking is schema_salad/ref_resolver.py:1036 validate_link()
It checks a few things, the vocabulary (CWL keywords), the "reverse vocabulary" (certain keywords can be expanded to URIs but are supposed to mean the same thing) and the document index (identifiers that are declared in the document itself).
I'm actually not entirely certain which one of those tables the string "http://commonwl.org/cwltool#LoadListingRequirement" would normally appear in, so that could be a clue.
Updated by Peter Amstutz about 3 years ago
Ok, here's what I think is going on here.
- At some point, I removed cwltool:LoadListingRequirement from the Arvados v1.2 extensions. This was on the assumption that it was redundant because the internal updater, upon loading a v1.0 file, will internally update cwltool:LoadListingRequirement extension to the standard LoadListingRequirement.
- For some reason, it is validating the document using the v1.2 schema, but cwltool:LoadListingRequirement appears instead of LoadListingRequirement
- The link checking is simpleminded and doesn't really know if something is just an identifier or a link to an actual resource (e.g. you can have "run: http://example.com/tool.cwl") so it goes and tries to make an HTTP HEAD request to determine if something exists
- https://www.commonwl.org/cwltool#LoadListingRequirement returns a 404 page. schema salad calls raise_for_status() which is expected to raise an error if it gets 404 (but apparently doesn't?)
- Since the link check doesn't fail, it accepts https://www.commonwl.org/cwltool#LoadListingRequirement as a valid identifier and validation passes
- Meanwhile, suppose that 9tee4 allows connections to the API server but disallows outgoing connections to the public internet from its compute nodes. This would cause it to raise a connection error, which reports an invalid identifier, which results in the validation error. This would explain why the behavior is inconsistent between 9tee4 and other clusters.
Questions:
- Why isn't cwltool:LoadListingRequirement converted to the LoadListingRequirement from the spec?
- Why doesn't raise_for_status() raise an exception?
Updated by Ward Vandewege about 3 years ago
- Assigned To changed from Ward Vandewege to Peter Amstutz
- Status changed from New to In Progress
Updated by Ward Vandewege about 3 years ago
commonwl.org is now on the dns resolution whitelist for 9tee4's compute nodes (to deal with a "networkaccess enabled" test, #224 in the cwl 1.2 conformance test suite). This will affect testing for this bug, beware!
Updated by Peter Amstutz about 3 years ago
Answers to previous questions:
- cwltool:LoadListingRequirement appears because it is loading un-updated code. The code is not updated because a different fragment of the same file was updated, but when it does that, it marks the entire file as being updated, without having updated the other fragments in the file
- raise_for_status does not raise an error because it is not following redirects, and there is a blanket http->https redirect, as a result it gets a 3xx redirect which would have led to a 404, but since it stopped at the 3xx it didn't raise an error.
The 1st issue is a cwltool bug. I think I have a solution but I need to write a test that demonstrates the bug to confirm the fix.
The 2nd issue is a schema salad bug. I've made a PR with a fix (just adds allow_redirects) but that also needs a test.
Updated by Peter Amstutz about 3 years ago
A little more detail.
- loading #main fragment
- resolve_and_validate_document gets the root document and puts it into "workflowobj"
- it loads the 1.0 schema which does contain cwltool:LoadListingRequirement
- it calls resolve_all() on workflowobj, this passes
- then it calls resolve_ref() to get the fragment we're interested in
- then it checks to see if the fragment needs to be updated
- it updates the fragment and replaces the fragment in the "index" with the updated one
- it updates document "metadata" with the new version ** this is where things start to go wrong
- it is noting in the document was updated in metadata, but it didn't update the whole document, only the fragment
- the "metadata" dict is actually the index entry of the old un-updated document
later:
- loading a different fragment
- resolve_and_validate_document gets the root document and puts it into "workflowobj"
- the document is marked as being v1.2 from when the #main fragment was updated
- however what's actually in the index wasn't updated, it's v1.0 by accident
- it loads the 1.2 schema which doesn't contain cwltool:LoadListingRequirement
- it calls resolve_all() on workflowobj, this fails
Updated by Peter Amstutz about 3 years ago
- Status changed from In Progress to Resolved