Actions
Bug #18311
open[cwl] test 221 in the 1.2 conformance suite is failing on singularity
Story points:
-
Release:
Release relationship:
Auto
Description
On our singularity clusters (9tee4 and tordo, currently), the failure is (from run-tests-cwl-suite: #37 /console):
21:57:52 Test 221 failed: /usr/bin/arvados-cwl-runner --compute-checksum --disable-reuse --eval-timeout 60 --outdir=/tmp/tmpxpfu_s_d --quiet tests/timelimit2-wf.cwl tests/empty.json 21:57:52 Test that workflow level time limit is not applied to workflow execution time 21:57:52 Returned non-zero 21:57:52 ERROR [container timelimit2-wf.cwl] (9tee4-dz642-j9qil6yio4ms1ku) error log: 21:57:52 21:57:52 2021-10-29T01:56:31.148851065Z crunch-run Loading Docker image from keep 21:57:52 2021-10-29T01:56:31.641354946Z crunch-run Starting container 21:57:52 2021-10-29T01:56:31.642351277Z crunch-run Waiting for container to finish 21:57:52 2021-10-29T01:56:37.397845753Z stderr INFO /usr/bin/arvados-cwl-runner 2.3.0.dev20211020182823, arvados-python-client 2.3.0.dev20211013202728, cwltool 3.1.20211020155521 21:57:52 2021-10-29T01:56:37.414865353Z stderr INFO Resolved '/var/lib/cwl/workflow.json#main' to 'file:///var/lib/cwl/workflow.json#main' 21:57:52 2021-10-29T01:56:45.046750708Z stderr INFO Using cluster 9tee4 (https://workbench2.9tee4.arvadosapi.com/) 21:57:52 2021-10-29T01:56:45.558323744Z stderr INFO Upload local files: "workflow.json" 21:57:52 2021-10-29T01:56:45.758393815Z stderr INFO Uploaded to f9d6a1cda3285e35b4b70234c4712b95+61 (9tee4-4zz18-wn85zp8etf12ora) 21:57:52 2021-10-29T01:56:45.763297415Z stderr INFO Upload local files: "workflow.json" 21:57:52 2021-10-29T01:56:45.887338055Z stderr INFO Uploaded to f9d6a1cda3285e35b4b70234c4712b95+61 (9tee4-4zz18-35bm0g8ul56y4no) 21:57:52 2021-10-29T01:56:50.325226086Z stderr INFO Using collection cache size 256 MiB 21:57:52 2021-10-29T01:56:50.574797297Z stderr INFO Running inside container 9tee4-dz642-j9qil6yio4ms1ku 21:57:52 2021-10-29T01:56:50.596940754Z stderr INFO [workflow workflow.json#main] start 21:57:52 2021-10-29T01:56:50.597164015Z stderr INFO [workflow workflow.json#main] starting step step1 21:57:52 2021-10-29T01:56:50.597407019Z stderr INFO [step step1] start 21:57:52 2021-10-29T01:56:50.759170131Z stderr INFO [container step1] 9tee4-xvhdp-njas6p46cvav01s state is Committed 21:57:52 2021-10-29T01:57:38.633382081Z stderr INFO [container step1] 9tee4-xvhdp-njas6p46cvav01s is Final 21:57:52 2021-10-29T01:57:38.720112167Z stderr ERROR [container step1] (9tee4-xvhdp-njas6p46cvav01s) error log: 21:57:52 2021-10-29T01:57:38.720112167Z stderr 21:57:52 2021-10-29T01:57:38.720112167Z stderr 2021-10-29T01:57:20.484569438Z crunch-run Not starting a gateway server (GatewayAuthSecret was not provided by dispatcher) 21:57:52 2021-10-29T01:57:38.720112167Z stderr 2021-10-29T01:57:20.484755292Z crunch-run crunch-run 2.3.0 (go1.17.1) started 21:57:52 2021-10-29T01:57:38.720112167Z stderr 2021-10-29T01:57:20.484776349Z crunch-run Executing container '9tee4-dz642-szbmvhrdakmb0ha' using singularity runtime 21:57:52 2021-10-29T01:57:38.720112167Z stderr 2021-10-29T01:57:20.484839635Z crunch-run Executing on host 'compute1.9tee4.arvadosapi.com' 21:57:52 2021-10-29T01:57:38.720112167Z stderr 2021-10-29T01:57:20.604062194Z crunch-run container token "v2/9tee4-gj3su-eaaf5ru6ohnw7d2/36sq5h3y5f0qo1s7qlgzrn9hfn4h851ufstfkpjku5sh76i92j/9tee4-dz642-szbmvhrdakmb0ha" 21:57:52 2021-10-29T01:57:38.720112167Z stderr 2021-10-29T01:57:20.604359220Z crunch-run Running [arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --mount-by-id by_uuid /tmp/crunch-run.9tee4-dz642-szbmvhrdakmb0ha.3491659512/keep4170174168] 21:57:52 2021-10-29T01:57:38.720112167Z stderr 2021-10-29T01:57:21.510019953Z crunch-run Fetching Docker image from collection 'd2a6f06e1f8e3e7d72b8ee89622e9f96+261' 21:57:52 2021-10-29T01:57:38.720112167Z stderr 2021-10-29T01:57:21.576950500Z crunch-run Using Docker image id "sha256:61064933b465210ec06517e95e64b0909841d4ccf037552266d7079baed45a6e" 21:57:52 2021-10-29T01:57:38.720112167Z stderr 2021-10-29T01:57:21.576984357Z crunch-run Loading Docker image from keep 21:57:52 2021-10-29T01:57:38.720112167Z stderr 2021-10-29T01:57:22.013919659Z crunch-run Starting container 21:57:52 2021-10-29T01:57:38.720112167Z stderr 2021-10-29T01:57:22.014675984Z crunch-run Waiting for container to finish 21:57:52 2021-10-29T01:57:38.720112167Z stderr 2021-10-29T01:57:27.015064319Z crunch-run maximum run time exceeded. Stopping container. 21:57:52 2021-10-29T01:57:38.720112167Z stderr 2021-10-29T01:57:27.015148637Z crunch-run stopping container 21:57:52 2021-10-29T01:57:38.720112167Z stderr 2021-10-29T01:57:27.123407016Z crunch-run Cancelled 21:57:52 2021-10-29T01:57:38.817582951Z stderr ERROR [step step1] Output is missing expected field file:///var/lib/cwl/workflow.json#main/step1/o 21:57:52 2021-10-29T01:57:38.900999492Z stderr WARNING [step step1] completed permanentFail 21:57:52 2021-10-29T01:57:38.976870679Z stderr INFO [workflow workflow.json#main] completed permanentFail 21:57:52 2021-10-29T01:57:38.976917303Z stderr ERROR Overall process status is permanentFail 21:57:52 2021-10-29T01:57:39.142930716Z stderr INFO Final output collection 42866e7c2f47b2fd38dca68903e59c4b+59 "Output of main (2021-10-29T01:57:39.107Z)" (9tee4-4zz18-yr16og3bk6qwk3q) 21:57:52 2021-10-29T01:57:39.374321052Z stderr WARNING Final process status is permanentFail 21:57:52 2021-10-29T01:57:39.956477426Z crunch-run Complete 21:57:52 ERROR Overall process status is permanentFail 21:57:52 WARNING Final process status is permanentFail
And on 9tee4 (from run-tests-cwl-suite: #40 /console):
11:24:02 Test 221 failed: /usr/bin/arvados-cwl-runner --compute-checksum --disable-reuse --eval-timeout 60 --outdir=/tmp/tmp_lnu_8e_ --quiet tests/timelimit2-wf.cwl tests/empty.json 11:24:02 Test that workflow level time limit is not applied to workflow execution time 11:24:02 Returned non-zero 11:24:02 ERROR [container timelimit2-wf.cwl] (9tee4-dz642-tggdcuakup1jfq5) error log: 11:24:02 11:24:02 2021-10-29T15:22:53.175747990Z crunch-run Loading Docker image from keep 11:24:02 2021-10-29T15:22:53.849775581Z crunch-run Starting container 11:24:02 2021-10-29T15:22:53.850902104Z crunch-run Waiting for container to finish 11:24:02 2021-10-29T15:23:05.619135487Z stderr INFO /usr/bin/arvados-cwl-runner 2.3.0.dev20211020182823, arvados-python-client 2.3.0.dev20211013202728, cwltool 3.1.20211020155521 11:24:02 2021-10-29T15:23:05.636148789Z stderr INFO Resolved '/var/lib/cwl/workflow.json#main' to 'file:///var/lib/cwl/workflow.json#main' 11:24:02 2021-10-29T15:23:14.022887185Z stderr INFO Using cluster 9tee4 (https://workbench2.9tee4.arvadosapi.com/) 11:24:02 2021-10-29T15:23:14.279142453Z stderr INFO Upload local files: "workflow.json" 11:24:02 2021-10-29T15:23:14.428674656Z stderr INFO Uploaded to f9d6a1cda3285e35b4b70234c4712b95+61 (9tee4-4zz18-9zab9j0xsgvigar) 11:24:02 2021-10-29T15:23:14.435330631Z stderr INFO Upload local files: "workflow.json" 11:24:02 2021-10-29T15:23:14.571627543Z stderr INFO Uploaded to f9d6a1cda3285e35b4b70234c4712b95+61 (9tee4-4zz18-gjzewquov369heg) 11:24:02 2021-10-29T15:23:19.004648013Z stderr INFO Using collection cache size 256 MiB 11:24:02 2021-10-29T15:23:19.030485917Z stderr INFO Running inside container 9tee4-dz642-tggdcuakup1jfq5 11:24:02 2021-10-29T15:23:19.052813153Z stderr INFO [workflow workflow.json#main] start 11:24:02 2021-10-29T15:23:19.053081383Z stderr INFO [workflow workflow.json#main] starting step step1 11:24:02 2021-10-29T15:23:19.053338000Z stderr INFO [step step1] start 11:24:02 2021-10-29T15:23:19.230789355Z stderr INFO [container step1] 9tee4-xvhdp-qojf0j4ca8lyvnf state is Committed 11:24:02 2021-10-29T15:23:55.085180501Z stderr INFO [container step1] 9tee4-xvhdp-qojf0j4ca8lyvnf is Final 11:24:02 2021-10-29T15:23:55.176969339Z stderr ERROR [container step1] (9tee4-xvhdp-qojf0j4ca8lyvnf) error log: 11:24:02 2021-10-29T15:23:55.176969339Z stderr 11:24:02 2021-10-29T15:23:55.176969339Z stderr 2021-10-29T15:23:45.345522001Z crunch-run Not starting a gateway server (GatewayAuthSecret was not provided by dispatcher) 11:24:02 2021-10-29T15:23:55.176969339Z stderr 2021-10-29T15:23:45.345759158Z crunch-run crunch-run 2.3.0 (go1.17.1) started 11:24:02 2021-10-29T15:23:55.176969339Z stderr 2021-10-29T15:23:45.345779799Z crunch-run Executing container '9tee4-dz642-e1vpz4lmmpzjdr4' using singularity runtime 11:24:02 2021-10-29T15:23:55.176969339Z stderr 2021-10-29T15:23:45.345801566Z crunch-run Executing on host 'compute1.9tee4.arvadosapi.com' 11:24:02 2021-10-29T15:23:55.176969339Z stderr 2021-10-29T15:23:45.459713326Z crunch-run container token "v2/9tee4-gj3su-ewn10p08oeckpxj/1eqyw1uioulmxl78olk0i743ipt4nro0neksh32ee48ks8rjux/9tee4-dz642-e1vpz4lmmpzjdr4" 11:24:02 2021-10-29T15:23:55.176969339Z stderr 2021-10-29T15:23:45.460034932Z crunch-run Running [arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --mount-by-id by_uuid /tmp/crunch-run.9tee4-dz642-e1vpz4lmmpzjdr4.734327895/keep1570108756] 11:24:02 2021-10-29T15:23:55.176969339Z stderr 2021-10-29T15:23:46.365999816Z crunch-run Fetching Docker image from collection 'd2a6f06e1f8e3e7d72b8ee89622e9f96+261' 11:24:02 2021-10-29T15:23:55.176969339Z stderr 2021-10-29T15:23:46.430319132Z crunch-run Using Docker image id "sha256:61064933b465210ec06517e95e64b0909841d4ccf037552266d7079baed45a6e" 11:24:02 2021-10-29T15:23:55.176969339Z stderr 2021-10-29T15:23:46.430387977Z crunch-run Loading Docker image from keep 11:24:02 2021-10-29T15:23:55.176969339Z stderr 2021-10-29T15:23:47.011343535Z crunch-run Starting container 11:24:02 2021-10-29T15:23:55.176969339Z stderr 2021-10-29T15:23:47.012153992Z crunch-run Waiting for container to finish 11:24:02 2021-10-29T15:23:55.176969339Z stderr 2021-10-29T15:23:52.012532501Z crunch-run maximum run time exceeded. Stopping container. 11:24:02 2021-10-29T15:23:55.176969339Z stderr 2021-10-29T15:23:52.012641954Z crunch-run stopping container 11:24:02 2021-10-29T15:23:55.176969339Z stderr 2021-10-29T15:23:52.123251411Z crunch-run Cancelled 11:24:02 2021-10-29T15:23:55.259792247Z stderr ERROR [step step1] Output is missing expected field file:///var/lib/cwl/workflow.json#main/step1/o 11:24:02 2021-10-29T15:23:55.344468592Z stderr WARNING [step step1] completed permanentFail 11:24:02 2021-10-29T15:23:55.423478553Z stderr INFO [workflow workflow.json#main] completed permanentFail 11:24:02 2021-10-29T15:23:55.423519944Z stderr ERROR Overall process status is permanentFail 11:24:02 2021-10-29T15:23:55.572398801Z stderr INFO Final output collection 42866e7c2f47b2fd38dca68903e59c4b+59 "Output of main (2021-10-29T15:23:55.541Z)" (9tee4-4zz18-smx5y5kzo0oyh11) 11:24:02 2021-10-29T15:23:55.804266214Z stderr WARNING Final process status is permanentFail 11:24:02 2021-10-29T15:23:56.277148741Z crunch-run Complete 11:24:02 ERROR Overall process status is permanentFail 11:24:02 WARNING Final process status is permanentFail
Compare with ce8i5 where it passes (run-tests-cwl-suite: #38 ).
Updated by Peter Amstutz about 3 years ago
- Target version changed from 2021-11-10 sprint to 2021-11-24 sprint
Updated by Peter Amstutz about 3 years ago
- Target version changed from 2021-11-24 sprint to 2021-12-08 sprint
Updated by Peter Amstutz about 3 years ago
- Target version changed from 2021-12-08 sprint to 2022-01-05 sprint
Updated by Peter Amstutz almost 3 years ago
- Target version changed from 2022-01-05 sprint to 2022-01-19 sprint
Updated by Peter Amstutz almost 3 years ago
- Target version changed from 2022-01-19 sprint to 2022-02-02 sprint
Updated by Peter Amstutz almost 3 years ago
- Target version changed from 2022-02-02 sprint to 2022-02-16 sprint
Updated by Peter Amstutz almost 3 years ago
- Target version changed from 2022-02-16 sprint to 2022-03-02 sprint
Updated by Peter Amstutz almost 3 years ago
- Target version changed from 2022-03-02 sprint to 2022-03-16 sprint
Updated by Peter Amstutz almost 3 years ago
- Target version changed from 2022-03-16 sprint to 2022-03-30 Sprint
Updated by Peter Amstutz almost 3 years ago
- Target version changed from 2022-03-30 Sprint to 2022-04-13 Sprint
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-04-13 Sprint to 2022-04-27 Sprint
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-04-27 Sprint to 2022-05-11 sprint
Updated by Peter Amstutz over 2 years ago
I looked at this a little closer, I think the problem is that the test has a very short timeout (3 seconds) and that it is taking longer than that to start the singularity container, compared to Docker. (This is because the Docker image is downloaded and started before the container is started, whereas singularity runs from a compressed image on the keep mount, so it has more startup latency). I changed it to skip this test.
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-05-11 sprint to 2022-05-25 sprint
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-05-25 sprint to 2022-06-08 sprint
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-06-08 sprint to 2022-06-22 Sprint
Updated by Peter Amstutz over 2 years ago
Todo: increase the timeouts on this test in CWL 1.2.1
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-06-22 Sprint to 2022-07-06
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-07-06 to 2022-07-20
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-07-20 to 2022-08-03 Sprint
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-08-03 Sprint to 2022-08-17 sprint
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-08-17 sprint to 2022-08-31 sprint
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-08-31 sprint to 2022-09-28 sprint
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-09-28 sprint to 2022-10-12 sprint
Updated by Peter Amstutz about 2 years ago
- Target version changed from 2022-10-12 sprint to 2022-11-09 sprint
Updated by Peter Amstutz about 2 years ago
- Target version changed from 2022-11-09 sprint to 2022-11-23 sprint
Updated by Peter Amstutz about 2 years ago
- Target version changed from 2022-11-23 sprint to 2022-12-07 Sprint
Updated by Peter Amstutz about 2 years ago
- Target version changed from 2022-12-07 Sprint to 2022-12-21 Sprint
Updated by Peter Amstutz about 2 years ago
- Target version deleted (
2022-12-21 Sprint)
Actions