Project

General

Profile

Bug #18311

Updated by Ward Vandewege 8 months ago

On our singularity clusters (9tee4 and tordo, currently), the failure is (from https://ci.arvados.org/view/Arvados%20build%20pipeline/job/run-tests-cwl-suite/37/console): 

 <pre> 
 21:57:52 Test 221 failed: /usr/bin/arvados-cwl-runner --compute-checksum --disable-reuse --eval-timeout 60 --outdir=/tmp/tmpxpfu_s_d --quiet tests/timelimit2-wf.cwl tests/empty.json 
 21:57:52 Test that workflow level time limit is not applied to workflow execution time 
 21:57:52 Returned non-zero 
 21:57:52 ERROR [container timelimit2-wf.cwl] (9tee4-dz642-j9qil6yio4ms1ku) error log: 
 21:57:52  
 21:57:52     2021-10-29T01:56:31.148851065Z crunch-run Loading Docker image from keep 
 21:57:52     2021-10-29T01:56:31.641354946Z crunch-run Starting container 
 21:57:52     2021-10-29T01:56:31.642351277Z crunch-run Waiting for container to finish 
 21:57:52     2021-10-29T01:56:37.397845753Z stderr INFO /usr/bin/arvados-cwl-runner 2.3.0.dev20211020182823, arvados-python-client 2.3.0.dev20211013202728, cwltool 3.1.20211020155521 
 21:57:52     2021-10-29T01:56:37.414865353Z stderr INFO Resolved '/var/lib/cwl/workflow.json#main' to 'file:///var/lib/cwl/workflow.json#main' 
 21:57:52     2021-10-29T01:56:45.046750708Z stderr INFO Using cluster 9tee4 (https://workbench2.9tee4.arvadosapi.com/) 
 21:57:52     2021-10-29T01:56:45.558323744Z stderr INFO Upload local files: "workflow.json" 
 21:57:52     2021-10-29T01:56:45.758393815Z stderr INFO Uploaded to f9d6a1cda3285e35b4b70234c4712b95+61 (9tee4-4zz18-wn85zp8etf12ora) 
 21:57:52     2021-10-29T01:56:45.763297415Z stderr INFO Upload local files: "workflow.json" 
 21:57:52     2021-10-29T01:56:45.887338055Z stderr INFO Uploaded to f9d6a1cda3285e35b4b70234c4712b95+61 (9tee4-4zz18-35bm0g8ul56y4no) 
 21:57:52     2021-10-29T01:56:50.325226086Z stderr INFO Using collection cache size 256 MiB 
 21:57:52     2021-10-29T01:56:50.574797297Z stderr INFO Running inside container 9tee4-dz642-j9qil6yio4ms1ku 
 21:57:52     2021-10-29T01:56:50.596940754Z stderr INFO [workflow workflow.json#main] start 
 21:57:52     2021-10-29T01:56:50.597164015Z stderr INFO [workflow workflow.json#main] starting step step1 
 21:57:52     2021-10-29T01:56:50.597407019Z stderr INFO [step step1] start 
 21:57:52     2021-10-29T01:56:50.759170131Z stderr INFO [container step1] 9tee4-xvhdp-njas6p46cvav01s state is Committed 
 21:57:52     2021-10-29T01:57:38.633382081Z stderr INFO [container step1] 9tee4-xvhdp-njas6p46cvav01s is Final 
 21:57:52     2021-10-29T01:57:38.720112167Z stderr ERROR [container step1] (9tee4-xvhdp-njas6p46cvav01s) error log: 
 21:57:52     2021-10-29T01:57:38.720112167Z stderr 
 21:57:52     2021-10-29T01:57:38.720112167Z stderr     2021-10-29T01:57:20.484569438Z crunch-run Not starting a gateway server (GatewayAuthSecret was not provided by dispatcher) 
 21:57:52     2021-10-29T01:57:38.720112167Z stderr     2021-10-29T01:57:20.484755292Z crunch-run crunch-run 2.3.0 (go1.17.1) started 
 21:57:52     2021-10-29T01:57:38.720112167Z stderr     2021-10-29T01:57:20.484776349Z crunch-run Executing container '9tee4-dz642-szbmvhrdakmb0ha' using singularity runtime 
 21:57:52     2021-10-29T01:57:38.720112167Z stderr     2021-10-29T01:57:20.484839635Z crunch-run Executing on host 'compute1.9tee4.arvadosapi.com' 
 21:57:52     2021-10-29T01:57:38.720112167Z stderr     2021-10-29T01:57:20.604062194Z crunch-run container token "v2/9tee4-gj3su-eaaf5ru6ohnw7d2/36sq5h3y5f0qo1s7qlgzrn9hfn4h851ufstfkpjku5sh76i92j/9tee4-dz642-szbmvhrdakmb0ha" 
 21:57:52     2021-10-29T01:57:38.720112167Z stderr     2021-10-29T01:57:20.604359220Z crunch-run Running [arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --mount-by-id by_uuid /tmp/crunch-run.9tee4-dz642-szbmvhrdakmb0ha.3491659512/keep4170174168] 
 21:57:52     2021-10-29T01:57:38.720112167Z stderr     2021-10-29T01:57:21.510019953Z crunch-run Fetching Docker image from collection 'd2a6f06e1f8e3e7d72b8ee89622e9f96+261' 
 21:57:52     2021-10-29T01:57:38.720112167Z stderr     2021-10-29T01:57:21.576950500Z crunch-run Using Docker image id "sha256:61064933b465210ec06517e95e64b0909841d4ccf037552266d7079baed45a6e" 
 21:57:52     2021-10-29T01:57:38.720112167Z stderr     2021-10-29T01:57:21.576984357Z crunch-run Loading Docker image from keep 
 21:57:52     2021-10-29T01:57:38.720112167Z stderr     2021-10-29T01:57:22.013919659Z crunch-run Starting container 
 21:57:52     2021-10-29T01:57:38.720112167Z stderr     2021-10-29T01:57:22.014675984Z crunch-run Waiting for container to finish 
 21:57:52     2021-10-29T01:57:38.720112167Z stderr     2021-10-29T01:57:27.015064319Z crunch-run maximum run time exceeded. Stopping container. 
 21:57:52     2021-10-29T01:57:38.720112167Z stderr     2021-10-29T01:57:27.015148637Z crunch-run stopping container 
 21:57:52     2021-10-29T01:57:38.720112167Z stderr     2021-10-29T01:57:27.123407016Z crunch-run Cancelled 
 21:57:52     2021-10-29T01:57:38.817582951Z stderr ERROR [step step1] Output is missing expected field file:///var/lib/cwl/workflow.json#main/step1/o 
 21:57:52     2021-10-29T01:57:38.900999492Z stderr WARNING [step step1] completed permanentFail 
 21:57:52     2021-10-29T01:57:38.976870679Z stderr INFO [workflow workflow.json#main] completed permanentFail 
 21:57:52     2021-10-29T01:57:38.976917303Z stderr ERROR Overall process status is permanentFail 
 21:57:52     2021-10-29T01:57:39.142930716Z stderr INFO Final output collection 42866e7c2f47b2fd38dca68903e59c4b+59 "Output of main (2021-10-29T01:57:39.107Z)" (9tee4-4zz18-yr16og3bk6qwk3q) 
 21:57:52     2021-10-29T01:57:39.374321052Z stderr WARNING Final process status is permanentFail 
 21:57:52     2021-10-29T01:57:39.956477426Z crunch-run Complete 
 21:57:52 ERROR Overall process status is permanentFail 
 21:57:52 WARNING Final process status is permanentFail 
 </pre> 

 And on 9tee4 (from https://ci.arvados.org/view/Arvados%20build%20pipeline/job/run-tests-cwl-suite/40/console): 

 <pre> 
 11:24:02 Test 221 failed: /usr/bin/arvados-cwl-runner --compute-checksum --disable-reuse --eval-timeout 60 --outdir=/tmp/tmp_lnu_8e_ --quiet tests/timelimit2-wf.cwl tests/empty.json 
 11:24:02 Test that workflow level time limit is not applied to workflow execution time 
 11:24:02 Returned non-zero 
 11:24:02 ERROR [container timelimit2-wf.cwl] (9tee4-dz642-tggdcuakup1jfq5) error log: 
 11:24:02  
 11:24:02     2021-10-29T15:22:53.175747990Z crunch-run Loading Docker image from keep 
 11:24:02     2021-10-29T15:22:53.849775581Z crunch-run Starting container 
 11:24:02     2021-10-29T15:22:53.850902104Z crunch-run Waiting for container to finish 
 11:24:02     2021-10-29T15:23:05.619135487Z stderr INFO /usr/bin/arvados-cwl-runner 2.3.0.dev20211020182823, arvados-python-client 2.3.0.dev20211013202728, cwltool 3.1.20211020155521 
 11:24:02     2021-10-29T15:23:05.636148789Z stderr INFO Resolved '/var/lib/cwl/workflow.json#main' to 'file:///var/lib/cwl/workflow.json#main' 
 11:24:02     2021-10-29T15:23:14.022887185Z stderr INFO Using cluster 9tee4 (https://workbench2.9tee4.arvadosapi.com/) 
 11:24:02     2021-10-29T15:23:14.279142453Z stderr INFO Upload local files: "workflow.json" 
 11:24:02     2021-10-29T15:23:14.428674656Z stderr INFO Uploaded to f9d6a1cda3285e35b4b70234c4712b95+61 (9tee4-4zz18-9zab9j0xsgvigar) 
 11:24:02     2021-10-29T15:23:14.435330631Z stderr INFO Upload local files: "workflow.json" 
 11:24:02     2021-10-29T15:23:14.571627543Z stderr INFO Uploaded to f9d6a1cda3285e35b4b70234c4712b95+61 (9tee4-4zz18-gjzewquov369heg) 
 11:24:02     2021-10-29T15:23:19.004648013Z stderr INFO Using collection cache size 256 MiB 
 11:24:02     2021-10-29T15:23:19.030485917Z stderr INFO Running inside container 9tee4-dz642-tggdcuakup1jfq5 
 11:24:02     2021-10-29T15:23:19.052813153Z stderr INFO [workflow workflow.json#main] start 
 11:24:02     2021-10-29T15:23:19.053081383Z stderr INFO [workflow workflow.json#main] starting step step1 
 11:24:02     2021-10-29T15:23:19.053338000Z stderr INFO [step step1] start 
 11:24:02     2021-10-29T15:23:19.230789355Z stderr INFO [container step1] 9tee4-xvhdp-qojf0j4ca8lyvnf state is Committed 
 11:24:02     2021-10-29T15:23:55.085180501Z stderr INFO [container step1] 9tee4-xvhdp-qojf0j4ca8lyvnf is Final 
 11:24:02     2021-10-29T15:23:55.176969339Z stderr ERROR [container step1] (9tee4-xvhdp-qojf0j4ca8lyvnf) error log: 
 11:24:02     2021-10-29T15:23:55.176969339Z stderr 
 11:24:02     2021-10-29T15:23:55.176969339Z stderr     2021-10-29T15:23:45.345522001Z crunch-run Not starting a gateway server (GatewayAuthSecret was not provided by dispatcher) 
 11:24:02     2021-10-29T15:23:55.176969339Z stderr     2021-10-29T15:23:45.345759158Z crunch-run crunch-run 2.3.0 (go1.17.1) started 
 11:24:02     2021-10-29T15:23:55.176969339Z stderr     2021-10-29T15:23:45.345779799Z crunch-run Executing container '9tee4-dz642-e1vpz4lmmpzjdr4' using singularity runtime 
 11:24:02     2021-10-29T15:23:55.176969339Z stderr     2021-10-29T15:23:45.345801566Z crunch-run Executing on host 'compute1.9tee4.arvadosapi.com' 
 11:24:02     2021-10-29T15:23:55.176969339Z stderr     2021-10-29T15:23:45.459713326Z crunch-run container token "v2/9tee4-gj3su-ewn10p08oeckpxj/1eqyw1uioulmxl78olk0i743ipt4nro0neksh32ee48ks8rjux/9tee4-dz642-e1vpz4lmmpzjdr4" 
 11:24:02     2021-10-29T15:23:55.176969339Z stderr     2021-10-29T15:23:45.460034932Z crunch-run Running [arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --mount-by-id by_uuid /tmp/crunch-run.9tee4-dz642-e1vpz4lmmpzjdr4.734327895/keep1570108756] 
 11:24:02     2021-10-29T15:23:55.176969339Z stderr     2021-10-29T15:23:46.365999816Z crunch-run Fetching Docker image from collection 'd2a6f06e1f8e3e7d72b8ee89622e9f96+261' 
 11:24:02     2021-10-29T15:23:55.176969339Z stderr     2021-10-29T15:23:46.430319132Z crunch-run Using Docker image id "sha256:61064933b465210ec06517e95e64b0909841d4ccf037552266d7079baed45a6e" 
 11:24:02     2021-10-29T15:23:55.176969339Z stderr     2021-10-29T15:23:46.430387977Z crunch-run Loading Docker image from keep 
 11:24:02     2021-10-29T15:23:55.176969339Z stderr     2021-10-29T15:23:47.011343535Z crunch-run Starting container 
 11:24:02     2021-10-29T15:23:55.176969339Z stderr     2021-10-29T15:23:47.012153992Z crunch-run Waiting for container to finish 
 11:24:02     2021-10-29T15:23:55.176969339Z stderr     2021-10-29T15:23:52.012532501Z crunch-run maximum run time exceeded. Stopping container. 
 11:24:02     2021-10-29T15:23:55.176969339Z stderr     2021-10-29T15:23:52.012641954Z crunch-run stopping container 
 11:24:02     2021-10-29T15:23:55.176969339Z stderr     2021-10-29T15:23:52.123251411Z crunch-run Cancelled 
 11:24:02     2021-10-29T15:23:55.259792247Z stderr ERROR [step step1] Output is missing expected field file:///var/lib/cwl/workflow.json#main/step1/o 
 11:24:02     2021-10-29T15:23:55.344468592Z stderr WARNING [step step1] completed permanentFail 
 11:24:02     2021-10-29T15:23:55.423478553Z stderr INFO [workflow workflow.json#main] completed permanentFail 
 11:24:02     2021-10-29T15:23:55.423519944Z stderr ERROR Overall process status is permanentFail 
 11:24:02     2021-10-29T15:23:55.572398801Z stderr INFO Final output collection 42866e7c2f47b2fd38dca68903e59c4b+59 "Output of main (2021-10-29T15:23:55.541Z)" (9tee4-4zz18-smx5y5kzo0oyh11) 
 11:24:02     2021-10-29T15:23:55.804266214Z stderr WARNING Final process status is permanentFail 
 11:24:02     2021-10-29T15:23:56.277148741Z crunch-run Complete 
 11:24:02 ERROR Overall process status is permanentFail 
 11:24:02 WARNING Final process status is permanentFail 
 </pre> 

 Compare with ce8i5 where it passes (https://ci.arvados.org/view/Arvados%20build%20pipeline/job/run-tests-cwl-suite/38).

Back