Project

General

Profile

Bug #16132

Updated by Peter Amstutz about 4 years ago

The run-cwl-test-ce8i5 job runs the CWL test suite against the test cluster ce8i5. 

 Tests have been failing with timeouts.    There is a 900s (15m) limit on tests, so that means it is taking longer than that to either bring up a node or schedule a container. 

 It is unclear if the issue is Azure delay, arvados-dispatch-cloud delay, some interaction between the two (waiting for something it shouldn't), stalled ssh connections, or what. 

 One way to begin diagnosing this would be to start collecting prometheus metrics, and see if we can capture the delay on a graph. 

Back