Bug #14705

open

Weird container rerun on fail?

Added by Bryan Cosca over 5 years ago. Updated about 2 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version:
Story points: -
Release:
Release relationship: Auto

Description

top level cr: https://workbench.e51c5.arvadosapi.com/container_requests/e51c5-xvhdp-qw8upy1814gij1q#Status

child cr: https://workbench.e51c5.arvadosapi.com/container_requests/e51c5-xvhdp-zvpx6tnr9w44ikc

This container started at 2:50 PM on 1/7/2019, but Workbench says it started at 4:32 AM on 1/7/2019. I definitely saw it running at around 3 or 4 PM on 1/7. There's this line: "It has runtime of 4h39m (13h42m queued) and used 4h39m of node allocation time (1.0× scaling)", but I know it wasn't queued for 13h.
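As a rough sanity check on these numbers, here is a minimal sketch of the arithmetic in Python. It only uses the times quoted above and assumes they are all in the same timezone; the variable names are made up for illustration and are not Arvados API fields.

```python
from datetime import datetime, timedelta

# Values copied from the report above (assumed to share one timezone).
observed_start = datetime(2019, 1, 7, 14, 50)  # when I saw the container start
reported_start = datetime(2019, 1, 7, 4, 32)   # start time shown in Workbench
runtime = timedelta(hours=4, minutes=39)       # "runtime of 4h39m"
queued = timedelta(hours=13, minutes=42)       # "13h42m queued"


def fmt(td):
    """Format a timedelta as e.g. 10h18m."""
    total_minutes = int(td.total_seconds() // 60)
    return f"{total_minutes // 60}h{total_minutes % 60:02d}m"


# Gap between the reported and the observed start time.
gap = observed_start - reported_start
print("reported vs observed start gap:", fmt(gap))

# When the container would have ended if the reported start were right.
print("end time if reported start is right:", reported_start + runtime)

# Submission times implied by the reported queue time.
print("submitted (from observed start):", observed_start - queued)
print("submitted (from reported start):", reported_start - queued)
```

With the reported start, a 4h39m runtime would have ended at 9:11 AM, well before the time I saw the container running. Subtracting the queue time from the reported start lands at 2:50 PM on 1/6, which may or may not be a coincidence.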

My theory is that some failure or restart caused it to start over at 4:32 AM, but I don't see anything like that in the logs.

This job should also finish in about 20 minutes, so I'm confused about what it was doing for that long (there's not much in the logs). CPU usage averaged around 730% according to the HTML crunchstat summary, and memory usage was low.

