Bug #18026

Updated by Ward Vandewege 3 months ago


This is from https://dev.arvados.org/issues/17755#note-26:
=====================================================================================================

I think the first two failures are due to a race condition between two crunch-run processes trying to convert and cache the singularity image at the same time.

This attempt found the collection, but apparently before the cached image had been created in it. (That is not supposed to happen.)

> 2021-08-06T21:26:16.608212665Z Using Docker image id "sha256:337550d506a3fc77e30292bba95108f1cd34a33719f0dd997d0de0540522def7"
> 2021-08-06T21:26:16.608240987Z Loading Docker image from keep
> 2021-08-06T21:26:17.188261275Z building singularity image
> 2021-08-06T21:26:17.223285265Z [singularity build /tmp/crunch-run.tordo-dz642-pez87oegh5fgbs7.099258167/keep207354666/by_uuid/tordo-4zz18-t0wx4utpwx4ligf/image.sif docker-archive:///tmp/crunch-run-singularity-917581112/image.tar]
> 2021-08-06T21:26:26.466105517Z INFO: Starting build...
> 2021-08-06T21:26:26.466105517Z FATAL: While performing build: conveyor failed to get: Error loading tar component 337550d506a3fc77e30292bba95108f1cd34a33719f0dd997d0de0540522def7.json: open /tmp/crunch-run-singularity-917581112/image.tar: no such file or directory
> 2021-08-06T21:26:26.466234171Z error in Run: While loading container image: exit status 255
> 2021-08-06T21:26:26.466268708Z error in CaptureOutput: error scanning files to copy to output: lstat "/var/spool/cwl": lstat /tmp/crunch-run.tordo-dz642-pez87oegh5fgbs7.099258167/tmp701045228: no such file or directory
> 2021-08-06T21:26:26.605452965Z Cancelled

On the second attempt, it tried to create a collection with the same temporary name (down to the exact timestamp?), and that failed with a unique constraint violation on (owner_uuid, name).

> 2021-08-06T21:26:47.149336231Z Executing container 'tordo-dz642-amjt50vnz4qyn4n'
> ...
> 2021-08-06T21:26:47.972965997Z error in Run: While loading container image: error creating 'singularity image for sha256:337550d506a3fc77e30292bba95108f1cd34a33719f0dd997d0de0540522def7 2021-08-06T21:26:47Z' collection: request failed: https://tordo.arvadosapi.com/arvados/v1/collections: 422 Unprocessable Entity: //railsapi.internal/arvados/v1/collections: 422 Unprocessable Entity: #<ActiveRecord::RecordNotUnique: PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_collections_on_owner_uuid_and_name"
> 2021-08-06T21:26:47.972965997Z DETAIL: Key (owner_uuid, name)=(tordo-j7d0g-7p82g804nk5l7gx, singularity image for sha256:337550d506a3fc77e30292bba95108f1cd34a33719f0dd997d0de0540522def7 2021-08-06T21:26:47Z) already exists.
> 2021-08-06T21:26:47.972965997Z : INSERT INTO "collections" ("owner_uuid", "created_at", "modified_by_user_uuid", "modified_at", "portable_data_hash", "updated_at", "uuid", "manifest_text", "name", "properties", "delete_at", "file_names", "trash_at", "current_version_uuid") VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) RETURNING "id"> (req-8t57dqc95orqsvelydce)
> 2021-08-06T21:26:48.136224600Z Cancelled
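
To illustrate why the second failure is plausible, here is a minimal sketch (not the actual crunch-run code) of how a temporary name with a one-second timestamp collides when two processes start within the same second. The helper `temp_collection_name` and the `UniqueNameIndex` class are hypothetical stand-ins: the name format is only inferred from the error message above, and the class merely imitates the "index_collections_on_owner_uuid_and_name" constraint.

```python
from datetime import datetime, timezone

# Values copied from the log excerpt above.
OWNER = "tordo-j7d0g-7p82g804nk5l7gx"
IMAGE = "sha256:337550d506a3fc77e30292bba95108f1cd34a33719f0dd997d0de0540522def7"


def temp_collection_name(image_id: str, now: datetime) -> str:
    """Hypothetical helper: build a name shaped like the one in the error,
    with the timestamp truncated to whole seconds (an assumption)."""
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"singularity image for {image_id} {stamp}"


class UniqueNameIndex:
    """Hypothetical stand-in for a unique (owner_uuid, name) index."""

    def __init__(self) -> None:
        self._keys = set()

    def insert(self, owner_uuid: str, name: str) -> None:
        key = (owner_uuid, name)
        if key in self._keys:
            raise ValueError(f"duplicate key (owner_uuid, name)={key}")
        self._keys.add(key)


def second_insert_collides() -> bool:
    """Simulate two processes creating the collection in the same second."""
    index = UniqueNameIndex()
    first = datetime(2021, 8, 6, 21, 26, 47, 100000, tzinfo=timezone.utc)
    second = datetime(2021, 8, 6, 21, 26, 47, 900000, tzinfo=timezone.utc)
    index.insert(OWNER, temp_collection_name(IMAGE, first))
    try:
        index.insert(OWNER, temp_collection_name(IMAGE, second))
    except ValueError:
        return True  # same name, same owner: the second insert is rejected
    return False
```

Under these assumptions, any two attempts that land within the same wall-clock second produce identical names, so the second create request would be rejected regardless of which process issued it.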
