Bug #7971

Python SDK Keep timeouts on su92l are too aggressive

Added by Sarah Guthrie over 8 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assigned To: -
Category: -
Target version: -
Story points: -

Description

https://workbench.su92l.arvadosapi.com/pipeline_instances/su92l-d1hrv-gk11bugx763h0tl#Log

I tried downloading the files the jobs are expected to read, and they downloaded fine.

I cancelled the job after the first 10 or so failures to avoid wasting compute time.


Related issues

Related to Arvados - Idea #8539: [SDKs/FUSE] Better retry defaults (Resolved)
#1

Updated by Nico César over 8 years ago

The actual error is 'Operation timed out after 300000 milliseconds with 0 bytes received' on keep15. Investigating.

#2

Updated by Nico César over 8 years ago

So the data is there, and the keepstore log says it took 1257.834175 s (about 21 minutes) to transfer the block, far longer than the client's 300 s timeout:

keep15.su92l:/home/nico# grep 64be48c8afb7800c007856fb2ea1a6fb /etc/sv/keepstore/log/main/current
2015-12-08_18:59:19.92647 2015/12/08 18:59:19 [10.28.64.30:58798] GET 64be48c8afb7800c007856fb2ea1a6fb+8645645+A392c53cccd46d057477d854575ebb42ae6423815@5679989c 1257.834175s 200 8645645 "OK" 
keep15.su92l:/home/nico# ls /data/su92l-keep-*/keep/64b/64be48c8afb7800c007856fb2ea1a6fb
/data/su92l-keep-4/keep/64b/64be48c8afb7800c007856fb2ea1a6fb
keep15.su92l:/home/nico# md5sum /data/su92l-keep-4/keep/64b/64be48c8afb7800c007856fb2ea1a6fb
64be48c8afb7800c007856fb2ea1a6fb  /data/su92l-keep-4/keep/64b/64be48c8afb7800c007856fb2ea1a6fb
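A minimal Python sketch of reading the keepstore log entry above. It assumes the field layout inferred from that single line (client address in brackets, method, locator, duration with an `s` suffix, HTTP status, byte count); this is not a documented log format, and `KeepRequest`, `parse_line` and `CLIENT_TIMEOUT_S` are hypothetical names. It also computes the average rate over the logged duration.

```python
import re
from dataclasses import dataclass

# ASSUMPTION: layout inferred from the one keepstore line in this ticket:
# "<logger stamp> <date> <time> [<client>] <METHOD> <locator> <secs>s <status> <bytes> "<reason>""
LOG_RE = re.compile(
    r'\[(?P<client>[^\]]+)\]\s+(?P<method>[A-Z]+)\s+(?P<locator>\S+)\s+'
    r'(?P<seconds>\d+(?:\.\d+)?)s\s+(?P<status>\d{3})\s+(?P<nbytes>\d+)'
)


@dataclass
class KeepRequest:  # hypothetical container for one parsed log entry
    client: str
    method: str
    locator: str
    seconds: float
    status: int
    nbytes: int

    @property
    def block_hash(self) -> str:
        # The first "+"-separated field of a Keep locator is the block's MD5.
        return self.locator.split("+", 1)[0]

    @property
    def bytes_per_second(self) -> float:
        # Average rate over the whole logged duration.
        return self.nbytes / self.seconds if self.seconds > 0 else float("inf")


def parse_line(line: str):
    """Return a KeepRequest for a matching line, or None if it does not match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    return KeepRequest(
        client=m["client"], method=m["method"], locator=m["locator"],
        seconds=float(m["seconds"]), status=int(m["status"]), nbytes=int(m["nbytes"]),
    )


LINE = ('2015-12-08_18:59:19.92647 2015/12/08 18:59:19 [10.28.64.30:58798] GET '
        '64be48c8afb7800c007856fb2ea1a6fb+8645645+A392c53cccd46d057477d854575ebb42ae6423815@5679989c '
        '1257.834175s 200 8645645 "OK" ')

CLIENT_TIMEOUT_S = 300.0  # from the client error: 300000 milliseconds
req = parse_line(LINE)
print(req.block_hash, req.seconds > CLIENT_TIMEOUT_S, round(req.bytes_per_second))
```

For this entry the logged duration is more than four times the 300 s client timeout, and, assuming the logged duration covers the whole request, the average rate is only about 6.9 kB/s.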

#3

Updated by Nico César over 8 years ago

Everything seems OK now: checksumming three blocks on the same volume directly from disk takes 0.22 s.

keep15.su92l:/home/nico# time md5sum /data/su92l-keep-4/keep/db7/db7850a4a0c42aaa354f41bcab05f7a8  /data/su92l-keep-4/keep/0e4/0e4c80cb8017e52812aed9dbad71a6d1  /data/su92l-keep-4/keep/66e/66ead77af94bacbd5b96365412d601dd
db7850a4a0c42aaa354f41bcab05f7a8  /data/su92l-keep-4/keep/db7/db7850a4a0c42aaa354f41bcab05f7a8
0e4c80cb8017e52812aed9dbad71a6d1  /data/su92l-keep-4/keep/0e4/0e4c80cb8017e52812aed9dbad71a6d1
66ead77af94bacbd5b96365412d601dd  /data/su92l-keep-4/keep/66e/66ead77af94bacbd5b96365412d601dd

real    0m0.220s
user    0m0.176s
sys     0m0.040s
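The `md5sum` checks in this thread work because a Keep locator begins with the MD5 hex digest of the block, followed by its size in bytes. A minimal Python sketch of the same check; `block_matches_locator` is a hypothetical helper, not part of the Arvados SDK, and it additionally compares the size field when the locator carries one.

```python
import hashlib


def block_matches_locator(path: str, locator: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` matches the hash (and size, if given) in `locator`.

    Hypothetical helper mirroring the manual `md5sum` check: the first
    "+"-separated field of the locator is the MD5 hex digest, the second
    (when numeric) is the block size in bytes.
    """
    parts = locator.split("+")
    expected_hash = parts[0].lower()
    expected_size = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else None

    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as f:
        # Read in chunks so large blocks do not have to fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)

    if expected_size is not None and size != expected_size:
        return False
    return digest.hexdigest() == expected_hash
```

For example, `block_matches_locator("/data/su92l-keep-4/keep/64b/64be48c8afb7800c007856fb2ea1a6fb", "64be48c8afb7800c007856fb2ea1a6fb+8645645")` would repeat the first check in Python.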

Should we analyze the keepstore logs for requests that took longer than 300 s?
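One way to do that analysis, sketched in Python under the same assumption about the line layout of the keepstore entry quoted earlier; the regular expression and `slow_requests` are hypothetical. It yields every request whose logged duration exceeds a threshold, defaulting to 300 s to match the client timeout.

```python
import re
from typing import Iterable, Iterator, Tuple

# ASSUMPTION: "] <METHOD> <locator> <seconds>s <status>" as in the entry quoted earlier.
DURATION_RE = re.compile(
    r'\]\s+[A-Z]+\s+(?P<locator>\S+)\s+(?P<seconds>\d+(?:\.\d+)?)s\s+(?P<status>\d{3})\b'
)


def slow_requests(lines: Iterable[str], threshold_s: float = 300.0) -> Iterator[Tuple[float, str, int]]:
    """Yield (seconds, locator, status) for each request slower than threshold_s.

    Lines that do not look like request entries are skipped.
    """
    for line in lines:
        m = DURATION_RE.search(line)
        if m and float(m["seconds"]) > threshold_s:
            yield float(m["seconds"]), m["locator"], int(m["status"])
```

Passing it an open handle on `/etc/sv/keepstore/log/main/current` and sorting the results with `sorted(..., reverse=True)` would list the slowest requests first.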

#4

Updated by Ward Vandewege over 8 years ago

  • Subject changed from Python SDK on su92l raises KeepReadError: failed to read [...] service [...] responded with 404 HTTP/1.1 404 Not Found to Python SDK Keep timeouts on su92l are too aggressive
#5

Updated by Peter Amstutz over 4 years ago

  • Status changed from New to Closed