Bug #9438

closed

keep-balance hung and needed ctrl-c

Added by Joshua Randall almost 8 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Keep
Target version: -
Story points: 1.0

Description

I started keep-balance on Friday evening without the `-once` flag and left it running. On Sunday morning I found it hung. The last message it had printed was from the night before, and it seems to have occurred just after a "run failed" error:

2016/06/18 22:02:38 run failed: z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send pull list: Put http://humgen-02-02.internal.sanger.ac.uk:25107/pull: EOF

keep-balance was started on Friday as:

root@humgen-01-01:~# keep-balance -config /root/keep-balance.json -commit-pulls -commit-trash
2016/06/17 21:03:59 starting up: will scan every 10m0s and on SIGUSR1
2016/06/17 21:03:59 Run: start
...

Last run where it got stuck:

2016/06/18 22:01:06 GetCurrentState: took 3h32m15.109080064s
2016/06/18 22:01:06 ComputeChangeSets: start
2016/06/18 22:02:33 ComputeChangeSets: took 1m27.424860059s
2016/06/18 22:02:37 ===
2016/06/18 22:02:37 12 replicas (6 blocks, 402731660 bytes) lost (0=have<want)
2016/06/18 22:02:37 1231 replicas (163 blocks, 80390127068 bytes) underreplicated (0<have<want)
2016/06/18 22:02:37 1141920 replicas (570080 blocks, 71769882700386 bytes) just right (have=want)
2016/06/18 22:02:37 4737089 replicas (4728845 blocks, 230577300632292 bytes) overreplicated (have>want>0)
2016/06/18 22:02:37 108568 replicas (36690 blocks, 2094872678 bytes) unreferenced (have>want=0, new)
2016/06/18 22:02:37 1087762 replicas (664904 blocks, 600078849218 bytes) garbage (have>want=0, old)
2016/06/18 22:02:37 ===
2016/06/18 22:02:37 10601328 replicas (5299094 blocks, 531960816569846 bytes) total commitment (excluding unreferenced)
2016/06/18 22:02:37 16533504 replicas (6000682 blocks, 763059498065306 bytes) total usage
2016/06/18 22:02:37 ===
2016/06/18 22:02:37 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1563, Trashes:20713}
2016/06/18 22:02:37 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:4553, Trashes:20802}
2016/06/18 22:02:37 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1495, Trashes:20899}
2016/06/18 22:02:37 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1641, Trashes:20722}
2016/06/18 22:02:37 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1476, Trashes:20927}
2016/06/18 22:02:37 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:36345, Trashes:82968}
2016/06/18 22:02:37 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:14318, Trashes:17942}
2016/06/18 22:02:37 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:2940, Trashes:20789}
2016/06/18 22:02:37 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1522, Trashes:20516}
2016/06/18 22:02:37 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1511, Trashes:20965}
2016/06/18 22:02:37 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1712, Trashes:482300}
2016/06/18 22:02:37 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1548, Trashes:20863}
2016/06/18 22:02:37 ===
2016/06/18 22:02:37 Replication level distribution (counting N replicas on a single server as N):
2016/06/18 22:02:37  0:       6 #######
2016/06/18 22:02:37  1:  390568 ##################################################
2016/06/18 22:02:37  2:  697270 ####################################################
2016/06/18 22:02:37  3: 4905403 ###########################################################
2016/06/18 22:02:37  4:    6282 ##################################
2016/06/18 22:02:37  5:     952 ##########################
2016/06/18 22:02:37  6:      30 #############
2016/06/18 22:02:37  7:       1 ##
2016/06/18 22:02:37  8:       0
2016/06/18 22:02:37  9:       0
2016/06/18 22:02:37 10:       0
2016/06/18 22:02:37 11:       0
2016/06/18 22:02:37 12:     176 ####################
2016/06/18 22:02:37 ===
2016/06/18 22:02:37 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send pull list: took 19.416031ms
2016/06/18 22:02:37 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send pull list: took 19.662912ms
2016/06/18 22:02:37 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send pull list: took 20.120161ms
2016/06/18 22:02:37 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send pull list: took 21.278342ms
2016/06/18 22:02:37 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send pull list: took 21.672833ms
2016/06/18 22:02:37 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send pull list: took 22.979431ms
2016/06/18 22:02:37 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send pull list: took 23.552313ms
2016/06/18 22:02:37 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send pull list: Put http://humgen-02-02.internal.sanger.ac.uk:25107/pull: EOF
2016/06/18 22:02:37 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send pull list: took 26.28401ms
2016/06/18 22:02:38 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send pull list: took 39.027898ms
2016/06/18 22:02:38 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send pull list: took 57.850175ms
2016/06/18 22:02:38 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send pull list: took 148.006551ms
2016/06/18 22:02:38 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send pull list: took 380.793057ms
2016/06/18 22:02:38 Run: took 3h33m47.501669257s
2016/06/18 22:02:38 run failed: z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send pull list: Put http://humgen-02-02.internal.sanger.ac.uk:25107/pull: EOF
2016/06/18 22:02:38 timer went off
2016/06/18 22:02:38 starting next run
2016/06/18 22:02:38 Run: start
2016/06/18 22:02:38 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.17461ms
2016/06/18 22:02:38 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.320745ms
2016/06/18 22:02:38 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.424586ms
2016/06/18 22:02:38 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.386748ms
2016/06/18 22:02:38 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.766106ms
2016/06/18 22:02:38 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.891532ms
2016/06/18 22:02:38 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.848563ms
2016/06/18 22:02:38 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.866451ms
2016/06/18 22:02:38 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.909948ms
2016/06/18 22:02:38 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.986878ms
2016/06/18 22:02:38 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.983197ms
^C
root@humgen-01-01:~# keep-balance -config /root/keep-balance.json -commit-pulls -commit-trash
2016/06/19 12:16:01 starting up: will scan every 10m0s and on SIGUSR1
2016/06/19 12:16:01 Run: start


Subtasks: 1 (0 open, 1 closed)

Task #9562: Review 9438-http-default-timeout (Resolved, Lucas Di Pentima, 07/06/2016)
#1

Updated by Joshua Randall almost 8 years ago

Found keep-balance hung again today; it had not produced any output for the past 43 hours.

The common thread between these two failures seems to be that keep-balance hung in the middle of a run that followed a run whose log output said "run failed" because of "send pull list ... EOF".

The beginning and end of the output are:

root@humgen-01-01:~# keep-balance -config /root/keep-balance.json -commit-pulls -commit-trash
2016/06/19 12:16:01 starting up: will scan every 10m0s and on SIGUSR1
2016/06/19 12:16:01 Run: start
...
2016/06/19 18:12:39 Run: took 2h59m51.275338725s
2016/06/19 18:12:39 run failed: z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send pull list: Put http://humgen-01-01.internal.sanger.ac.uk:25107/pull: EOF
2016/06/19 18:12:39 timer went off
2016/06/19 18:12:39 starting next run
2016/06/19 18:12:39 Run: start
2016/06/19 18:12:39 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.211381ms
2016/06/19 18:12:39 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.302391ms
2016/06/19 18:12:39 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.344222ms
2016/06/19 18:12:39 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.338358ms
2016/06/19 18:12:39 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.511689ms
2016/06/19 18:12:39 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.519639ms
2016/06/19 18:12:39 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.563933ms
2016/06/19 18:12:39 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.586532ms
2016/06/19 18:12:39 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.47872ms
2016/06/19 18:12:39 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.577564ms
2016/06/19 18:12:39 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.577052ms
2016/06/19 18:12:39 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.728638ms
2016/06/19 18:12:39 GetCurrentState: start
2016/06/19 18:12:39 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:43 collections: 0/3273174
2016/06/19 18:12:53 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): add 1312482 replicas to map
2016/06/19 18:12:53 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): add 1334745 replicas to map
2016/06/19 18:12:53 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): add 1338775 replicas to map
2016/06/19 18:12:54 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): add 1339176 replicas to map
2016/06/19 18:12:54 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): add 1337822 replicas to map
2016/06/19 18:12:54 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): add 1335581 replicas to map
2016/06/19 18:12:54 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): add 1338270 replicas to map
2016/06/19 18:12:54 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): add 1337686 replicas to map
2016/06/19 18:12:54 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): add 1340508 replicas to map
2016/06/19 18:12:54 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): add 1346446 replicas to map
2016/06/19 18:12:54 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): add 1373440 replicas to map
2016/06/19 18:12:54 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:12:56 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:12:56 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:12:58 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:12:59 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:13:00 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:13:00 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:13:01 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:13:02 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:13:02 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:13:06 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:13:52 collections: 25000/3273174
2016/06/19 18:14:59 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): add 1798573 replicas to map
2016/06/19 18:15:00 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:15:14 collections: 50000/3273174
2016/06/19 18:16:29 collections: 75000/3273174
2016/06/19 18:17:48 collections: 100000/3273174
2016/06/19 18:19:07 collections: 125000/3273174
2016/06/19 18:20:25 collections: 150000/3273174
2016/06/19 18:21:45 collections: 175000/3273174
2016/06/19 18:23:02 collections: 200000/3273174
2016/06/19 18:24:26 collections: 225000/3273174
2016/06/19 18:25:45 collections: 250000/3273174
2016/06/19 18:27:03 collections: 275000/3273174
2016/06/19 18:28:20 collections: 300000/3273174
2016/06/19 18:29:35 collections: 324238/3273174
2016/06/19 18:30:58 collections: 343038/3273174
2016/06/19 18:32:06 collections: 368038/3273174
2016/06/19 18:33:28 collections: 393038/3273174
2016/06/19 18:34:49 collections: 418038/3273174
2016/06/19 18:36:04 collections: 443038/3273174
2016/06/19 18:37:28 collections: 468038/3273174
2016/06/19 18:38:42 collections: 493038/3273174
2016/06/19 18:40:01 collections: 518038/3273174
2016/06/19 18:41:24 collections: 543038/3273174
2016/06/19 18:42:44 collections: 568038/3273174
2016/06/19 18:44:03 collections: 593038/3273174
2016/06/19 18:45:24 collections: 618038/3273174
2016/06/19 18:46:43 collections: 643038/3273174
2016/06/19 18:48:01 collections: 668038/3273174
2016/06/19 18:49:19 collections: 693038/3273174
2016/06/19 18:50:39 collections: 718038/3273174
2016/06/19 18:51:57 collections: 743038/3273174
2016/06/19 18:53:14 collections: 768038/3273174
2016/06/19 18:54:35 collections: 793038/3273174
2016/06/19 18:55:53 collections: 818038/3273174
2016/06/19 18:57:13 collections: 843038/3273174
2016/06/19 18:58:30 collections: 868038/3273174
2016/06/19 18:59:50 collections: 893038/3273174
2016/06/19 19:01:15 collections: 918038/3273174
2016/06/19 19:02:31 collections: 943038/3273174
2016/06/19 19:03:52 collections: 968038/3273174
2016/06/19 19:05:10 collections: 993038/3273174
2016/06/19 19:06:32 collections: 1018038/3273174
2016/06/19 19:07:48 collections: 1043038/3273174
2016/06/19 19:09:09 collections: 1068038/3273174
2016/06/19 19:10:28 collections: 1093038/3273174
2016/06/19 19:11:53 collections: 1118038/3273174
2016/06/19 19:13:11 collections: 1143038/3273174
2016/06/19 19:14:31 collections: 1168038/3273174
2016/06/19 19:15:51 collections: 1193038/3273174
2016/06/19 19:17:10 collections: 1218038/3273174
2016/06/19 19:18:31 collections: 1243038/3273174
2016/06/19 19:19:58 collections: 1268038/3273174
2016/06/19 19:21:20 collections: 1293038/3273174
2016/06/19 19:22:40 collections: 1318038/3273174
2016/06/19 19:23:56 collections: 1343038/3273174
2016/06/19 19:25:16 collections: 1368038/3273174
2016/06/19 19:26:38 collections: 1393038/3273174
2016/06/19 19:27:58 collections: 1418038/3273174
2016/06/19 19:29:16 collections: 1443038/3273174
2016/06/19 19:30:32 collections: 1468038/3273174
2016/06/19 19:31:51 collections: 1493038/3273174
2016/06/19 19:33:08 collections: 1518038/3273174
2016/06/19 19:34:29 collections: 1543038/3273174
2016/06/19 19:35:46 collections: 1568038/3273174
2016/06/19 19:37:07 collections: 1593038/3273174
2016/06/19 19:38:28 collections: 1618038/3273174
2016/06/19 19:39:46 collections: 1643038/3273174
2016/06/19 19:41:03 collections: 1668038/3273174
2016/06/19 19:42:29 collections: 1693038/3273174
2016/06/19 19:43:45 collections: 1718038/3273174
2016/06/19 19:45:03 collections: 1743038/3273174
2016/06/19 19:46:23 collections: 1768038/3273174
2016/06/19 19:47:44 collections: 1793038/3273174
2016/06/19 19:49:00 collections: 1818038/3273174
2016/06/19 19:50:18 collections: 1843038/3273174
2016/06/19 19:51:37 collections: 1868038/3273174
2016/06/19 19:52:58 collections: 1893038/3273174
2016/06/19 19:54:18 collections: 1918038/3273174
2016/06/19 19:55:39 collections: 1943038/3273174
2016/06/19 19:56:56 collections: 1968038/3273174
2016/06/19 19:58:13 collections: 1993038/3273174
2016/06/19 19:59:39 collections: 2018038/3273174
2016/06/19 20:01:06 collections: 2043038/3273174
2016/06/19 20:02:26 collections: 2068038/3273174
2016/06/19 20:03:47 collections: 2093038/3273174
2016/06/19 20:05:07 collections: 2118038/3273174
2016/06/19 20:06:30 collections: 2143038/3273174
2016/06/19 20:07:50 collections: 2168038/3273174
2016/06/19 20:09:07 collections: 2193038/3273174
2016/06/19 20:10:24 collections: 2218038/3273174
2016/06/19 20:11:41 collections: 2243038/3273174
2016/06/19 20:12:57 collections: 2268038/3273174
2016/06/19 20:14:20 collections: 2293038/3273174
2016/06/19 20:15:41 collections: 2318038/3273174
2016/06/19 20:16:58 collections: 2343038/3273174
2016/06/19 20:18:14 collections: 2368038/3273174
2016/06/19 20:19:30 collections: 2393038/3273174
2016/06/19 20:20:48 collections: 2418038/3273174
2016/06/19 20:22:10 collections: 2443038/3273174
2016/06/19 20:23:28 collections: 2468038/3273174
2016/06/19 20:24:44 collections: 2493038/3273174
2016/06/19 20:26:05 collections: 2518038/3273174
2016/06/19 20:27:28 collections: 2543038/3273174
2016/06/19 20:28:49 collections: 2568038/3273174
2016/06/19 20:30:09 collections: 2593038/3273174
2016/06/19 20:31:28 collections: 2618038/3273174
2016/06/19 20:32:47 collections: 2643038/3273174
2016/06/19 20:34:10 collections: 2668038/3273174
2016/06/19 20:35:27 collections: 2693038/3273174
2016/06/19 20:36:44 collections: 2718038/3273174
2016/06/19 20:38:06 collections: 2743038/3273174
2016/06/19 20:39:24 collections: 2768038/3273174
2016/06/19 20:40:43 collections: 2793038/3273174
2016/06/19 20:42:03 collections: 2818038/3273174
2016/06/19 20:43:23 collections: 2843038/3273174
2016/06/19 20:44:42 collections: 2868038/3273174
2016/06/19 20:46:01 collections: 2893038/3273174
2016/06/19 20:47:21 collections: 2918038/3273174
2016/06/19 20:48:40 collections: 2943038/3273174
2016/06/19 20:49:58 collections: 2968038/3273174
2016/06/19 20:51:15 collections: 2993038/3273174
2016/06/19 20:52:35 collections: 3018038/3273174
2016/06/19 20:53:52 collections: 3043038/3273174
2016/06/19 20:55:10 collections: 3068038/3273174
2016/06/19 20:56:34 collections: 3093038/3273174
2016/06/19 20:57:54 collections: 3118038/3273174
2016/06/19 20:59:11 collections: 3143038/3273174
2016/06/19 21:00:30 collections: 3168038/3273174
2016/06/19 21:01:51 collections: 3193038/3273174
2016/06/19 21:03:13 collections: 3218038/3273174
2016/06/19 21:04:29 collections: 3243038/3273174
2016/06/19 21:05:49 collections: 3268038/3273174
2016/06/19 21:06:08 collections: 3273174/3273174
2016/06/19 21:06:08 collections: 3273174/3273174
2016/06/19 21:07:30 GetCurrentState: took 2h54m51.880401008s
2016/06/19 21:07:30 ComputeChangeSets: start
2016/06/19 21:08:47 ComputeChangeSets: took 1m16.25848663s
2016/06/19 21:08:50 ===
2016/06/19 21:08:50 12 replicas (6 blocks, 402731660 bytes) lost (0=have<want)
2016/06/19 21:08:50 1231 replicas (163 blocks, 80390127068 bytes) underreplicated (0<have<want)
2016/06/19 21:08:50 1141920 replicas (570080 blocks, 71769882700386 bytes) just right (have=want)
2016/06/19 21:08:50 4737089 replicas (4728845 blocks, 230577300632292 bytes) overreplicated (have>want>0)
2016/06/19 21:08:50 22488 replicas (7556 blocks, 2070895563 bytes) unreferenced (have>want=0, new)
2016/06/19 21:08:50 1173842 replicas (694038 blocks, 600102826333 bytes) garbage (have>want=0, old)
2016/06/19 21:08:50 ===
2016/06/19 21:08:50 10601328 replicas (5299094 blocks, 531960816569846 bytes) total commitment (excluding unreferenced)
2016/06/19 21:08:50 16533504 replicas (6000682 blocks, 763059498065306 bytes) total usage
2016/06/19 21:08:50 ===
2016/06/19 21:08:50 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1522, Trashes:22940}
2016/06/19 21:08:50 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:4553, Trashes:23383}
2016/06/19 21:08:50 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1712, Trashes:484744}
2016/06/19 21:08:50 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1641, Trashes:23197}
2016/06/19 21:08:50 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1563, Trashes:23204}
2016/06/19 21:08:50 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:14318, Trashes:20422}
2016/06/19 21:08:50 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1511, Trashes:23447}
2016/06/19 21:08:50 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1495, Trashes:23451}
2016/06/19 21:08:50 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1548, Trashes:23349}
2016/06/19 21:08:50 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1476, Trashes:23403}
2016/06/19 21:08:50 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:36345, Trashes:85461}
2016/06/19 21:08:50 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:2940, Trashes:23266}
2016/06/19 21:08:50 ===
2016/06/19 21:08:50 Replication level distribution (counting N replicas on a single server as N):
2016/06/19 21:08:50  0:       6 #######
2016/06/19 21:08:50  1:  390568 ##################################################
2016/06/19 21:08:50  2:  697270 ####################################################
2016/06/19 21:08:50  3: 4905403 ###########################################################
2016/06/19 21:08:50  4:    6282 ##################################
2016/06/19 21:08:50  5:     952 ##########################
2016/06/19 21:08:50  6:      30 #############
2016/06/19 21:08:50  7:       1 ##
2016/06/19 21:08:50  8:       0
2016/06/19 21:08:50  9:       0
2016/06/19 21:08:50 10:       0
2016/06/19 21:08:50 11:       0
2016/06/19 21:08:50 12:     176 ####################
2016/06/19 21:08:50 ===
2016/06/19 21:08:51 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 21:08:51 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 21:08:51 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 21:08:51 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 21:08:51 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 21:08:51 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 21:08:51 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 21:08:51 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 21:08:51 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 21:08:51 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 21:08:51 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 21:08:51 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 21:08:51 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send pull list: took 20.708728ms
2016/06/19 21:08:51 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send pull list: took 22.366841ms
2016/06/19 21:08:51 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send pull list: took 22.779635ms
2016/06/19 21:08:51 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send pull list: took 23.293015ms
2016/06/19 21:08:51 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send pull list: took 23.41336ms
2016/06/19 21:08:51 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send pull list: took 24.025641ms
2016/06/19 21:08:51 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send pull list: took 24.347904ms
2016/06/19 21:08:51 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send pull list: took 25.521873ms
2016/06/19 21:08:51 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send pull list: took 34.62655ms
2016/06/19 21:08:51 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send pull list: took 56.198255ms
2016/06/19 21:08:51 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send pull list: took 164.169435ms
2016/06/19 21:08:51 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send pull list: took 368.224656ms
2016/06/19 21:08:51 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 21:08:51 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 21:08:51 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 21:08:51 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 21:08:51 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 21:08:51 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 21:08:51 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 21:08:51 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 21:08:51 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 21:08:51 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 21:08:51 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 21:08:51 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 21:08:51 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: took 156.769067ms
2016/06/19 21:08:51 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: took 159.902483ms
2016/06/19 21:08:51 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: took 161.278046ms
2016/06/19 21:08:51 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: took 165.309128ms
2016/06/19 21:08:51 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: took 166.661732ms
2016/06/19 21:08:51 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: took 167.588291ms
2016/06/19 21:08:51 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: took 167.558003ms
2016/06/19 21:08:51 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: took 169.310077ms
2016/06/19 21:08:51 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: took 179.385898ms
2016/06/19 21:08:51 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: took 185.283856ms
2016/06/19 21:08:54 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: took 3.100227954s

^C

Actions #2

Updated by Tom Clegg almost 8 years ago

I wonder if
  • "pull: EOF" means we just crashed keepstore (see #9437; I think the crashing bug is racy, in that keepstore doesn't necessarily crash during the "send pull list" request -- only when it tries to process the list)
  • there's no timeout on the "send trash list" http request, and there's some unlucky timing opportunity during a keepstore crash/restart that can leave keep-balance's http client waiting for a response that will never come.
Actions #3

Updated by Tom Clegg almost 8 years ago

https://golang.org/pkg/net/http/#Client

        // Timeout specifies a time limit for requests made by this
        // Client. The timeout includes connection time, any
        // redirects, and reading the response body. The timer remains
        // running after Get, Head, Post, or Do return and will
        // interrupt reading of the Response.Body.
        //
        // A Timeout of zero means no timeout.
Actions #4

Updated by Brett Smith almost 8 years ago

  • Assigned To set to Lucas Di Pentima
  • Target version set to 2016-07-20 sprint
  • Story points set to 1.0
  • Verify whether or not the client library is sending a timeout. One customization route sets a default timeout; another sets none. Figure out which path keep-balance is using.
  • If the code is not currently setting a timeout, do so. This is a good idea no matter what.
  • We can deploy and test that version to see whether or not it's sufficient to resolve the issue.
Actions #5

Updated by Tom Clegg almost 8 years ago

Suggest in source:sdk/go/arvados/:
  • add a DefaultSecureClient var instead of using http.DefaultClient (it would be rude for a library to alter http.DefaultClient)
  • use 5 minute timeout in both of the default clients
Actions #6

Updated by Lucas Di Pentima almost 8 years ago

  • Status changed from New to In Progress
Actions #7

Updated by Lucas Di Pentima almost 8 years ago

42490568db4cf4bc65fc436b41cfcffb8eadd8d1

Added a new var DefaultSecureClient and assigned a 5 minute timeout to both InsecureHTTPClient and DefaultSecureClient

Actions #8

Updated by Tom Clegg almost 8 years ago

You only need 5 * time.Minute here, not time.Duration(5 * time.Minute). Multiplying by a time.Duration always gives you a time.Duration, so the explicit conversion is redundant.

5 is an untyped constant here, so it is implicitly converted to a time.Duration when you use it that way. A variable of type int is not:

_ = 5 * time.Minute   // OK

i := 5
_ = i * time.Minute   // invalid operation: i * time.Minute (mismatched types int and time.Duration)

With that, LGTM.

Actions #9

Updated by Lucas Di Pentima almost 8 years ago

df75e59337899044a55767e10f69717b520c3ead

Eliminated redundant casting. Thanks!

Actions #11

Updated by Tom Clegg almost 8 years ago

  • Target version deleted (2016-07-20 sprint)
Actions #12

Updated by Tom Clegg almost 8 years ago

  • Status changed from In Progress to Feedback
Actions #13

Updated by Brett Smith almost 8 years ago

Josh,

We've merged a patch that addresses the clearest and likeliest explanation for a hang like this. If you're able to test out the current version and tell us whether or not you still see the issue, we'd be very interested to hear what you find either way.

Actions #14

Updated by Lucas Di Pentima almost 7 years ago

  • Status changed from Feedback to Resolved