Bug #9438

closed

keep-balance hung and needed ctrl-c

Added by Joshua Randall almost 8 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Keep
Target version: -
Story points: 1.0

Description

I started keep-balance on Friday evening without the `-once` flag and left it running. On Sunday morning I found it hung. The last message it had printed was from the night before, and the hang seems to have started just after a "run failed" error:

2016/06/18 22:02:38 run failed: z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send pull list: Put http://humgen-02-02.internal.sanger.ac.uk:25107/pull: EOF

keep-balance was started on Friday as:

root@humgen-01-01:~# keep-balance -config /root/keep-balance.json -commit-pulls -commit-trash
2016/06/17 21:03:59 starting up: will scan every 10m0s and on SIGUSR1
2016/06/17 21:03:59 Run: start
...

Log of the last run, where it got stuck:

2016/06/18 22:01:06 GetCurrentState: took 3h32m15.109080064s
2016/06/18 22:01:06 ComputeChangeSets: start
2016/06/18 22:02:33 ComputeChangeSets: took 1m27.424860059s
2016/06/18 22:02:37 ===
2016/06/18 22:02:37 12 replicas (6 blocks, 402731660 bytes) lost (0=have<want)
2016/06/18 22:02:37 1231 replicas (163 blocks, 80390127068 bytes) underreplicated (0<have<want)
2016/06/18 22:02:37 1141920 replicas (570080 blocks, 71769882700386 bytes) just right (have=want)
2016/06/18 22:02:37 4737089 replicas (4728845 blocks, 230577300632292 bytes) overreplicated (have>want>0)
2016/06/18 22:02:37 108568 replicas (36690 blocks, 2094872678 bytes) unreferenced (have>want=0, new)
2016/06/18 22:02:37 1087762 replicas (664904 blocks, 600078849218 bytes) garbage (have>want=0, old)
2016/06/18 22:02:37 ===
2016/06/18 22:02:37 10601328 replicas (5299094 blocks, 531960816569846 bytes) total commitment (excluding unreferenced)
2016/06/18 22:02:37 16533504 replicas (6000682 blocks, 763059498065306 bytes) total usage
2016/06/18 22:02:37 ===
2016/06/18 22:02:37 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1563, Trashes:20713}
2016/06/18 22:02:37 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:4553, Trashes:20802}
2016/06/18 22:02:37 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1495, Trashes:20899}
2016/06/18 22:02:37 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1641, Trashes:20722}
2016/06/18 22:02:37 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1476, Trashes:20927}
2016/06/18 22:02:37 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:36345, Trashes:82968}
2016/06/18 22:02:37 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:14318, Trashes:17942}
2016/06/18 22:02:37 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:2940, Trashes:20789}
2016/06/18 22:02:37 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1522, Trashes:20516}
2016/06/18 22:02:37 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1511, Trashes:20965}
2016/06/18 22:02:37 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1712, Trashes:482300}
2016/06/18 22:02:37 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1548, Trashes:20863}
2016/06/18 22:02:37 ===
2016/06/18 22:02:37 Replication level distribution (counting N replicas on a single server as N):
2016/06/18 22:02:37  0:       6 #######
2016/06/18 22:02:37  1:  390568 ##################################################
2016/06/18 22:02:37  2:  697270 ####################################################
2016/06/18 22:02:37  3: 4905403 ###########################################################
2016/06/18 22:02:37  4:    6282 ##################################
2016/06/18 22:02:37  5:     952 ##########################
2016/06/18 22:02:37  6:      30 #############
2016/06/18 22:02:37  7:       1 ##
2016/06/18 22:02:37  8:       0
2016/06/18 22:02:37  9:       0
2016/06/18 22:02:37 10:       0
2016/06/18 22:02:37 11:       0
2016/06/18 22:02:37 12:     176 ####################
2016/06/18 22:02:37 ===
2016/06/18 22:02:37 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/18 22:02:37 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send pull list: took 19.416031ms
2016/06/18 22:02:37 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send pull list: took 19.662912ms
2016/06/18 22:02:37 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send pull list: took 20.120161ms
2016/06/18 22:02:37 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send pull list: took 21.278342ms
2016/06/18 22:02:37 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send pull list: took 21.672833ms
2016/06/18 22:02:37 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send pull list: took 22.979431ms
2016/06/18 22:02:37 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send pull list: took 23.552313ms
2016/06/18 22:02:37 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send pull list: Put http://humgen-02-02.internal.sanger.ac.uk:25107/pull: EOF
2016/06/18 22:02:37 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send pull list: took 26.28401ms
2016/06/18 22:02:38 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send pull list: took 39.027898ms
2016/06/18 22:02:38 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send pull list: took 57.850175ms
2016/06/18 22:02:38 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send pull list: took 148.006551ms
2016/06/18 22:02:38 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send pull list: took 380.793057ms
2016/06/18 22:02:38 Run: took 3h33m47.501669257s
2016/06/18 22:02:38 run failed: z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send pull list: Put http://humgen-02-02.internal.sanger.ac.uk:25107/pull: EOF
2016/06/18 22:02:38 timer went off
2016/06/18 22:02:38 starting next run
2016/06/18 22:02:38 Run: start
2016/06/18 22:02:38 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/18 22:02:38 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.17461ms
2016/06/18 22:02:38 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.320745ms
2016/06/18 22:02:38 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.424586ms
2016/06/18 22:02:38 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.386748ms
2016/06/18 22:02:38 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.766106ms
2016/06/18 22:02:38 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.891532ms
2016/06/18 22:02:38 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.848563ms
2016/06/18 22:02:38 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.866451ms
2016/06/18 22:02:38 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.909948ms
2016/06/18 22:02:38 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.986878ms
2016/06/18 22:02:38 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.983197ms
^C
root@humgen-01-01:~# keep-balance -config /root/keep-balance.json -commit-pulls -commit-trash
2016/06/19 12:16:01 starting up: will scan every 10m0s and on SIGUSR1
2016/06/19 12:16:01 Run: start


Subtasks: 1 (0 open, 1 closed)

Task #9562: Review 9438-http-default-timeout (Resolved, Lucas Di Pentima, 07/06/2016)