Project

General

Profile

Actions

Bug #9437

closed

keep-balance doesn't seem to be reducing replication level of blocks

Added by Joshua Randall almost 8 years ago. Updated about 7 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Keep
Target version:
-
Story points:
-

Description

For example, some output from this afternoon shows that in two consecutive runs the number of overreplicated blocks does not change (it seems to be stuck at 4737089, which is the same number it had as of the run last night). I would have expected it would have moved those to the trash and they would no longer be showing up as overreplicated once they are in the trash.

2016/06/19 15:12:46 ===
2016/06/19 15:12:46 12 replicas (6 blocks, 402731660 bytes) lost (0=have<want)
2016/06/19 15:12:46 1231 replicas (163 blocks, 80390127068 bytes) underreplicated (0<have<want)
2016/06/19 15:12:46 1141920 replicas (570080 blocks, 71769882700386 bytes) just right (have=want)
2016/06/19 15:12:46 4737089 replicas (4728845 blocks, 230577300632292 bytes) overreplicated (have>want>0)
2016/06/19 15:12:46 25722 replicas (8648 blocks, 2071796729 bytes) unreferenced (have>want=0, new)
2016/06/19 15:12:46 1170608 replicas (692946 blocks, 600101925167 bytes) garbage (have>want=0, old)
2016/06/19 15:12:46 ===
2016/06/19 15:12:46 10601328 replicas (5299094 blocks, 531960816569846 bytes) total commitment (excluding unreferenced)
2016/06/19 15:12:46 16533504 replicas (6000682 blocks, 763059498065306 bytes) total usage
2016/06/19 15:12:46 ===
2016/06/19 18:12:38 ===
2016/06/19 18:12:38 12 replicas (6 blocks, 402731660 bytes) lost (0=have<want)
2016/06/19 18:12:38 1231 replicas (163 blocks, 80390127068 bytes) underreplicated (0<have<want)
2016/06/19 18:12:38 1141920 replicas (570080 blocks, 71769882700386 bytes) just right (have=want)
2016/06/19 18:12:38 4737089 replicas (4728845 blocks, 230577300632292 bytes) overreplicated (have>want>0)
2016/06/19 18:12:38 25565 replicas (8595 blocks, 2071752841 bytes) unreferenced (have>want=0, new)
2016/06/19 18:12:38 1170765 replicas (692999 blocks, 600101969055 bytes) garbage (have>want=0, old)
2016/06/19 18:12:38 ===
2016/06/19 18:12:38 10601328 replicas (5299094 blocks, 531960816569846 bytes) total commitment (excluding unreferenced)
2016/06/19 18:12:38 16533504 replicas (6000682 blocks, 763059498065306 bytes) total usage
2016/06/19 18:12:38 ===

Here is the full output from that run:

root@humgen-01-01:~# keep-balance -config /root/keep-balance.json -commit-pulls -commit-trash
2016/06/19 12:16:01 starting up: will scan every 10m0s and on SIGUSR1
2016/06/19 12:16:01 Run: start
2016/06/19 12:16:01 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 12:16:01 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 12:16:01 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 12:16:01 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 12:16:01 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 12:16:01 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 12:16:01 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 12:16:01 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 12:16:01 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 12:16:01 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 12:16:01 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 12:16:01 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 12:16:01 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.459981ms
2016/06/19 12:16:01 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.036692ms
2016/06/19 12:16:01 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.562848ms
2016/06/19 12:16:01 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.56567ms
2016/06/19 12:16:01 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.916922ms
2016/06/19 12:16:01 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.922641ms
2016/06/19 12:16:01 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.953074ms
2016/06/19 12:16:01 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.911003ms
2016/06/19 12:16:01 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.822891ms
2016/06/19 12:16:01 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.875824ms
2016/06/19 12:16:01 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.937196ms
2016/06/19 12:16:01 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: took 2.899201ms
2016/06/19 12:16:01 GetCurrentState: start
2016/06/19 12:16:02 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 12:16:02 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 12:16:02 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 12:16:02 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 12:16:02 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 12:16:02 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 12:16:02 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 12:16:02 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 12:16:02 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 12:16:02 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 12:16:02 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 12:16:02 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 12:16:05 collections: 0/3273174
2016/06/19 12:16:17 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): add 1337686 replicas to map
2016/06/19 12:16:18 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): add 1346446 replicas to map
2016/06/19 12:16:18 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): add 1337822 replicas to map
2016/06/19 12:16:18 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): add 1338775 replicas to map
2016/06/19 12:16:18 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): add 1339176 replicas to map
2016/06/19 12:16:18 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): add 1312482 replicas to map
2016/06/19 12:16:18 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): add 1338270 replicas to map
2016/06/19 12:16:18 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): add 1340508 replicas to map
2016/06/19 12:16:18 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): add 1335581 replicas to map
2016/06/19 12:16:18 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): add 1334745 replicas to map
2016/06/19 12:16:19 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): done
2016/06/19 12:16:21 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): done
2016/06/19 12:16:21 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): done
2016/06/19 12:16:24 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): add 1373440 replicas to map
2016/06/19 12:16:26 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): done
2016/06/19 12:16:26 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): done
2016/06/19 12:16:27 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): done
2016/06/19 12:16:28 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): done
2016/06/19 12:16:29 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): done
2016/06/19 12:16:30 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): done
2016/06/19 12:16:31 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): done
2016/06/19 12:16:32 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): done
2016/06/19 12:17:15 collections: 25000/3273174
2016/06/19 12:18:25 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): add 1798573 replicas to map
2016/06/19 12:18:28 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): done
2016/06/19 12:18:39 collections: 50000/3273174
2016/06/19 12:20:02 collections: 75000/3273174
2016/06/19 12:21:21 collections: 100000/3273174
2016/06/19 12:22:43 collections: 125000/3273174
2016/06/19 12:24:04 collections: 150000/3273174
2016/06/19 12:25:33 collections: 175000/3273174
2016/06/19 12:26:52 collections: 200000/3273174
2016/06/19 12:28:14 collections: 225000/3273174
2016/06/19 12:29:35 collections: 250000/3273174
2016/06/19 12:31:00 collections: 275000/3273174
2016/06/19 12:32:20 collections: 300000/3273174
2016/06/19 12:33:38 collections: 324238/3273174
2016/06/19 12:35:02 collections: 343038/3273174
2016/06/19 12:36:11 collections: 368038/3273174
2016/06/19 12:37:40 collections: 393038/3273174
2016/06/19 12:39:00 collections: 418038/3273174
2016/06/19 12:40:21 collections: 443038/3273174
2016/06/19 12:41:46 collections: 468038/3273174
2016/06/19 12:43:05 collections: 493038/3273174
2016/06/19 12:44:27 collections: 518038/3273174
2016/06/19 12:45:47 collections: 543038/3273174
2016/06/19 12:47:03 collections: 568038/3273174
2016/06/19 12:48:25 collections: 593038/3273174
2016/06/19 12:49:50 collections: 618038/3273174
2016/06/19 12:51:11 collections: 643038/3273174
2016/06/19 12:52:39 collections: 668038/3273174
2016/06/19 12:54:06 collections: 693038/3273174
2016/06/19 12:55:22 collections: 718038/3273174
2016/06/19 12:56:46 collections: 743038/3273174
2016/06/19 12:58:11 collections: 768038/3273174
2016/06/19 12:59:28 collections: 793038/3273174
2016/06/19 13:00:56 collections: 818038/3273174
2016/06/19 13:02:17 collections: 843038/3273174
2016/06/19 13:03:38 collections: 868038/3273174
2016/06/19 13:04:58 collections: 893038/3273174
2016/06/19 13:06:15 collections: 918038/3273174
2016/06/19 13:07:34 collections: 943038/3273174
2016/06/19 13:08:56 collections: 968038/3273174
2016/06/19 13:10:16 collections: 993038/3273174
2016/06/19 13:11:35 collections: 1018038/3273174
2016/06/19 13:12:52 collections: 1043038/3273174
2016/06/19 13:14:15 collections: 1068038/3273174
2016/06/19 13:15:37 collections: 1093038/3273174
2016/06/19 13:16:55 collections: 1118038/3273174
2016/06/19 13:18:15 collections: 1143038/3273174
2016/06/19 13:19:38 collections: 1168038/3273174
2016/06/19 13:20:55 collections: 1193038/3273174
2016/06/19 13:22:13 collections: 1218038/3273174
2016/06/19 13:23:29 collections: 1243038/3273174
2016/06/19 13:24:48 collections: 1268038/3273174
2016/06/19 13:26:07 collections: 1293038/3273174
2016/06/19 13:27:29 collections: 1318038/3273174
2016/06/19 13:28:49 collections: 1343038/3273174
2016/06/19 13:30:10 collections: 1368038/3273174
2016/06/19 13:31:34 collections: 1393038/3273174
2016/06/19 13:32:52 collections: 1418038/3273174
2016/06/19 13:34:11 collections: 1443038/3273174
2016/06/19 13:35:36 collections: 1468038/3273174
2016/06/19 13:36:52 collections: 1493038/3273174
2016/06/19 13:38:08 collections: 1518038/3273174
2016/06/19 13:39:33 collections: 1543038/3273174
2016/06/19 13:40:54 collections: 1568038/3273174
2016/06/19 13:42:16 collections: 1593038/3273174
2016/06/19 13:43:31 collections: 1618038/3273174
2016/06/19 13:44:51 collections: 1643038/3273174
2016/06/19 13:46:13 collections: 1668038/3273174
2016/06/19 13:47:31 collections: 1693038/3273174
2016/06/19 13:48:46 collections: 1718038/3273174
2016/06/19 13:50:04 collections: 1743038/3273174
2016/06/19 13:51:23 collections: 1768038/3273174
2016/06/19 13:52:42 collections: 1793038/3273174
2016/06/19 13:54:02 collections: 1818038/3273174
2016/06/19 13:55:26 collections: 1843038/3273174
2016/06/19 13:56:47 collections: 1868038/3273174
2016/06/19 13:58:04 collections: 1893038/3273174
2016/06/19 13:59:25 collections: 1918038/3273174
2016/06/19 14:00:42 collections: 1943038/3273174
2016/06/19 14:02:00 collections: 1968038/3273174
2016/06/19 14:03:19 collections: 1993038/3273174
2016/06/19 14:04:36 collections: 2018038/3273174
2016/06/19 14:05:55 collections: 2043038/3273174
2016/06/19 14:07:13 collections: 2068038/3273174
2016/06/19 14:08:36 collections: 2093038/3273174
2016/06/19 14:09:54 collections: 2118038/3273174
2016/06/19 14:11:09 collections: 2143038/3273174
2016/06/19 14:12:25 collections: 2168038/3273174
2016/06/19 14:13:44 collections: 2193038/3273174
2016/06/19 14:15:05 collections: 2218038/3273174
2016/06/19 14:16:27 collections: 2243038/3273174
2016/06/19 14:17:47 collections: 2268038/3273174
2016/06/19 14:19:10 collections: 2293038/3273174
2016/06/19 14:20:28 collections: 2318038/3273174
2016/06/19 14:21:44 collections: 2343038/3273174
2016/06/19 14:23:00 collections: 2368038/3273174
2016/06/19 14:24:17 collections: 2393038/3273174
2016/06/19 14:25:37 collections: 2418038/3273174
2016/06/19 14:26:55 collections: 2443038/3273174
2016/06/19 14:28:11 collections: 2468038/3273174
2016/06/19 14:29:26 collections: 2493038/3273174
2016/06/19 14:30:47 collections: 2518038/3273174
2016/06/19 14:32:06 collections: 2543038/3273174
2016/06/19 14:33:21 collections: 2568038/3273174
2016/06/19 14:34:41 collections: 2593038/3273174
2016/06/19 14:35:58 collections: 2618038/3273174
2016/06/19 14:37:20 collections: 2643038/3273174
2016/06/19 14:38:37 collections: 2668038/3273174
2016/06/19 14:39:50 collections: 2693038/3273174
2016/06/19 14:41:11 collections: 2718038/3273174
2016/06/19 14:42:30 collections: 2743038/3273174
2016/06/19 14:43:49 collections: 2768038/3273174
2016/06/19 14:45:06 collections: 2793038/3273174
2016/06/19 14:46:22 collections: 2818038/3273174
2016/06/19 14:47:44 collections: 2843038/3273174
2016/06/19 14:49:05 collections: 2868038/3273174
2016/06/19 14:50:22 collections: 2893038/3273174
2016/06/19 14:51:40 collections: 2918038/3273174
2016/06/19 14:52:57 collections: 2943038/3273174
2016/06/19 14:54:17 collections: 2968038/3273174
2016/06/19 14:55:32 collections: 2993038/3273174
2016/06/19 14:56:53 collections: 3018038/3273174
2016/06/19 14:58:10 collections: 3043038/3273174
2016/06/19 14:59:27 collections: 3068038/3273174
2016/06/19 15:00:47 collections: 3093038/3273174
2016/06/19 15:02:05 collections: 3118038/3273174
2016/06/19 15:03:19 collections: 3143038/3273174
2016/06/19 15:04:36 collections: 3168038/3273174
2016/06/19 15:05:57 collections: 3193038/3273174
2016/06/19 15:07:14 collections: 3218038/3273174
2016/06/19 15:08:34 collections: 3243038/3273174
2016/06/19 15:09:50 collections: 3268038/3273174
2016/06/19 15:10:06 collections: 3273174/3273174
2016/06/19 15:10:06 collections: 3273174/3273174
2016/06/19 15:11:26 GetCurrentState: took 2h55m24.782781041s
2016/06/19 15:11:26 ComputeChangeSets: start
2016/06/19 15:12:42 ComputeChangeSets: took 1m16.270671548s
2016/06/19 15:12:46 ===
2016/06/19 15:12:46 12 replicas (6 blocks, 402731660 bytes) lost (0=have<want)
2016/06/19 15:12:46 1231 replicas (163 blocks, 80390127068 bytes) underreplicated (0<have<want)
2016/06/19 15:12:46 1141920 replicas (570080 blocks, 71769882700386 bytes) just right (have=want)
2016/06/19 15:12:46 4737089 replicas (4728845 blocks, 230577300632292 bytes) overreplicated (have>want>0)
2016/06/19 15:12:46 25722 replicas (8648 blocks, 2071796729 bytes) unreferenced (have>want=0, new)
2016/06/19 15:12:46 1170608 replicas (692946 blocks, 600101925167 bytes) garbage (have>want=0, old)
2016/06/19 15:12:46 ===
2016/06/19 15:12:46 10601328 replicas (5299094 blocks, 531960816569846 bytes) total commitment (excluding unreferenced)
2016/06/19 15:12:46 16533504 replicas (6000682 blocks, 763059498065306 bytes) total usage
2016/06/19 15:12:46 ===
2016/06/19 15:12:46 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1548, Trashes:23248}
2016/06/19 15:12:46 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1563, Trashes:23118}
2016/06/19 15:12:46 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:36345, Trashes:85366}
2016/06/19 15:12:46 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:14318, Trashes:20328}
2016/06/19 15:12:46 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:2940, Trashes:23171}
2016/06/19 15:12:46 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:4553, Trashes:23289}
2016/06/19 15:12:46 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1712, Trashes:484655}
2016/06/19 15:12:46 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1641, Trashes:23102}
2016/06/19 15:12:46 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1476, Trashes:23302}
2016/06/19 15:12:46 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1522, Trashes:22855}
2016/06/19 15:12:46 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1511, Trashes:23367}
2016/06/19 15:12:46 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1495, Trashes:23349}
2016/06/19 15:12:46 ===
2016/06/19 15:12:46 Replication level distribution (counting N replicas on a single server as N):
2016/06/19 15:12:46  0:       6 #######
2016/06/19 15:12:46  1:  390568 ##################################################
2016/06/19 15:12:46  2:  697270 ####################################################
2016/06/19 15:12:46  3: 4905403 ###########################################################
2016/06/19 15:12:46  4:    6282 ##################################
2016/06/19 15:12:46  5:     952 ##########################
2016/06/19 15:12:46  6:      30 #############
2016/06/19 15:12:46  7:       1 ##
2016/06/19 15:12:46  8:       0
2016/06/19 15:12:46  9:       0
2016/06/19 15:12:46 10:       0
2016/06/19 15:12:46 11:       0
2016/06/19 15:12:46 12:     176 ####################
2016/06/19 15:12:46 ===
2016/06/19 15:12:47 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 15:12:47 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 15:12:47 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 15:12:47 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 15:12:47 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 15:12:47 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 15:12:47 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 15:12:47 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 15:12:47 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 15:12:47 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 15:12:47 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 15:12:47 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 15:12:47 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send pull list: took 18.294418ms
2016/06/19 15:12:47 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send pull list: took 19.28437ms
2016/06/19 15:12:47 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send pull list: took 20.657787ms
2016/06/19 15:12:47 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send pull list: took 20.805979ms
2016/06/19 15:12:47 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send pull list: took 25.161251ms
2016/06/19 15:12:47 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send pull list: took 25.9737ms
2016/06/19 15:12:47 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send pull list: took 27.55602ms
2016/06/19 15:12:47 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send pull list: Put http://humgen-02-03.internal.sanger.ac.uk:25107/pull: EOF
2016/06/19 15:12:47 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send pull list: took 28.358467ms
2016/06/19 15:12:47 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send pull list: Put http://humgen-03-03.internal.sanger.ac.uk:25107/pull: EOF
2016/06/19 15:12:47 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send pull list: took 34.643279ms
2016/06/19 15:12:47 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send pull list: took 49.676402ms
2016/06/19 15:12:47 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send pull list: took 150.482078ms
2016/06/19 15:12:47 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send pull list: took 359.143541ms
2016/06/19 15:12:47 Run: took 2h56m46.019896237s
2016/06/19 15:12:47 run failed: z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send pull list: Put http://humgen-03-03.internal.sanger.ac.uk:25107/pull: EOF
2016/06/19 15:12:47 timer went off
2016/06/19 15:12:47 starting next run
2016/06/19 15:12:47 Run: start
2016/06/19 15:12:47 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 15:12:47 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 15:12:47 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 15:12:47 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 15:12:47 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 15:12:47 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 15:12:47 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 15:12:47 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 15:12:47 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 15:12:47 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 15:12:47 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 15:12:47 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 15:12:47 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.192673ms
2016/06/19 15:12:47 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.181356ms
2016/06/19 15:12:47 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.195518ms
2016/06/19 15:12:47 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.390754ms
2016/06/19 15:12:47 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.345774ms
2016/06/19 15:12:47 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.398906ms
2016/06/19 15:12:47 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.431792ms
2016/06/19 15:12:47 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.546558ms
2016/06/19 15:12:47 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.516724ms
2016/06/19 15:12:47 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.555333ms
2016/06/19 15:12:47 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.729169ms
2016/06/19 15:12:47 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: took 2.074129ms
2016/06/19 15:12:47 GetCurrentState: start
2016/06/19 15:12:48 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 15:12:48 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 15:12:48 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 15:12:48 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 15:12:48 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 15:12:48 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 15:12:48 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 15:12:48 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 15:12:48 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 15:12:48 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 15:12:48 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 15:12:48 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 15:12:51 collections: 0/3273174
2016/06/19 15:13:02 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): add 1334745 replicas to map
2016/06/19 15:13:02 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): add 1312482 replicas to map
2016/06/19 15:13:02 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): add 1337686 replicas to map
2016/06/19 15:13:02 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): add 1338270 replicas to map
2016/06/19 15:13:02 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): add 1337822 replicas to map
2016/06/19 15:13:02 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): add 1338775 replicas to map
2016/06/19 15:13:02 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): add 1339176 replicas to map
2016/06/19 15:13:03 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): add 1335581 replicas to map
2016/06/19 15:13:03 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): add 1373440 replicas to map
2016/06/19 15:13:03 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): done
2016/06/19 15:13:03 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): add 1346446 replicas to map
2016/06/19 15:13:03 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): add 1340508 replicas to map
2016/06/19 15:13:04 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): done
2016/06/19 15:13:05 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): done
2016/06/19 15:13:08 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): done
2016/06/19 15:13:09 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): done
2016/06/19 15:13:10 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): done
2016/06/19 15:13:10 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): done
2016/06/19 15:13:11 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): done
2016/06/19 15:13:12 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): done
2016/06/19 15:13:13 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): done
2016/06/19 15:13:13 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): done
2016/06/19 15:14:07 collections: 25000/3273174
2016/06/19 15:14:18 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): add 1798573 replicas to map
2016/06/19 15:14:20 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): done
2016/06/19 15:15:33 collections: 50000/3273174
2016/06/19 15:16:58 collections: 75000/3273174
2016/06/19 15:18:25 collections: 100000/3273174
2016/06/19 15:19:45 collections: 125000/3273174
2016/06/19 15:21:11 collections: 150000/3273174
2016/06/19 15:22:39 collections: 175000/3273174
2016/06/19 15:23:59 collections: 200000/3273174
2016/06/19 15:25:22 collections: 225000/3273174
2016/06/19 15:26:48 collections: 250000/3273174
2016/06/19 15:28:07 collections: 275000/3273174
2016/06/19 15:29:27 collections: 300000/3273174
2016/06/19 15:30:52 collections: 324238/3273174
2016/06/19 15:32:16 collections: 343038/3273174
2016/06/19 15:33:22 collections: 368038/3273174
2016/06/19 15:34:43 collections: 393038/3273174
2016/06/19 15:36:06 collections: 418038/3273174
2016/06/19 15:37:33 collections: 443038/3273174
2016/06/19 15:38:52 collections: 468038/3273174
2016/06/19 15:40:15 collections: 493038/3273174
2016/06/19 15:41:39 collections: 518038/3273174
2016/06/19 15:43:02 collections: 543038/3273174
2016/06/19 15:44:27 collections: 568038/3273174
2016/06/19 15:45:47 collections: 593038/3273174
2016/06/19 15:47:13 collections: 618038/3273174
2016/06/19 15:48:32 collections: 643038/3273174
2016/06/19 15:49:57 collections: 668038/3273174
2016/06/19 15:51:22 collections: 693038/3273174
2016/06/19 15:52:44 collections: 718038/3273174
2016/06/19 15:54:03 collections: 743038/3273174
2016/06/19 15:55:21 collections: 768038/3273174
2016/06/19 15:56:44 collections: 793038/3273174
2016/06/19 15:58:12 collections: 818038/3273174
2016/06/19 15:59:33 collections: 843038/3273174
2016/06/19 16:00:55 collections: 868038/3273174
2016/06/19 16:02:15 collections: 893038/3273174
2016/06/19 16:03:33 collections: 918038/3273174
2016/06/19 16:05:00 collections: 943038/3273174
2016/06/19 16:06:20 collections: 968038/3273174
2016/06/19 16:07:43 collections: 993038/3273174
2016/06/19 16:09:08 collections: 1018038/3273174
2016/06/19 16:10:38 collections: 1043038/3273174
2016/06/19 16:11:56 collections: 1068038/3273174
2016/06/19 16:13:18 collections: 1093038/3273174
2016/06/19 16:14:41 collections: 1118038/3273174
2016/06/19 16:16:05 collections: 1143038/3273174
2016/06/19 16:17:25 collections: 1168038/3273174
2016/06/19 16:18:46 collections: 1193038/3273174
2016/06/19 16:20:05 collections: 1218038/3273174
2016/06/19 16:21:32 collections: 1243038/3273174
2016/06/19 16:22:55 collections: 1268038/3273174
2016/06/19 16:24:15 collections: 1293038/3273174
2016/06/19 16:25:42 collections: 1318038/3273174
2016/06/19 16:27:03 collections: 1343038/3273174
2016/06/19 16:28:23 collections: 1368038/3273174
2016/06/19 16:29:41 collections: 1393038/3273174
2016/06/19 16:31:00 collections: 1418038/3273174
2016/06/19 16:32:22 collections: 1443038/3273174
2016/06/19 16:33:40 collections: 1468038/3273174
2016/06/19 16:34:59 collections: 1493038/3273174
2016/06/19 16:36:24 collections: 1518038/3273174
2016/06/19 16:37:44 collections: 1543038/3273174
2016/06/19 16:39:05 collections: 1568038/3273174
2016/06/19 16:40:24 collections: 1593038/3273174
2016/06/19 16:41:48 collections: 1618038/3273174
2016/06/19 16:43:08 collections: 1643038/3273174
2016/06/19 16:44:33 collections: 1668038/3273174
2016/06/19 16:45:52 collections: 1693038/3273174
2016/06/19 16:47:15 collections: 1718038/3273174
2016/06/19 16:48:35 collections: 1743038/3273174
2016/06/19 16:49:58 collections: 1768038/3273174
2016/06/19 16:51:20 collections: 1793038/3273174
2016/06/19 16:52:41 collections: 1818038/3273174
2016/06/19 16:54:02 collections: 1843038/3273174
2016/06/19 16:55:22 collections: 1868038/3273174
2016/06/19 16:56:45 collections: 1893038/3273174
2016/06/19 16:58:02 collections: 1918038/3273174
2016/06/19 16:59:23 collections: 1943038/3273174
2016/06/19 17:00:43 collections: 1968038/3273174
2016/06/19 17:02:08 collections: 1993038/3273174
2016/06/19 17:03:26 collections: 2018038/3273174
2016/06/19 17:04:40 collections: 2043038/3273174
2016/06/19 17:06:01 collections: 2068038/3273174
2016/06/19 17:07:20 collections: 2093038/3273174
2016/06/19 17:08:39 collections: 2118038/3273174
2016/06/19 17:09:56 collections: 2143038/3273174
2016/06/19 17:11:22 collections: 2168038/3273174
2016/06/19 17:12:41 collections: 2193038/3273174
2016/06/19 17:13:58 collections: 2218038/3273174
2016/06/19 17:15:21 collections: 2243038/3273174
2016/06/19 17:16:39 collections: 2268038/3273174
2016/06/19 17:17:58 collections: 2293038/3273174
2016/06/19 17:19:23 collections: 2318038/3273174
2016/06/19 17:20:39 collections: 2343038/3273174
2016/06/19 17:22:01 collections: 2368038/3273174
2016/06/19 17:23:22 collections: 2393038/3273174
2016/06/19 17:24:49 collections: 2418038/3273174
2016/06/19 17:26:13 collections: 2443038/3273174
2016/06/19 17:27:34 collections: 2468038/3273174
2016/06/19 17:28:54 collections: 2493038/3273174
2016/06/19 17:30:09 collections: 2518038/3273174
2016/06/19 17:31:27 collections: 2543038/3273174
2016/06/19 17:32:45 collections: 2568038/3273174
2016/06/19 17:34:02 collections: 2593038/3273174
2016/06/19 17:35:24 collections: 2618038/3273174
2016/06/19 17:36:47 collections: 2643038/3273174
2016/06/19 17:38:07 collections: 2668038/3273174
2016/06/19 17:39:23 collections: 2693038/3273174
2016/06/19 17:40:39 collections: 2718038/3273174
2016/06/19 17:41:58 collections: 2743038/3273174
2016/06/19 17:43:16 collections: 2768038/3273174
2016/06/19 17:44:35 collections: 2793038/3273174
2016/06/19 17:45:57 collections: 2818038/3273174
2016/06/19 17:47:18 collections: 2843038/3273174
2016/06/19 17:48:37 collections: 2868038/3273174
2016/06/19 17:49:56 collections: 2893038/3273174
2016/06/19 17:51:17 collections: 2918038/3273174
2016/06/19 17:52:38 collections: 2943038/3273174
2016/06/19 17:54:01 collections: 2968038/3273174
2016/06/19 17:55:21 collections: 2993038/3273174
2016/06/19 17:56:39 collections: 3018038/3273174
2016/06/19 17:57:59 collections: 3043038/3273174
2016/06/19 17:59:18 collections: 3068038/3273174
2016/06/19 18:00:35 collections: 3093038/3273174
2016/06/19 18:01:49 collections: 3118038/3273174
2016/06/19 18:03:04 collections: 3143038/3273174
2016/06/19 18:04:24 collections: 3168038/3273174
2016/06/19 18:05:43 collections: 3193038/3273174
2016/06/19 18:07:05 collections: 3218038/3273174
2016/06/19 18:08:22 collections: 3243038/3273174
2016/06/19 18:09:38 collections: 3268038/3273174
2016/06/19 18:09:52 collections: 3273174/3273174
2016/06/19 18:09:52 collections: 3273174/3273174
2016/06/19 18:11:15 GetCurrentState: took 2h58m27.595042209s
2016/06/19 18:11:15 ComputeChangeSets: start
2016/06/19 18:12:34 ComputeChangeSets: took 1m19.342601722s
2016/06/19 18:12:38 ===
2016/06/19 18:12:38 12 replicas (6 blocks, 402731660 bytes) lost (0=have<want)
2016/06/19 18:12:38 1231 replicas (163 blocks, 80390127068 bytes) underreplicated (0<have<want)
2016/06/19 18:12:38 1141920 replicas (570080 blocks, 71769882700386 bytes) just right (have=want)
2016/06/19 18:12:38 4737089 replicas (4728845 blocks, 230577300632292 bytes) overreplicated (have>want>0)
2016/06/19 18:12:38 25565 replicas (8595 blocks, 2071752841 bytes) unreferenced (have>want=0, new)
2016/06/19 18:12:38 1170765 replicas (692999 blocks, 600101969055 bytes) garbage (have>want=0, old)
2016/06/19 18:12:38 ===
2016/06/19 18:12:38 10601328 replicas (5299094 blocks, 531960816569846 bytes) total commitment (excluding unreferenced)
2016/06/19 18:12:38 16533504 replicas (6000682 blocks, 763059498065306 bytes) total usage
2016/06/19 18:12:38 ===
2016/06/19 18:12:38 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:4553, Trashes:23291}
2016/06/19 18:12:38 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1712, Trashes:484663}
2016/06/19 18:12:38 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1641, Trashes:23110}
2016/06/19 18:12:38 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1476, Trashes:23309}
2016/06/19 18:12:38 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:36345, Trashes:85368}
2016/06/19 18:12:38 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:14318, Trashes:20329}
2016/06/19 18:12:38 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1522, Trashes:22858}
2016/06/19 18:12:38 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1511, Trashes:23370}
2016/06/19 18:12:38 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1495, Trashes:23353}
2016/06/19 18:12:38 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1548, Trashes:23254}
2016/06/19 18:12:38 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:1563, Trashes:23120}
2016/06/19 18:12:38 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): ChangeSet{Pulls:2940, Trashes:23178}
2016/06/19 18:12:38 ===
2016/06/19 18:12:38 Replication level distribution (counting N replicas on a single server as N):
2016/06/19 18:12:38  0:       6 #######
2016/06/19 18:12:38  1:  390568 ##################################################
2016/06/19 18:12:38  2:  697270 ####################################################
2016/06/19 18:12:38  3: 4905403 ###########################################################
2016/06/19 18:12:38  4:    6282 ##################################
2016/06/19 18:12:38  5:     952 ##########################
2016/06/19 18:12:38  6:      30 #############
2016/06/19 18:12:38  7:       1 ##
2016/06/19 18:12:38  8:       0
2016/06/19 18:12:38  9:       0
2016/06/19 18:12:38 10:       0
2016/06/19 18:12:38 11:       0
2016/06/19 18:12:38 12:     176 ####################
2016/06/19 18:12:38 ===
2016/06/19 18:12:38 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 18:12:38 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 18:12:38 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 18:12:38 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 18:12:38 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 18:12:38 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 18:12:38 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 18:12:38 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 18:12:38 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 18:12:38 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 18:12:38 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 18:12:38 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send pull list: start
2016/06/19 18:12:38 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send pull list: took 18.714115ms
2016/06/19 18:12:38 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send pull list: took 18.576846ms
2016/06/19 18:12:38 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send pull list: took 19.002479ms
2016/06/19 18:12:38 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send pull list: took 19.383481ms
2016/06/19 18:12:38 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send pull list: took 22.965699ms
2016/06/19 18:12:38 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send pull list: took 24.329261ms
2016/06/19 18:12:38 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send pull list: Put http://humgen-01-01.internal.sanger.ac.uk:25107/pull: EOF
2016/06/19 18:12:38 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send pull list: took 25.162295ms
2016/06/19 18:12:38 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send pull list: took 25.905864ms
2016/06/19 18:12:38 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send pull list: took 34.18427ms
2016/06/19 18:12:38 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send pull list: took 55.276046ms
2016/06/19 18:12:38 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send pull list: took 182.975728ms
2016/06/19 18:12:39 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send pull list: took 391.756587ms
2016/06/19 18:12:39 Run: took 2h59m51.275338725s
2016/06/19 18:12:39 run failed: z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send pull list: Put http://humgen-01-01.internal.sanger.ac.uk:25107/pull: EOF
2016/06/19 18:12:39 timer went off
2016/06/19 18:12:39 starting next run
2016/06/19 18:12:39 Run: start
2016/06/19 18:12:39 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/06/19 18:12:39 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.211381ms
2016/06/19 18:12:39 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.302391ms
2016/06/19 18:12:39 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.344222ms
2016/06/19 18:12:39 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.338358ms
2016/06/19 18:12:39 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.511689ms
2016/06/19 18:12:39 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.519639ms
2016/06/19 18:12:39 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.563933ms
2016/06/19 18:12:39 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.586532ms
2016/06/19 18:12:39 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.47872ms
2016/06/19 18:12:39 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.577564ms
2016/06/19 18:12:39 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: took 1.577052ms
2016/06/19 18:12:39 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.728638ms
2016/06/19 18:12:39 GetCurrentState: start
2016/06/19 18:12:39 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:39 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/06/19 18:12:43 collections: 0/3273174
2016/06/19 18:12:53 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): add 1312482 replicas to map
2016/06/19 18:12:53 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): add 1334745 replicas to map
2016/06/19 18:12:53 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): add 1338775 replicas to map
2016/06/19 18:12:54 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): add 1339176 replicas to map
2016/06/19 18:12:54 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): add 1337822 replicas to map
2016/06/19 18:12:54 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): add 1335581 replicas to map
2016/06/19 18:12:54 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): add 1338270 replicas to map
2016/06/19 18:12:54 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): add 1337686 replicas to map
2016/06/19 18:12:54 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): add 1340508 replicas to map
2016/06/19 18:12:54 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): add 1346446 replicas to map
2016/06/19 18:12:54 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): add 1373440 replicas to map
2016/06/19 18:12:54 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:12:56 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:12:56 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:12:58 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:12:59 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:13:00 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:13:00 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:13:01 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:13:02 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:13:02 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:13:06 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:13:52 collections: 25000/3273174
2016/06/19 18:14:59 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): add 1798573 replicas to map
2016/06/19 18:15:00 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): done
2016/06/19 18:15:14 collections: 50000/3273174
2016/06/19 18:16:29 collections: 75000/3273174
...


Subtasks 1 (0 open1 closed)

Task #9522: Review 9437-keep-index-nsResolvedTom Clegg06/30/2016Actions

Related issues

Related to Arvados - Bug #9456: [keep-balance] don't clear existing trash lists on the 2nd..Nth iterationResolvedTom Clegg06/22/2016Actions
Actions #1

Updated by Colin Nolan almost 8 years ago

Our initial suspicion is that this is caused by the implementation of the safety mechanism that prevents replicas, which are actually the same, from being accidentally trashed:
https://github.com/curoverse/arvados/blob/master/services/keep-balance/balance.go#L376-L380.

If replicas have the same modified timestamp (`Mtime`), for purposes of safety, the current implementation assumes that they refer to the same block on the same server, and therefore does not trash them. However, given that the time resolution is in seconds, it seems plausible that replicas can be written to different servers and have the same `Mtime`. This would lead to the problem that we have observed of overreplicated blocks that are erroneously not removed during the balancing.

Actions #2

Updated by Tom Clegg almost 8 years ago

I don't think this can be attributed to the "same mtime" guard. Those cases wouldn't be reported in the "size of trash list" figures.

I think this is our best clue:

run failed: z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send pull list: Put http://humgen-01-01.internal.sanger.ac.uk:25107/pull: EOF

keep-balance skips "commit trash" if "commit pulls" fails (there's no specific reason why this is dangerous, other than the general (overly?) cautious idea that if anything unexpected happens at all, maybe we should not delete stuff right now).

My first guess is that "pull: EOF" means keepstore humgen-01-01 is running a version of keepstore that's old enough to have the "accepting a pull list causes crash" bug. To investigate/correct this:
  • Check keepstore logs / service status to see if one (or all) crashed/restarted last time keep-balance tried to commit pulls.
  • Make sure keepstore is 0.1.20160602205111.e6cb669-1 or newer.
  • Run with -commit-pulls=false -commit-trash=true (i.e., delete excess, but don't try to fix underreplicated blocks). The trash code doesn't hit the crashing bug, even in old versions.
Even if that works for you, I see two other bugs here:
  • The logs are confusing. Immediately after lots of "sending pull lists" you get a tiny "run failed" and then a pile of "sending trash lists". Those trash lists are the empty trash lists it sends at the beginning of the next run, though -- not the interesting trash lists it computed in the previous run.
  • Evidently your RunPeriod is less than the time it takes to do one balancing operation. This shouldn't be a problem, except for the "start by sending empty trash lists" safety feature. If, every 3 hours, we send a big trash list and quickly follow it up with an empty trash list, we aren't leaving the keepstore servers nearly enough time to actually delete anything. I've added #9456 for this. Workaround: use a longer RunPeriod.
Actions #3

Updated by Joshua Randall almost 8 years ago

I added some debug printing instrumentation to keep-balance and ran it again to count the reasons why a replica was not trashed.

In the else block after https://github.com/curoverse/arvados/blob/76be616a8a65a6c574026583c462640dcc9e706f/services/keep-balance/balance.go#L398-L401 I put a separate "not trashing" debugf statement for each of the 4 reasons that block would be entered (as well as a "trashing" debugf statement within the if block).

The counts for each condition are (N.B. a particular replica can be included in the total count below mutliple times if it is "not trashed" for multiple reasons):

14417561 not trashing because !(len(uniqueBestRepl) >= blk.Desired)
7400577 not trashing because !(!uniqueBestRepl[repl.Mtime])
777803 trashing
32337 not trashing because !(repl.Mtime < bal.MinMtime)
0 not trashing because !(!srv.ReadOnly)

Actions #4

Updated by Joshua Randall almost 8 years ago

keepstores were occasionally restarting around the time of the pull requests and keepstores were older than 0.1.20160602205111.e6cb669-1. All are now running 0.1.20160615194058.76be616-1

Changing RunPeriod to 6h for now.

However, I think that the 7.4M "not trashing because !(!uniqueBestRepl[repl.Mtime])" are likely to be a problem for us.

I think I understand the reasoning behind having this guard -- is it to protect against accidentally referring to the same keepstore data by two different names? In other words if the same keepstore data directory was accidentally exposed via two different keep services - or if the same keep service somehow showed up twice in our service list, we might accidentally delete all replicas of a block thinking there are many copies when in fact there is actually only one/few...

I can't think of a better way than modification time to protect against that. However, it is also not surprising to me that mTime (with second precision) has a lot of collisions. Each crunch node is typically able to write 5-6Gbps into the keep system, which equates to ten 64MB blocks per second. So, if I'm writing 3 replicas by default, even if they write them in serial there would be a good chance that all three would be written during the same second.

The underlying filesystem behind our keep store appears to have nanosecond precision for ctime/mtime/atime:

root@humgen-01-01:~# curl -s -H "Authorization: OAuth2 ${ARVADOS_API_TOKEN}" http://localhost:25107/index | head -n 1
cc600d08effd861a2744271cf5f2ffb0+3609470 1458345406
root@humgen-01-01:~# date -d "$(stat -c "%z" /data/keep/cc6/cc600d08effd861a2744271cf5f2ffb0)" +"%s.%N" 
1458345406.273054484

Can this be utilised if it is available? I guess there are some portability issues with getting tv_nsec values out of the stat structure, but I would think you could implement an attempt to get more precise times and use them if available. They should be useful to avoid treating phantom copies of the same replica, as long as all keep stores serving a replica have the more precise times available to them.

It seems like it should be pretty easy to extend the index format to support having a floating-point representation of the time (still in seconds) in the second (haha, the second column is the "second" column) column. keepstores that can't get sub-second times can just continue to behave as now but those that can get more precise times can just report seconds with the extra precision after the decimal point.

Consumers of the index would have to be parse that as a float rather than an int, but AFAIK there aren't many index consumers out there, so it may not be a very substantial change?

Actions #5

Updated by Tom Clegg almost 8 years ago

Joshua Randall wrote:

I think I understand the reasoning behind having this guard -- is it to protect against accidentally referring to the same keepstore data by two different names? In other words if the same keepstore data directory was accidentally exposed via two different keep services - or if the same keep service somehow showed up twice in our service list, we might accidentally delete all replicas of a block thinking there are many copies when in fact there is actually only one/few...

Yes, that's right. When using cloud storage facilities, it's especially easy (and useful/recommended) to give multiple keep services write access to a single storage volume (container/bucket).

So far we've been thinking of addressing this with:
  • a "volume ID" -- e.g., a filesystem UUID + path for a local volume (indicating in the index which volume each block is on seems useful for other reasons too)
  • break the tie by sending a "touch" request for one of the blocks when identical timestamps appear. The main problem with this is that if there really was only one replica appearing twice, keep-balance would still see a new collision next time, send another touch request, and so on ad infinitum, effectively making those blocks un-garbage-collectible. It seems like this approach is workable only to break ties on systems that have no shared volumes: we could check the un-Touched block after touching, and throw an error if that reveals unexpected volume sharing.

I can't think of a better way than modification time to protect against that. However, it is also not surprising to me that mTime (with second precision) has a lot of collisions. Each crunch node is typically able to write 5-6Gbps into the keep system, which equates to ten 64MB blocks per second. So, if I'm writing 3 replicas by default, even if they write them in serial there would be a good chance that all three would be written during the same second.

This all sounds right to me. The client code does its best to do concurrent writes so it shouldn't surprise us that "multiple copies with same timestamp" is common.

The underlying filesystem behind our keep store appears to have nanosecond precision for ctime/mtime/atime: Can this be utilised if it is available?

This is a great idea. There are only a few tools that read keepstore indexes, and accepting floats should be a simple change (ditto providing them).

I guess there are some portability issues with getting tv_nsec values out of the stat structure

I just tested this. Apparently, Go is already giving us nanosecond timestamps on ext4. All we have to do is stop truncating them.

package main

import (
    "fmt" 
    "os" 
)

func main() {
    s, _ := os.Stat("ns.go")
    fmt.Printf("%v\n", s.ModTime())
}

// => "2016-06-23 17:53:48.944223452 +0000 UTC" 
Actions #6

Updated by Tom Clegg almost 8 years ago

9437-keep-index-ns @ 7575c3c

Actions #7

Updated by Tom Clegg almost 8 years ago

  • Status changed from New to In Progress
  • Target version set to 2016-07-06 sprint
Actions #8

Updated by Tom Clegg almost 8 years ago

  • Assigned To set to Tom Clegg
Actions #9

Updated by Radhika Chippada almost 8 years ago

Just a few very minor comments.

  • Do we want to clearly state that blob-signature-ttl should in seconds regarding blob-signature-ttl in keepstore.go? Something like: "Lifetime of blob permission signatures in seconds" . Agree that application.default.yml says so already …
    "Lifetime of blob permission signatures. Modifying the ttl will invalidate all existing signatures” …
    
  • This is just out of my curiosity: In volume_generic_test.go, the change from regexp.MatchString to regexp.MustCompile didn’t actually accomplish any enhancement since the compiled regexp is not reused. Correct? (line 387)
  • In volume_generic_test line 393, would it be better if this error message is improved from t.Errorf("Got index %q for empty prefix", sortedIndex) to something like invalid index format?
  • + err := v.IndexTo(prefix, buf) ==> good catch :)
  • It could be much easier and more effective to review if the essential work is in one commit without mixing touchups such as these making it much larger review.
    -               Job{Script_parameters: Tasks{[]TaskDef{TaskDef{
    +               Job{Script_parameters: Tasks{[]TaskDef{{
    

LGTM either way. Thanks.

Actions #10

Updated by Tom Clegg almost 8 years ago

Radhika Chippada wrote:

  • Do we want to clearly state that blob-signature-ttl should in seconds regarding blob-signature-ttl in keepstore.go? Something like: "Lifetime of blob permission signatures in seconds" . Agree that application.default.yml says so already …

Sure thing. Updated.

  • This is just out of my curiosity: In volume_generic_test.go, the change from regexp.MatchString to regexp.MustCompile didn’t actually accomplish any enhancement since the compiled regexp is not reused. Correct? (line 387)

I changed it from MatchString(r,s) to MustCompile(r).FindStringSubmatch(s) because I wanted to inspect the matched substrings (not just the matched/didn't-match bool), and there isn't a regexp.Match*() shortcut for that.

  • In volume_generic_test line 393, would it be better if this error message is improved from t.Errorf("Got index %q for empty prefix", sortedIndex) to something like invalid index format?

"Invalid index format" is one way to fail that regexp, but leaving out some of the hashes or including extras will also fail. I don't think there's much we can really say beyond the implicit "look at this string we got, and the regexp in the code, and figure out how to fix it"...

  • It could be much easier and more effective to review if the essential work is in one commit without mixing touchups such as these making it much larger review.
Huh. The example you gave is from the "gofmt" commit, in a branch with the following commits:
  • 7575c3c 9437: Accept 1-second timestamps from old keepstore servers.
  • e686d4f 9437: gofmt
  • 748b188 9437: Report timestamps as nanoseconds since epoch in keepstore index.

I thought I had done a good job there. What would you prefer to see?

Now at 424ec71

Actions #11

Updated by Radhika Chippada almost 8 years ago

Few comments about the doc update in 424ec71

  • For each of the azure / S3 / unix storage related config params, should we add “required when using X storage” or something like that? For ex: when I am using S3, I do not need “volume” or “azure-storage-container-volume” but require “s3-bucket-volume”, correct?
  • blob-signature-ttl - default 1209600 - can we say “Default is 2 weeks. (default 1209600)”?
  • enforce-permissions - do we want to add default here?
  • I am sorry I could not quite understand what this means: “If smaller than 67108864, use multiple concurrent range requests to retrieve a block”
  • How do “azure-storage-replication” and “s3-replication” impact / relate to the “default_collection_replication” from the api config file?
  • “Replication level to report to clients when” => I think this is a bit cryptic? Can we also include “collections will be replicated to these many storages / times or something?”
  • “Do not write, delete, or touch anything on the following volumes” => “Do not write, delete, or touch anything on the configured volumes?”
  • “trash-lifetime” - please add default here?
  • I see that the doc lists the args in alphabetical order. Would it help if we group them? First list all the "generic" args such as trash-lifetime etc that are required / applicable irrespective of the storage type being used, then groups of args per storage type?
Actions #12

Updated by Tom Clegg almost 8 years ago

Copied doc comments from note-11 to new issue, #9539.

Actions #13

Updated by Radhika Chippada almost 8 years ago

  • Target version changed from 2016-07-06 sprint to 2016-07-20 sprint
Actions #14

Updated by Tom Clegg almost 8 years ago

  • Target version deleted (2016-07-20 sprint)
Actions #15

Updated by Tom Clegg almost 8 years ago

There's a branch in review on #9456 which should address the "starvation due to clearing trash lists" issue. This will also mitigate the misleading log messages to some degree: usually the trash lists won't get cleared, and when they do, there's a new log message saying the trash lists are being cleared.

I'd like to hear some confirmation that the high-resolution timestamps fix the "often can't fix overreplicated blocks" issue in real life. After that, can we call this issue resolved?

Actions #16

Updated by Brett Smith almost 8 years ago

  • Status changed from In Progress to Feedback
Actions #17

Updated by Tom Clegg about 7 years ago

  • Status changed from Feedback to Resolved
Actions

Also available in: Atom PDF