Bug #7254

[SDKs] arv-put doesn't respect --replication for Keep blocks (unless you add --no-resume)

Added by Brett Smith over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assigned To: -
Category: SDKs
Target version: 2015-10-14 sprint
Start date: 09/10/2015
Due date:
% Done: 100%
Estimated time:
Story points: 0.5

Description

A simple arv-put that specifies a --replication value different from the default ignores that value for Keep blocks: they are still replicated to Keep at the default level. Here's a demonstration on the PGP tutorial data. The debug output shows each block being written to two Keep services (the two ports, 25107 and 25108, on keep0):

brett@shell.4xphq:~/keep/by_id/c1bad4b39ca5a924e481008009d94e32+210$ ARVADOS_DEBUG=1 arv-put --replication 1 var-GS000016015-ASM.tsv.bz2
0M / 216M 0.0% 2015-09-10 13:49:05 arvados.keep[15725] DEBUG: {u'4xphq-bi6l4-ospxuln4ial4svv': {u'read_only': False, u'kind': u'arvados#keepService', u'uuid': u'4xphq-bi6l4-ospxuln4ial4svv', '_service_root': 'http://keep0.4xphq.arvadosapi.com:25108/', u'modified_at': u'2014-05-21T18:59:47.654627000Z', u'created_at': u'2014-05-21T18:59:47.654755000Z', u'modified_by_client_uuid': None, u'owner_uuid': u'4xphq-tpzed-2ruv9ywhc7ozn9a', u'href': u'/keep_services/4xphq-bi6l4-ospxuln4ial4svv', u'etag': u'9yds3yc08bd97gxht3s95q6ua', u'service_port': 25108, u'service_type': u'disk', u'service_ssl_flag': False, u'modified_by_user_uuid': u'4xphq-tpzed-000000000000000', u'service_host': u'keep0.4xphq.arvadosapi.com'}, u'4xphq-bi6l4-dk9mjspdg2v8mhq': {u'read_only': False, u'kind': u'arvados#keepService', u'uuid': u'4xphq-bi6l4-dk9mjspdg2v8mhq', '_service_root': 'http://keep0.4xphq.arvadosapi.com:25107/', u'modified_at': u'2014-05-21T18:59:47.620295000Z', u'created_at': u'2014-05-21T18:59:47.620495000Z', u'modified_by_client_uuid': None, u'owner_uuid': u'4xphq-tpzed-2ruv9ywhc7ozn9a', u'href': u'/keep_services/4xphq-bi6l4-dk9mjspdg2v8mhq', u'etag': u'64l3d8njg0f6enhj7hzx432e5', u'service_port': 25107, u'service_type': u'disk', u'service_ssl_flag': False, u'modified_by_user_uuid': u'4xphq-tpzed-000000000000000', u'service_host': u'keep0.4xphq.arvadosapi.com'}}
2015-09-10 13:49:05 arvados.keep[15725] DEBUG: [{u'read_only': False, u'kind': u'arvados#keepService', u'uuid': u'4xphq-bi6l4-dk9mjspdg2v8mhq', '_service_root': 'http://keep0.4xphq.arvadosapi.com:25107/', u'modified_at': u'2014-05-21T18:59:47.620295000Z', u'created_at': u'2014-05-21T18:59:47.620495000Z', u'modified_by_client_uuid': None, u'owner_uuid': u'4xphq-tpzed-2ruv9ywhc7ozn9a', u'href': u'/keep_services/4xphq-bi6l4-dk9mjspdg2v8mhq', u'etag': u'64l3d8njg0f6enhj7hzx432e5', u'service_port': 25107, u'service_type': u'disk', u'service_ssl_flag': False, u'modified_by_user_uuid': u'4xphq-tpzed-000000000000000', u'service_host': u'keep0.4xphq.arvadosapi.com'}, {u'read_only': False, u'kind': u'arvados#keepService', u'uuid': u'4xphq-bi6l4-ospxuln4ial4svv', '_service_root': 'http://keep0.4xphq.arvadosapi.com:25108/', u'modified_at': u'2014-05-21T18:59:47.654627000Z', u'created_at': u'2014-05-21T18:59:47.654755000Z', u'modified_by_client_uuid': None, u'owner_uuid': u'4xphq-tpzed-2ruv9ywhc7ozn9a', u'href': u'/keep_services/4xphq-bi6l4-ospxuln4ial4svv', u'etag': u'9yds3yc08bd97gxht3s95q6ua', u'service_port': 25108, u'service_type': u'disk', u'service_ssl_flag': False, u'modified_by_user_uuid': u'4xphq-tpzed-000000000000000', u'service_host': u'keep0.4xphq.arvadosapi.com'}]
2015-09-10 13:49:05 arvados.keep[15725] DEBUG: 204e43b8a1185621ca55a94839582e6f+67108864: ['http://keep0.4xphq.arvadosapi.com:25108/', 'http://keep0.4xphq.arvadosapi.com:25107/']
2015-09-10 13:49:05 arvados.keep[15725] DEBUG: KeepWriterThread <KeepWriterThread(Thread-1, started 140344599521024)> proceeding 204e43b8a1185621ca55a94839582e6f+67108864 http://keep0.4xphq.arvadosapi.com:25107/
2015-09-10 13:49:05 arvados.keep[15725] DEBUG: Request: PUT http://keep0.4xphq.arvadosapi.com:25107/204e43b8a1185621ca55a94839582e6f
2015-09-10 13:49:05 arvados.keep[15725] DEBUG: KeepWriterThread <KeepWriterThread(Thread-2, started 140344591128320)> proceeding 204e43b8a1185621ca55a94839582e6f+67108864 http://keep0.4xphq.arvadosapi.com:25108/
2015-09-10 13:49:05 arvados.keep[15725] DEBUG: Request: PUT http://keep0.4xphq.arvadosapi.com:25108/204e43b8a1185621ca55a94839582e6f
2015-09-10 13:49:09 arvados.keep[15725] DEBUG: KeepWriterThread <KeepWriterThread(Thread-1, started 140344599521024)> succeeded 204e43b8a1185621ca55a94839582e6f+67108864 http://keep0.4xphq.arvadosapi.com:25107/
2015-09-10 13:49:09 arvados.keep[15725] DEBUG: KeepWriterThread <KeepWriterThread(Thread-2, started 140344591128320)> succeeded 204e43b8a1185621ca55a94839582e6f+67108864 http://keep0.4xphq.arvadosapi.com:25108/
64M / 216M 29.5% 2015-09-10 13:49:10 arvados.keep[15725] DEBUG: b9677abbac956bd3e86b1deb28dfac03+67108864: ['http://keep0.4xphq.arvadosapi.com:25107/', 'http://keep0.4xphq.arvadosapi.com:25108/']
2015-09-10 13:49:10 arvados.keep[15725] DEBUG: KeepWriterThread <KeepWriterThread(Thread-3, started 140344591128320)> proceeding b9677abbac956bd3e86b1deb28dfac03+67108864 http://keep0.4xphq.arvadosapi.com:25107/
2015-09-10 13:49:10 arvados.keep[15725] DEBUG: Request: PUT http://keep0.4xphq.arvadosapi.com:25107/b9677abbac956bd3e86b1deb28dfac03
2015-09-10 13:49:10 arvados.keep[15725] DEBUG: KeepWriterThread <KeepWriterThread(Thread-4, started 140344599521024)> proceeding b9677abbac956bd3e86b1deb28dfac03+67108864 http://keep0.4xphq.arvadosapi.com:25108/
2015-09-10 13:49:10 arvados.keep[15725] DEBUG: Request: PUT http://keep0.4xphq.arvadosapi.com:25108/b9677abbac956bd3e86b1deb28dfac03
2015-09-10 13:49:14 arvados.keep[15725] DEBUG: KeepWriterThread <KeepWriterThread(Thread-4, started 140344599521024)> succeeded b9677abbac956bd3e86b1deb28dfac03+67108864 http://keep0.4xphq.arvadosapi.com:25108/
2015-09-10 13:49:14 arvados.keep[15725] DEBUG: KeepWriterThread <KeepWriterThread(Thread-3, started 140344591128320)> succeeded b9677abbac956bd3e86b1deb28dfac03+67108864 http://keep0.4xphq.arvadosapi.com:25107/
128M / 216M 59.1% 2015-09-10 13:49:14 arvados.keep[15725] DEBUG: fc15aff2a762b13f521baf042140acec+67108864: ['http://keep0.4xphq.arvadosapi.com:25108/', 'http://keep0.4xphq.arvadosapi.com:25107/']
2015-09-10 13:49:14 arvados.keep[15725] DEBUG: KeepWriterThread <KeepWriterThread(Thread-5, started 140344591128320)> proceeding fc15aff2a762b13f521baf042140acec+67108864 http://keep0.4xphq.arvadosapi.com:25107/
2015-09-10 13:49:14 arvados.keep[15725] DEBUG: Request: PUT http://keep0.4xphq.arvadosapi.com:25107/fc15aff2a762b13f521baf042140acec
2015-09-10 13:49:14 arvados.keep[15725] DEBUG: KeepWriterThread <KeepWriterThread(Thread-6, started 140344599521024)> proceeding fc15aff2a762b13f521baf042140acec+67108864 http://keep0.4xphq.arvadosapi.com:25108/
2015-09-10 13:49:14 arvados.keep[15725] DEBUG: Request: PUT http://keep0.4xphq.arvadosapi.com:25108/fc15aff2a762b13f521baf042140acec
2015-09-10 13:49:17 arvados.keep[15725] DEBUG: KeepWriterThread <KeepWriterThread(Thread-6, started 140344599521024)> succeeded fc15aff2a762b13f521baf042140acec+67108864 http://keep0.4xphq.arvadosapi.com:25108/
2015-09-10 13:49:17 arvados.keep[15725] DEBUG: KeepWriterThread <KeepWriterThread(Thread-5, started 140344591128320)> succeeded fc15aff2a762b13f521baf042140acec+67108864 http://keep0.4xphq.arvadosapi.com:25107/
192M / 216M 88.6% 2015-09-10 13:49:18 arvados.keep[15725] DEBUG: 323d2a3ce20370c4ca1d3462a344f8fd+25885655: ['http://keep0.4xphq.arvadosapi.com:25108/', 'http://keep0.4xphq.arvadosapi.com:25107/']
2015-09-10 13:49:18 arvados.keep[15725] DEBUG: KeepWriterThread <KeepWriterThread(Thread-7, started 140344591128320)> proceeding 323d2a3ce20370c4ca1d3462a344f8fd+25885655 http://keep0.4xphq.arvadosapi.com:25107/
2015-09-10 13:49:18 arvados.keep[15725] DEBUG: Request: PUT http://keep0.4xphq.arvadosapi.com:25107/323d2a3ce20370c4ca1d3462a344f8fd
2015-09-10 13:49:18 arvados.keep[15725] DEBUG: KeepWriterThread <KeepWriterThread(Thread-8, started 140344599521024)> proceeding 323d2a3ce20370c4ca1d3462a344f8fd+25885655 http://keep0.4xphq.arvadosapi.com:25108/
2015-09-10 13:49:18 arvados.keep[15725] DEBUG: Request: PUT http://keep0.4xphq.arvadosapi.com:25108/323d2a3ce20370c4ca1d3462a344f8fd
2015-09-10 13:49:19 arvados.keep[15725] DEBUG: KeepWriterThread <KeepWriterThread(Thread-8, started 140344599521024)> succeeded 323d2a3ce20370c4ca1d3462a344f8fd+25885655 http://keep0.4xphq.arvadosapi.com:25108/
2015-09-10 13:49:19 arvados.keep[15725] DEBUG: KeepWriterThread <KeepWriterThread(Thread-7, started 140344591128320)> succeeded 323d2a3ce20370c4ca1d3462a344f8fd+25885655 http://keep0.4xphq.arvadosapi.com:25107/
216M / 216M 100.0%
Collection saved as 'Saved at 2015-09-10 13:49:04 UTC by brett@shell.4xphq.arvadosapi.com'
4xphq-4zz18-q0u4nb9z66joxs6

The same thing happens with higher-than-default replication (e.g., --replication=3): replication_desired is set correctly on the output collection, but the Keep blocks are still replicated only twice, at the default level.
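One way to check per-block replication from output like the log above is to count, for each block locator, the distinct Keep services that logged a `succeeded` write. This is a minimal sketch that assumes the `arvados.keep` DEBUG line format shown above; it is not an Arvados tool, and the script name `count_replicas.py` used below is hypothetical.

```python
import re
import sys
from collections import defaultdict

# Assumes the "... succeeded <locator> <service root>" DEBUG format shown above.
# "proceeding" lines are deliberately ignored: only completed writes count.
SUCCEEDED = re.compile(r"succeeded ([0-9a-f]{32}\+\d+) (\S+)")


def replicas_per_block(log_text):
    """Map each block locator to the number of distinct Keep services that stored it."""
    stores = defaultdict(set)
    for locator, service_root in SUCCEEDED.findall(log_text):
        stores[locator].add(service_root)
    return {locator: len(roots) for locator, roots in stores.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log_file:
        counts = replicas_per_block(log_file.read())
    for locator, count in sorted(counts.items()):
        print(locator, count)
```

For example, capture a run with `ARVADOS_DEBUG=1 arv-put --replication 1 FILE > put.log 2>&1`, then run `python count_replicas.py put.log`. On the log above it would report 2 for each of the four blocks; on the --no-resume log that follows, 1 for each.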

It works if you add --no-resume; each block is then written to a single Keep service:

brett@shell.4xphq:~/keep/by_id/c1bad4b39ca5a924e481008009d94e32+210$ ARVADOS_DEBUG=1 arv-put --replication 1 --no-resume var-GS000016015-ASM.tsv.bz2
0M / 216M 0.0% 2015-09-10 13:52:49 arvados.keep[16939] DEBUG: {u'4xphq-bi6l4-ospxuln4ial4svv': {u'read_only': False, u'kind': u'arvados#keepService', u'uuid': u'4xphq-bi6l4-ospxuln4ial4svv', '_service_root': 'http://keep0.4xphq.arvadosapi.com:25108/', u'modified_at': u'2014-05-21T18:59:47.654627000Z', u'created_at': u'2014-05-21T18:59:47.654755000Z', u'modified_by_client_uuid': None, u'owner_uuid': u'4xphq-tpzed-2ruv9ywhc7ozn9a', u'href': u'/keep_services/4xphq-bi6l4-ospxuln4ial4svv', u'etag': u'9yds3yc08bd97gxht3s95q6ua', u'service_port': 25108, u'service_type': u'disk', u'service_ssl_flag': False, u'modified_by_user_uuid': u'4xphq-tpzed-000000000000000', u'service_host': u'keep0.4xphq.arvadosapi.com'}, u'4xphq-bi6l4-dk9mjspdg2v8mhq': {u'read_only': False, u'kind': u'arvados#keepService', u'uuid': u'4xphq-bi6l4-dk9mjspdg2v8mhq', '_service_root': 'http://keep0.4xphq.arvadosapi.com:25107/', u'modified_at': u'2014-05-21T18:59:47.620295000Z', u'created_at': u'2014-05-21T18:59:47.620495000Z', u'modified_by_client_uuid': None, u'owner_uuid': u'4xphq-tpzed-2ruv9ywhc7ozn9a', u'href': u'/keep_services/4xphq-bi6l4-dk9mjspdg2v8mhq', u'etag': u'64l3d8njg0f6enhj7hzx432e5', u'service_port': 25107, u'service_type': u'disk', u'service_ssl_flag': False, u'modified_by_user_uuid': u'4xphq-tpzed-000000000000000', u'service_host': u'keep0.4xphq.arvadosapi.com'}}
2015-09-10 13:52:49 arvados.keep[16939] DEBUG: [{u'read_only': False, u'kind': u'arvados#keepService', u'uuid': u'4xphq-bi6l4-dk9mjspdg2v8mhq', '_service_root': 'http://keep0.4xphq.arvadosapi.com:25107/', u'modified_at': u'2014-05-21T18:59:47.620295000Z', u'created_at': u'2014-05-21T18:59:47.620495000Z', u'modified_by_client_uuid': None, u'owner_uuid': u'4xphq-tpzed-2ruv9ywhc7ozn9a', u'href': u'/keep_services/4xphq-bi6l4-dk9mjspdg2v8mhq', u'etag': u'64l3d8njg0f6enhj7hzx432e5', u'service_port': 25107, u'service_type': u'disk', u'service_ssl_flag': False, u'modified_by_user_uuid': u'4xphq-tpzed-000000000000000', u'service_host': u'keep0.4xphq.arvadosapi.com'}, {u'read_only': False, u'kind': u'arvados#keepService', u'uuid': u'4xphq-bi6l4-ospxuln4ial4svv', '_service_root': 'http://keep0.4xphq.arvadosapi.com:25108/', u'modified_at': u'2014-05-21T18:59:47.654627000Z', u'created_at': u'2014-05-21T18:59:47.654755000Z', u'modified_by_client_uuid': None, u'owner_uuid': u'4xphq-tpzed-2ruv9ywhc7ozn9a', u'href': u'/keep_services/4xphq-bi6l4-ospxuln4ial4svv', u'etag': u'9yds3yc08bd97gxht3s95q6ua', u'service_port': 25108, u'service_type': u'disk', u'service_ssl_flag': False, u'modified_by_user_uuid': u'4xphq-tpzed-000000000000000', u'service_host': u'keep0.4xphq.arvadosapi.com'}]
2015-09-10 13:52:49 arvados.keep[16939] DEBUG: 204e43b8a1185621ca55a94839582e6f+67108864: ['http://keep0.4xphq.arvadosapi.com:25108/', 'http://keep0.4xphq.arvadosapi.com:25107/']
2015-09-10 13:52:49 arvados.keep[16939] DEBUG: KeepWriterThread <KeepWriterThread(Thread-1, started 140608440813312)> proceeding 204e43b8a1185621ca55a94839582e6f+67108864 http://keep0.4xphq.arvadosapi.com:25107/
2015-09-10 13:52:49 arvados.keep[16939] DEBUG: Request: PUT http://keep0.4xphq.arvadosapi.com:25107/204e43b8a1185621ca55a94839582e6f
2015-09-10 13:52:50 arvados.keep[16939] DEBUG: KeepWriterThread <KeepWriterThread(Thread-1, started 140608440813312)> succeeded 204e43b8a1185621ca55a94839582e6f+67108864 http://keep0.4xphq.arvadosapi.com:25107/
64M / 216M 29.5% 2015-09-10 13:52:51 arvados.keep[16939] DEBUG: b9677abbac956bd3e86b1deb28dfac03+67108864: ['http://keep0.4xphq.arvadosapi.com:25107/', 'http://keep0.4xphq.arvadosapi.com:25108/']
2015-09-10 13:52:51 arvados.keep[16939] DEBUG: KeepWriterThread <KeepWriterThread(Thread-3, started 140608440813312)> proceeding b9677abbac956bd3e86b1deb28dfac03+67108864 http://keep0.4xphq.arvadosapi.com:25107/
2015-09-10 13:52:51 arvados.keep[16939] DEBUG: Request: PUT http://keep0.4xphq.arvadosapi.com:25107/b9677abbac956bd3e86b1deb28dfac03
2015-09-10 13:52:52 arvados.keep[16939] DEBUG: KeepWriterThread <KeepWriterThread(Thread-3, started 140608440813312)> succeeded b9677abbac956bd3e86b1deb28dfac03+67108864 http://keep0.4xphq.arvadosapi.com:25107/
128M / 216M 59.1% 2015-09-10 13:52:53 arvados.keep[16939] DEBUG: fc15aff2a762b13f521baf042140acec+67108864: ['http://keep0.4xphq.arvadosapi.com:25108/', 'http://keep0.4xphq.arvadosapi.com:25107/']
2015-09-10 13:52:53 arvados.keep[16939] DEBUG: KeepWriterThread <KeepWriterThread(Thread-5, started 140608440813312)> proceeding fc15aff2a762b13f521baf042140acec+67108864 http://keep0.4xphq.arvadosapi.com:25107/
2015-09-10 13:52:53 arvados.keep[16939] DEBUG: Request: PUT http://keep0.4xphq.arvadosapi.com:25107/fc15aff2a762b13f521baf042140acec
2015-09-10 13:52:54 arvados.keep[16939] DEBUG: KeepWriterThread <KeepWriterThread(Thread-5, started 140608440813312)> succeeded fc15aff2a762b13f521baf042140acec+67108864 http://keep0.4xphq.arvadosapi.com:25107/
192M / 216M 88.6% 2015-09-10 13:52:55 arvados.keep[16939] DEBUG: 323d2a3ce20370c4ca1d3462a344f8fd+25885655: ['http://keep0.4xphq.arvadosapi.com:25108/', 'http://keep0.4xphq.arvadosapi.com:25107/']
2015-09-10 13:52:55 arvados.keep[16939] DEBUG: KeepWriterThread <KeepWriterThread(Thread-7, started 140608440813312)> proceeding 323d2a3ce20370c4ca1d3462a344f8fd+25885655 http://keep0.4xphq.arvadosapi.com:25107/
2015-09-10 13:52:55 arvados.keep[16939] DEBUG: Request: PUT http://keep0.4xphq.arvadosapi.com:25107/323d2a3ce20370c4ca1d3462a344f8fd
2015-09-10 13:52:55 arvados.keep[16939] DEBUG: KeepWriterThread <KeepWriterThread(Thread-7, started 140608440813312)> succeeded 323d2a3ce20370c4ca1d3462a344f8fd+25885655 http://keep0.4xphq.arvadosapi.com:25107/
216M / 216M 100.0%
Collection saved as 'Saved at 2015-09-10 13:52:48 UTC by brett@shell.4xphq.arvadosapi.com'
4xphq-4zz18-lxei9i58nv4gvo7

Associated revisions

Revision ba019ddb
Added by Tom Clegg over 4 years ago

Merge branch '7254-dont-lose-replication-arg' closes #7254

History

#1 Updated by Brett Smith over 4 years ago

  • Description updated

#2 Updated by Brett Smith over 4 years ago

  • Description updated

#3 Updated by Brett Smith over 4 years ago

  • Target version set to Arvados Future Sprints

#4 Updated by Brett Smith over 4 years ago

  • Story points set to 0.5

#5 Updated by Tom Clegg over 4 years ago

There is a fairly obvious bugfix at 8325a88. The affected code path runs when arv-put tries (but fails) to load state from the resume cache. I haven't actually managed to expose the bug with a test case, but I did add a related test that ensures the replication argument gets passed through to KeepClient (not just to collections.create()).

See 7254-dont-lose-replication-arg
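The comment doesn't quote the diff in 8325a88, so the following is only a sketch of the general shape of this kind of bug, under the assumption that arv-put tries to build its writer from the resume cache and falls back to a fresh writer when that load fails. `WriterSketch`, `CacheLoadError`, `EmptyCache`, and the `make_writer_*` functions are hypothetical names, not part of arv-put or the Arvados Python SDK.

```python
class CacheLoadError(Exception):
    """Hypothetical: raised when the resume cache has no usable state."""


class WriterSketch:
    """Hypothetical stand-in for arv-put's collection writer."""

    def __init__(self, replication=None):
        # None means "use the cluster's default replication".
        self.replication = replication
        self.state = None

    @classmethod
    def from_cache(cls, cache, replication=None):
        state = cache.load()  # raises CacheLoadError when there is nothing to resume
        writer = cls(replication=replication)
        writer.state = state
        return writer


def make_writer_buggy(cache, replication):
    try:
        return WriterSketch.from_cache(cache, replication=replication)
    except CacheLoadError:
        # Bug: the fallback writer is built without the replication argument.
        return WriterSketch()


def make_writer_fixed(cache, replication):
    try:
        return WriterSketch.from_cache(cache, replication=replication)
    except CacheLoadError:
        # Fix: pass the same replication to the fallback writer.
        return WriterSketch(replication=replication)


class EmptyCache:
    """Hypothetical cache with nothing to resume, as on a first upload."""

    def load(self):
        raise CacheLoadError("no resume state")
```

In this sketch the buggy fallback silently yields `replication=None` (the default) whenever there is nothing to resume, which would be consistent with the report: the default run fell back to default replication, while --no-resume, which presumably bypasses the resume-cache path, honored the flag.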

#6 Updated by Brett Smith over 4 years ago

The fix is good.

I've pushed a change that updates the test to tickle the bug and (I hope) improves readability a bit. Your version made arv-put fail to set up the cache at all (rather than fail to load it), so it never reached the branch we wanted to test. My version mocks ResumeCache so that the load step specifically fails. It also saves us a bunch of filesystem manipulation.
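The idea behind this test change (let cache setup succeed, but make only the load step fail, so the code reaches the fallback branch) can be sketched with `unittest.mock.patch.object`. `ResumeCacheSketch`, `KeepClientSketch`, and `start_upload` are hypothetical stand-ins, not the actual arv-put classes or its test suite.

```python
import unittest
from unittest import mock


class ResumeCacheSketch:
    """Hypothetical stand-in for arv-put's resume cache."""

    def __init__(self, path):
        # Setting up the cache succeeds in this sketch.
        self.path = path

    def load(self):
        # Real code would read saved upload state here; the test forces a failure.
        return {"path": self.path}


class KeepClientSketch:
    """Hypothetical stand-in for the SDK's Keep client."""

    def __init__(self, replication=None):
        self.replication = replication


def start_upload(cache_path, replication):
    """Hypothetical setup: resume from the cache if possible, else start fresh."""
    cache = ResumeCacheSketch(cache_path)
    try:
        state = cache.load()
    except ValueError:
        # The fallback branch the test needs to reach.
        return KeepClientSketch(replication=replication), None
    return KeepClientSketch(replication=replication), state


class ReplicationPassthroughTest(unittest.TestCase):
    def test_replication_survives_failed_cache_load(self):
        # Patch only the load step: construction still succeeds, so the code
        # under test gets as far as the "load failed" fallback branch.
        with mock.patch.object(ResumeCacheSketch, "load",
                               side_effect=ValueError("no saved state")) as load:
            client, state = start_upload("unused-path", replication=1)
        load.assert_called_once_with()
        self.assertIsNone(state)
        self.assertEqual(client.replication, 1)
```

Patching the method rather than the filesystem keeps the test independent of where the cache lives on disk.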

I also changed x[len(x)-1] to just x[-1].
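For built-in sequences such as lists, strings, and tuples the two spellings are equivalent: both return the last element, and both raise IndexError on an empty sequence (where len(x) - 1 is -1). The function names below are illustrative only.

```python
def last_item_verbose(x):
    # Explicit index arithmetic.
    return x[len(x) - 1]


def last_item(x):
    # Negative indices count back from the end, so -1 is the last element.
    return x[-1]
```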

If those changes look good to you, I'd say let's merge this. Feel free to rebase and squash or whatever. Thanks.

#7 Updated by Tom Clegg over 4 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:ba019ddb35781404a62924374ec3c0046323ead5.

#8 Updated by Brett Smith over 4 years ago

  • Target version changed from Arvados Future Sprints to 2015-10-14 sprint
