Project

General

Profile

Actions

Bug #6844

closed

[SDKs] arv-copy consistently gets MemoryError trying to copy an 8GiB collection

Added by Brett Smith almost 9 years ago. Updated over 8 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
SDKs
Target version:
Story points:
1.0

Description

qr1hi-4zz18-c1t7uidtqs5sg82 is public reference data for the Tuxedo pipeline. It's 8GiB.

I am trying to copy this data, using a shell node that has 4GiB RAM and 4GiB swap and current master arv-copy. While it's copying blocks, it eventually gets a MemoryError, and then the whole copy falls apart. Here's a transcript leading up to the badness, with ARVADOS_DEBUG enabled:

qr1hi-4zz18-c1t7uidtqs5sg82: 3904M / 8184M 47.7% 2015-07-31 18:48:02 arvados.keep[11112] DEBUG: c948eb1f7afcaf183552c531c08dc114+67108864+A…: ['https://keep.qr1hi.arvadosapi.com:443/']
2015-07-31 18:48:02 arvados.keep[11112] DEBUG: Request: GET https://keep.qr1hi.arvadosapi.com:443/c948eb1f7afcaf183552c531c08dc114+67108864+A…
2015-07-31 18:48:03 arvados.keep[11112] INFO: 200 response: 67108864 bytes in 1650.02799034 msec (38.787 MiB/sec)
2015-07-31 18:48:03 arvados.keep[11112] DEBUG: c948eb1f7afcaf183552c531c08dc114+67108864: ['http://keep0.xyzzy.arvadosapi.com:25107/', 'http://keep1.xyzzy.arvadosapi.com:25107/']
2015-07-31 18:48:03 arvados.keep[11112] DEBUG: KeepWriterThread <KeepWriterThread(Thread-123, started 139671556409088)> proceeding c948eb1f7afcaf183552c531c08dc114+67108864 http://keep1.xyzzy.arvadosapi.com:25107/
2015-07-31 18:48:03 arvados.keep[11112] DEBUG: Request: PUT http://keep1.xyzzy.arvadosapi.com:25107/c948eb1f7afcaf183552c531c08dc114
2015-07-31 18:48:03 arvados.keep[11112] DEBUG: KeepWriterThread <KeepWriterThread(Thread-124, started 139671564801792)> proceeding c948eb1f7afcaf183552c531c08dc114+67108864 http://keep0.xyzzy.arvadosapi.com:25107/
2015-07-31 18:48:03 arvados.keep[11112] DEBUG: Request: PUT http://keep0.xyzzy.arvadosapi.com:25107/c948eb1f7afcaf183552c531c08dc114
2015-07-31 18:48:05 arvados.keep[11112] DEBUG: KeepWriterThread <KeepWriterThread(Thread-123, started 139671556409088)> succeeded c948eb1f7afcaf183552c531c08dc114+67108864 http://keep1.xyzzy.arvadosapi.com:25107/
2015-07-31 18:48:05 arvados.keep[11112] DEBUG: KeepWriterThread <KeepWriterThread(Thread-124, started 139671564801792)> succeeded c948eb1f7afcaf183552c531c08dc114+67108864 http://keep0.xyzzy.arvadosapi.com:25107/
qr1hi-4zz18-c1t7uidtqs5sg82: 3968M / 8184M 48.5% 2015-07-31 18:48:05 arvados.keep[11112] DEBUG: 7c8de9368ec9ad742a3a27336a134dc5+67108864+A…: ['https://keep.qr1hi.arvadosapi.com:443/']
2015-07-31 18:48:05 arvados.keep[11112] DEBUG: Request: GET https://keep.qr1hi.arvadosapi.com:443/7c8de9368ec9ad742a3a27336a134dc5+67108864+A…
2015-07-31 18:48:07 arvados.keep[11112] INFO: 200 response: 67108864 bytes in 1569.37599182 msec (40.781 MiB/sec)
2015-07-31 18:48:07 arvados.keep[11112] DEBUG: 7c8de9368ec9ad742a3a27336a134dc5+67108864: ['http://keep1.xyzzy.arvadosapi.com:25107/', 'http://keep0.xyzzy.arvadosapi.com:25107/']
2015-07-31 18:48:07 arvados.keep[11112] DEBUG: KeepWriterThread <KeepWriterThread(Thread-125, started 139671564801792)> proceeding 7c8de9368ec9ad742a3a27336a134dc5+67108864 http://keep1.xyzzy.arvadosapi.com:25107/
2015-07-31 18:48:07 arvados.keep[11112] DEBUG: Request: PUT http://keep1.xyzzy.arvadosapi.com:25107/7c8de9368ec9ad742a3a27336a134dc5
2015-07-31 18:48:07 arvados.keep[11112] DEBUG: KeepWriterThread <KeepWriterThread(Thread-126, started 139671556409088)> proceeding 7c8de9368ec9ad742a3a27336a134dc5+67108864 http://keep0.xyzzy.arvadosapi.com:25107/
2015-07-31 18:48:07 arvados.keep[11112] DEBUG: Request: PUT http://keep0.xyzzy.arvadosapi.com:25107/7c8de9368ec9ad742a3a27336a134dc5
2015-07-31 18:48:09 arvados.keep[11112] DEBUG: KeepWriterThread <KeepWriterThread(Thread-125, started 139671564801792)> succeeded 7c8de9368ec9ad742a3a27336a134dc5+67108864 http://keep1.xyzzy.arvadosapi.com:25107/
2015-07-31 18:48:09 arvados.keep[11112] DEBUG: KeepWriterThread <KeepWriterThread(Thread-126, started 139671556409088)> succeeded 7c8de9368ec9ad742a3a27336a134dc5+67108864 http://keep0.xyzzy.arvadosapi.com:25107/
qr1hi-4zz18-c1t7uidtqs5sg82: 4032M / 8184M 49.3% 2015-07-31 18:48:09 arvados.keep[11112] DEBUG: 7ac2a965fe398a919e86392d67bce52b+67108864+A…: ['https://keep.qr1hi.arvadosapi.com:443/']
2015-07-31 18:48:09 arvados.keep[11112] DEBUG: Request: GET https://keep.qr1hi.arvadosapi.com:443/7ac2a965fe398a919e86392d67bce52b+67108864+A…
2015-07-31 18:48:10 arvados.keep[11112] INFO: 200 response: 67108864 bytes in 1575.77109337 msec (40.615 MiB/sec)
2015-07-31 18:48:10 arvados.keep[11112] DEBUG: 7ac2a965fe398a919e86392d67bce52b+67108864: ['http://keep0.xyzzy.arvadosapi.com:25107/', 'http://keep1.xyzzy.arvadosapi.com:25107/']
2015-07-31 18:48:10 arvados.keep[11112] DEBUG: KeepWriterThread <KeepWriterThread(Thread-127, started 139671556409088)> proceeding 7ac2a965fe398a919e86392d67bce52b+67108864 http://keep1.xyzzy.arvadosapi.com:25107/
2015-07-31 18:48:10 arvados.keep[11112] DEBUG: Request: PUT http://keep1.xyzzy.arvadosapi.com:25107/7ac2a965fe398a919e86392d67bce52b
2015-07-31 18:48:10 arvados.keep[11112] DEBUG: KeepWriterThread <KeepWriterThread(Thread-128, started 139671564801792)> proceeding 7ac2a965fe398a919e86392d67bce52b+67108864 http://keep0.xyzzy.arvadosapi.com:25107/
2015-07-31 18:48:10 arvados.keep[11112] DEBUG: Request: PUT http://keep0.xyzzy.arvadosapi.com:25107/7ac2a965fe398a919e86392d67bce52b
2015-07-31 18:48:12 arvados.keep[11112] DEBUG: KeepWriterThread <KeepWriterThread(Thread-128, started 139671564801792)> succeeded 7ac2a965fe398a919e86392d67bce52b+67108864 http://keep0.xyzzy.arvadosapi.com:25107/
2015-07-31 18:48:13 arvados.keep[11112] DEBUG: KeepWriterThread <KeepWriterThread(Thread-127, started 139671556409088)> succeeded 7ac2a965fe398a919e86392d67bce52b+67108864 http://keep1.xyzzy.arvadosapi.com:25107/
qr1hi-4zz18-c1t7uidtqs5sg82: 4096M / 8184M 50.0% 2015-07-31 18:48:13 arvados.keep[11112] DEBUG: 22e4cbeae1375cb0aea342ab68ddbfef+67108864+A…: ['https://keep.qr1hi.arvadosapi.com:443/']
2015-07-31 18:48:13 arvados.keep[11112] DEBUG: Request: GET https://keep.qr1hi.arvadosapi.com:443/22e4cbeae1375cb0aea342ab68ddbfef+67108864+A…
MemoryError: out of memory
2015-07-31 18:48:14 arvados.keep[11112] DEBUG: Request fail: GET https://keep.qr1hi.arvadosapi.com:443/22e4cbeae1375cb0aea342ab68ddbfef+67108864+A… => <class 'arvados.errors.HttpError'>:
2015-07-31 18:48:17 arvados.keep[11112] DEBUG: {u'qr1hi-bi6l4-ebw6id3jibfc3cm': OrderedDict([(u'href', u'/keep_services/qr1hi-bi6l4-ebw6id3jibfc3cm'), (u'kind', u'arvados#keepService'), (u'etag', u'dlywp1t0nne9dw3gds9tcyhs3'), (u'uuid', u'qr1hi-bi6l4-ebw6id3jibfc3cm'), (u'owner_uuid', u'qr1hi-tpzed-327pde5hktairf9'), (u'created_at', u'2014-07-08T02:19:44.015581000Z'), (u'modified_by_client_uuid', u'qr1hi-ozdt8-obw7foaks3qjyej'), (u'modified_by_user_uuid', u'qr1hi-tpzed-327pde5hktairf9'), (u'modified_at', u'2014-07-08T02:20:11.712252000Z'), (u'service_host', u'keep.qr1hi.arvadosapi.com'), (u'service_port', 443), (u'service_ssl_flag', True), (u'service_type', u'proxy'), (u'read_only', False), ('_service_root', 'https://keep.qr1hi.arvadosapi.com:443/')])}
2015-07-31 18:48:17 arvados.keep[11112] DEBUG: [OrderedDict([(u'href', u'/keep_services/qr1hi-bi6l4-ebw6id3jibfc3cm'), (u'kind', u'arvados#keepService'), (u'etag', u'dlywp1t0nne9dw3gds9tcyhs3'), (u'uuid', u'qr1hi-bi6l4-ebw6id3jibfc3cm'), (u'owner_uuid', u'qr1hi-tpzed-327pde5hktairf9'), (u'created_at', u'2014-07-08T02:19:44.015581000Z'), (u'modified_by_client_uuid', u'qr1hi-ozdt8-obw7foaks3qjyej'), (u'modified_by_user_uuid', u'qr1hi-tpzed-327pde5hktairf9'), (u'modified_at', u'2014-07-08T02:20:11.712252000Z'), (u'service_host', u'keep.qr1hi.arvadosapi.com'), (u'service_port', 443), (u'service_ssl_flag', True), (u'service_type', u'proxy'), (u'read_only', False), ('_service_root', 'https://keep.qr1hi.arvadosapi.com:443/')])]
2015-07-31 18:48:17 arvados.keep[11112] DEBUG: 22e4cbeae1375cb0aea342ab68ddbfef+67108864+A…: ['https://keep.qr1hi.arvadosapi.com:443/']
2015-07-31 18:48:17 arvados.keep[11112] INFO: No more Keep services to try; giving up
Traceback (most recent call last):
  File "/home/brett/pyvenv/bin/arv-copy", line 4, in <module>
    __import__('pkg_resources').run_script('arvados-python-client==0.1', 'arv-copy')
  File "/home/brett/pyvenv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 735, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/brett/pyvenv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1652, in run_script
    exec(code, namespace, namespace)
  File "/home/brett/pyvenv/lib/python2.7/site-packages/arvados_python_client-0.1-py2.7.egg/EGG-INFO/scripts/arv-copy", line 4, in <module>
    main()
  File "/home/brett/pyvenv/local/lib/python2.7/site-packages/arvados_python_client-0.1-py2.7.egg/arvados/commands/arv_copy.py", line 119, in main
    args)
  File "/home/brett/pyvenv/local/lib/python2.7/site-packages/arvados_python_client-0.1-py2.7.egg/arvados/commands/arv_copy.py", line 563, in copy_collection
    data = src_keep.get(word)
  File "/home/brett/pyvenv/local/lib/python2.7/site-packages/arvados_python_client-0.1-py2.7.egg/arvados/retry.py", line 154, in num_retries_setter
    return orig_func(self, *args, **kwargs)
  File "/home/brett/pyvenv/local/lib/python2.7/site-packages/arvados_python_client-0.1-py2.7.egg/arvados/keep.py", line 907, in get
    "failed to read {}".format(loc_s), service_errors, label="service")
arvados.errors.KeepReadError: failed to read 22e4cbeae1375cb0aea342ab68ddbfef+67108864+A…: service https://keep.qr1hi.arvadosapi.com:443/ responded with 0 (23, 'Failed writing body (0 != 16384)')

I note that the KeepWriterThread ids keep going up monotonically. Is it possible they're not being released and eventually garbage collected?


Subtasks 4 (0 open4 closed)

Task #6908: Plug leakResolvedTom Clegg07/31/2015Actions
Task #6905: InvestigateResolvedTom Clegg07/31/2015Actions
Task #6906: Reproduce leakage in a test caseClosedTom Clegg07/31/2015Actions
Task #6903: Review 6844-py-mem-leakResolvedRadhika Chippada07/31/2015Actions
Actions

Also available in: Atom PDF