Bug #8998

closed

[API] Memory overflow when dumping a 25TB collection as JSON

Added by Peter Grandi about 8 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: 2.0

Description

When uploading a collection of nearly 25TiB in a bit over 3,000 files with arv-put the outcome was:

librarian@rockall$ arv-put --replication 1 --no-resume --project-uuid gcam1-j7d0g-k25rlhe6ig8p9na --name DDD_WGS_EGAD00001001114 DDDP*
25956153M / 25956153M 100.0%
arv-put: Error creating Collection on project: <HttpError 422 when requesting https://gcam1.example.com/arvados/v1/collections?ensure_unique_name=true&alt=json returned "#<NoMemoryError: failed to allocate memory>">.
Traceback (most recent call last):
 File "/usr/local/bin/arv-put", line 4, in <module>
   main()
 File "/usr/local/lib/python2.7/dist-packages/arvados/commands/put.py", line 533, in main
   stdout.write(output)
UnboundLocalError: local variable 'output' referenced before assignment
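
The secondary UnboundLocalError in the traceback above is the usual symptom of a variable that is assigned only on the success path and then used unconditionally. The following is a minimal sketch of that pattern and one possible fix. It is not the actual put.py code: `create_collection`, `main_buggy` and `main_fixed` are hypothetical names chosen for illustration.

```python
import io


def create_collection(fail):
    # Hypothetical stand-in for the API call that returned HTTP 422.
    if fail:
        raise RuntimeError("422: NoMemoryError: failed to allocate memory")
    return "collection-locator"


def main_buggy(fail, stdout):
    try:
        output = create_collection(fail)
    except RuntimeError as error:
        # The error is reported, but execution falls through.
        pass
    # When the call failed, 'output' was never assigned, so this
    # line raises UnboundLocalError and hides the real error.
    stdout.write(output)


def main_fixed(fail, stdout, stderr):
    try:
        output = create_collection(fail)
    except RuntimeError as error:
        # Report the failure and stop before 'output' is used.
        stderr.write("Error creating Collection: {}\n".format(error))
        return 1
    stdout.write(output + "\n")
    return 0


if __name__ == "__main__":
    out, err = io.StringIO(), io.StringIO()
    status = main_fixed(True, out, err)
    print(status, repr(out.getvalue()), err.getvalue().strip())
```

Returning a non-zero status from the error branch keeps the original API error as the only message the user sees.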

From logs:

Oh... fiddlesticks.

An error occurred when Workbench sent a request to the Arvados API server. Try reloading this page. If the problem is temporary, your request might go through next time. If that doesn't work, the information below can help system administrators track down the problem.

API request URL
    https://gcam1.camdc.genomicsplc.com/arvados/v1/collections/gcam1-4zz18-i4nlpovriwdxu6j
API response

    {
      ":errors":[
        "#<NoMemoryError: failed to allocate memory>" 
      ],
      ":error_token":"1457687649+b12feaf3" 
    }

I have also attached the longer backtrace from a previous log.

A 25TB upload should result in a manifest of about 15MB, which is large but should not exhaust the memory of an API server that has 4GiB.
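
As a rough cross-check of that manifest size, here is a back-of-the-envelope sketch. It assumes Keep's 64 MiB block size, one unsigned block locator of the form `<md5>+<size>` per block (32 + 1 + 8 characters plus a separating space), and a guessed 40 bytes per file token. The function names are hypothetical, and signed locators carry extra hints, so a real manifest may be noticeably larger.

```python
KEEP_BLOCK_BYTES = 64 * 2**20   # assumed Keep block size: 64 MiB
LOCATOR_BYTES = 32 + 1 + 8 + 1  # md5 hex, '+', "67108864", trailing space


def block_count(total_bytes):
    # Integer ceiling division: a partial final block still needs a locator.
    return -(-total_bytes // KEEP_BLOCK_BYTES)


def estimate_manifest_bytes(total_bytes, file_count, file_token_bytes=40):
    # file_token_bytes is a guess at the average "position:size:name" token.
    return block_count(total_bytes) * LOCATOR_BYTES + file_count * file_token_bytes


if __name__ == "__main__":
    total = 25956153 * 2**20  # the "25956153M" reported by arv-put, read as MiB
    size = estimate_manifest_bytes(total, 3000)
    print("{} blocks, about {:.1f} MB of manifest".format(block_count(total), size / 1e6))
```

With the figures from the arv-put output (25956153 MiB in about 3,000 files) this comes to roughly 17MB, the same order of magnitude as the estimate above.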

We can allocate more memory in any case, but it would be nice to have a guideline for how many GiB are needed relative to the largest collection size.

Perhaps 25TB collections are simply too large, especially given the resulting manifest size and my understanding that any access to a file in a collection incurs the latency of downloading the full manifest.

But I have been told that we have a requirement for arbitrary naming conventions, where it is not acceptable to split large sets of data (many small files or fewer large files) into separate collections, like "data-subset-1", "data-subset-2", "data-subset-3", ... solely because of storage system limitations.


Files

160311_arvOutOfMemBt.txt (11.1 KB), Peter Grandi, 04/15/2016 09:13 AM
ojram.rb (525 Bytes), Brett Smith, 04/15/2016 03:58 PM
yajlram.rb (525 Bytes), Brett Smith, 04/15/2016 07:35 PM

Subtasks 3 (0 open, 3 closed)

Task #9094: Testing, Resolved, Peter Amstutz, 04/15/2016
Task #9093: Review 8998-optimize-decode-www-form-component, Resolved, Tom Clegg, 04/15/2016
Task #9092: Update JSON library usage, Closed, Peter Amstutz, 04/15/2016
