Bug #13803

Big manifest produces NoMemoryError on API server

Added by Lucas Di Pentima about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: API
Target version:
Start date: 07/17/2018
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release:
Release relationship: Auto

Description

Manifests with certain characteristics (lots of files/streams) produce a NoMemoryError on the API server even though the host's available RAM is not exhausted.

One way to reproduce it is to run collections_performance_test.rb after changing the make_manifest() call to:

make_manifest(streams: 10000,
              files_per_stream: 100,
              blocks_per_file: 1,
              bytes_per_block: 1,
              api_token: api_token(:active))

The command to run this test:

~/arvados$ WORKSPACE=$(pwd) ./build/run-tests.sh --temp $HOME/tmp --only services/api 'services/api_test=TESTOPTS=-n=/.*crud.cycle.*/'
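
To get a feel for how these parameters scale the manifest text, here is a rough sketch. It is a hypothetical stand-in, not the make_manifest() helper from the test suite: the name sketch_manifest, the stream and file names, and the unsigned block locators are all assumptions made for illustration, so the sizes it reports will not match the real test.

```ruby
require 'digest'

# Hypothetical stand-in for the test helper. It emits one line per stream:
# stream name, then block locators, then position:size:name file tokens.
# Locators here are unsigned; the real test signs them, which makes them longer.
def sketch_manifest(streams:, files_per_stream:, blocks_per_file:, bytes_per_block:)
  blob = 'x' * bytes_per_block
  locator = "#{Digest::MD5.hexdigest(blob)}+#{blob.bytesize}"
  file_size = blocks_per_file * bytes_per_block
  out = +''
  streams.times do |s|
    out << "./stream#{s}"
    (files_per_stream * blocks_per_file).times { out << ' ' << locator }
    files_per_stream.times do |f|
      out << " #{f * file_size}:#{file_size}:file#{f}.txt"
    end
    out << "\n"
  end
  out
end

# Small parameters so the sketch runs quickly; scale them up to approach
# the sizes discussed in this ticket.
text = sketch_manifest(streams: 10, files_per_stream: 100,
                       blocks_per_file: 1, bytes_per_block: 1)
puts format('%.3f MiB', text.bytesize / (1024.0 * 1024.0))
```

The text grows with streams * files_per_stream * blocks_per_file, so many different parameter combinations can produce manifests of similar size.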

Subtasks

Task #13824: Review 13803-oj-gem-malloc-bug (Closed, Lucas Di Pentima)

Associated revisions

Revision 55137e68
Added by Lucas Di Pentima about 1 year ago

Merge branch '13803-oj-gem-malloc-bug'
Closes #13803

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <>

History

#1 Updated by Lucas Di Pentima about 1 year ago

  • Description updated

#2 Updated by Lucas Di Pentima about 1 year ago

The issue seems to depend on the manifest's size, regardless of its structure.

The following tests were run on a VirtualBox VM with 4 GB RAM. No RAM exhaustion was observed during the test runs.

streams  files/stream  blocks/file  bytes/block  manifest MiB  success?  notes
100      10000         1            1            100           no        SafeJSON.dump() immediately failed with NoMemoryError
100      100           120          1            98            no        SafeJSON.dump() immediately failed with NoMemoryError
500000   1             2            1            95            no        SafeJSON.dump() immediately failed with NoMemoryError
300000   1             3            1            82            no        CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422
1        1             1000000      1            82            no        CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422
100      100           100          1            82            no        CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422
100      7500          1            1            75            no        CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422
100      7187          1            1            72            no        CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422
1        687500        1            1            71            no        CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422
100      7031          1            1            70            no        CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422
100      6953          1            1            70            no        CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422
100      6875          1            1            69            yes
100      100           80           1            65            yes
100      6250          1            1            62            yes
300000   1             2            1            57            yes
500000   1             1            1            54            yes
200000   1             3            1            54            yes
1        500000        1            1            52            yes
100      5000          1            1            50            yes
1        1             500000       1            41            yes
100      1000          1            1            9             yes
1000     100           1            1            9             yes

#3 Updated by Lucas Di Pentima about 1 year ago

It's definitely Oj.dump()'s fault.

On the VM with 4 GB RAM, comparing oj gem versions 2.18.5 and 3.6.4:

json = Oj.dump({"data" => "1234567890" * 1024*1024*100})

With the version we're using (2.18.5), I get "NoMemoryError: failed to allocate memory". With the newer one, I can request 10 times that size and still have RAM to spare.
The odd thing is that oj 2.18.5 requests a large amount of memory but never uses it.

Upgrading Oj on the API server is blocked by the arvados-cli gem, which requires ~> 2.0 in its .gemspec file.
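
For anyone wanting to repeat the comparison, a minimal sketch of the check might look like the following. The helper name try_dump is hypothetical, and the fallback to the stdlib JSON generator is only there so the snippet runs without the oj gem installed; to compare 2.18.5 against 3.6.4, run it once with each version installed.

```ruby
begin
  require 'oj'
rescue LoadError
  require 'json' # assumption: fallback so the sketch still runs without oj
end

# Hypothetical helper: dump a hash holding the 10-byte pattern repeated
# `repeats` times. Returns the JSON size in bytes, or nil when the
# allocation fails with NoMemoryError.
def try_dump(repeats)
  payload = { 'data' => '1234567890' * repeats }
  json = defined?(Oj) ? Oj.dump(payload) : JSON.generate(payload)
  json.bytesize
rescue NoMemoryError => e
  warn "dump failed: #{e.message}"
  nil
end

# Small size here so the sketch is cheap to run. The call in this comment
# corresponds to try_dump(1024 * 1024 * 100), a string of about 1000 MiB.
puts try_dump(1024).inspect
```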

#4 Updated by Lucas Di Pentima about 1 year ago

Updates at 355173ba2 - branch 13803-oj-gem-malloc-bug
Test run: https://ci.curoverse.com/job/developer-run-tests/813/

  • Removed API server's dependency on arvados-cli
  • Updated Oj dependency on API server, workbench & arvados-cli to latest (3.6.4)
  • Updated Oj JSON mimicking by removing oj_mimic_json gem & adding an initializer
  • Updated time encoding precision format to keep using nanoseconds
  • Fixed SafeJSON.load() to return nil when the input is nil or an empty string, because a behavior change in the Oj gem was causing test failures
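
The nil/empty-string guard from the last bullet can be illustrated roughly as follows. This is a sketch only, not the code on the branch: the module name SafeJSONSketch is hypothetical, and it uses the stdlib JSON parser in place of Oj so that it runs on its own.

```ruby
require 'json'

# Hypothetical illustration of the guard, using stdlib JSON instead of Oj.
module SafeJSONSketch
  def self.dump(obj)
    JSON.generate(obj)
  end

  def self.load(text)
    # Return nil for nil or empty input rather than handing it to the
    # parser, whose handling of such input can differ between versions.
    return nil if text.nil? || text == ''

    JSON.parse(text)
  end
end

p SafeJSONSketch.load(nil)
p SafeJSONSketch.load('')
p SafeJSONSketch.load('{"a": 1}')
```

Putting the guard in the wrapper keeps callers independent of how the underlying parser treats empty input.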

#5 Updated by Tom Clegg about 1 year ago

LGTM, thanks!

#6 Updated by Lucas Di Pentima about 1 year ago

  • Status changed from In Progress to Resolved

#7 Updated by Tom Morris about 1 year ago

  • Release set to 13
