Bug #13803

closed

Big manifest produces NoMemoryError on API server

Added by Lucas Di Pentima almost 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: API
Target version:
Story points: -
Release:
Release relationship: Auto

Description

Manifests with certain characteristics (lots of files/streams) produce NoMemoryError on API server even though the available RAM is not exhausted on the host.

One way to reproduce it is to run collections_performance_test.rb with the make_manifest() call modified to:

    make_manifest(streams: 10000,
                  files_per_stream: 100,
                  blocks_per_file: 1,
                  bytes_per_block: 1,
                  api_token: api_token(:active))

The command to run this test:

~/arvados$ WORKSPACE=$(pwd) ./build/run-tests.sh --temp $HOME/tmp --only services/api 'services/api_test=TESTOPTS=-n=/.*crud.cycle.*/'

Subtasks: 1 (0 open, 1 closed)

Task #13824: Review 13803-oj-gem-malloc-bug (Closed, Lucas Di Pentima, 07/17/2018)
Actions #1

Updated by Lucas Di Pentima almost 6 years ago

  • Description updated (diff)
Actions #2

Updated by Lucas Di Pentima almost 6 years ago

The issue seems to depend on the manifest's size, regardless of its structure.

The following tests were run on a Virtualbox VM with 4GB RAM. No RAM exhaustion was observed during the test runs.

streams  files/stream  blocks/file  bytes/block  manifest MiB  success?  notes
100      10000         1            1            100           no        SafeJSON.dump() immediately failed with NoMemoryError
100      100           120          1            98            no        SafeJSON.dump() immediately failed with NoMemoryError
500000   1             2            1            95            no        SafeJSON.dump() immediately failed with NoMemoryError
300000   1             3            1            82            no        CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422
1        1             1000000      1            82            no        CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422
100      100           100          1            82            no        CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422
100      7500          1            1            75            no        CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422
100      7187          1            1            72            no        CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422
1        687500        1            1            71            no        CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422
100      7031          1            1            70            no        CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422
100      6953          1            1            70            no        CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422
100      6875          1            1            69            yes
100      100           80           1            65            yes
100      6250          1            1            62            yes
300000   1             2            1            57            yes
500000   1             1            1            54            yes
200000   1             3            1            54            yes
1        500000        1            1            52            yes
100      5000          1            1            50            yes
1        1             500000       1            41            yes
100      1000          1            1            9             yes
1000     100           1            1            9             yes
Actions #3

Updated by Lucas Di Pentima almost 6 years ago

It's definitely Oj.dump()'s fault.

On the VM with 4 GB RAM, I compared oj gem versions 2.18.5 and 3.6.4 using:

json = Oj.dump({"data" => "1234567890" * 1024*1024*100})

With the version we're using (2.18.5), I get "NoMemoryError: failed to allocate memory". With the newer one, I can ask for 10 times the size and still have RAM to spare.
The odd thing is that oj 2.18.5 requests a large amount of memory but never uses it.
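
The comparison above could be wrapped in a small probe that reports whether a serializer fails with NoMemoryError for a given payload size. This is only a sketch: try_dump is a hypothetical helper, not part of Arvados, and it defaults to the standard library's JSON.generate so that it runs without extra gems.

```ruby
require 'json'

# Hypothetical helper (not part of Arvados): serialize a payload of roughly
# `size_bytes` bytes and report whether the serializer ran out of memory.
# `dumper` is any callable that takes one object and returns a String.
def try_dump(size_bytes, dumper = JSON.method(:generate))
  payload = { "data" => "1234567890" * (size_bytes / 10) }
  json = dumper.call(payload)
  { ok: true, json_bytes: json.bytesize }
rescue NoMemoryError => e
  # NoMemoryError is not a StandardError, so it has to be rescued explicitly.
  { ok: false, error: e.message }
end

result = try_dump(1024 * 1024) # 1 MiB payload, small enough for any host
puts result.inspect
```

To repeat the comparison from this comment, one would presumably pass Oj.method(:dump) as the second argument and run the probe once under each oj version, increasing size_bytes until the failure appears.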

Upgrading Oj on the API server is blocked by the arvados-cli gem, which requires ~> 2.0 in its .gemspec file.
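
Why the ~> 2.0 constraint blocks the upgrade can be checked with RubyGems' own version classes. The snippet below is a standalone illustration, not taken from the arvados-cli gemspec; the variable names are arbitrary.

```ruby
require 'rubygems'

# The pessimistic constraint "~> 2.0" is equivalent to ">= 2.0" and "< 3.0".
requirement = Gem::Requirement.new('~> 2.0')

old_oj = Gem::Version.new('2.18.5')
new_oj = Gem::Version.new('3.6.4')

puts requirement.satisfied_by?(old_oj) # true
puts requirement.satisfied_by?(new_oj) # false
```

Since 3.6.4 falls outside the allowed range, Bundler cannot resolve a newer Oj while the API server still depends on arvados-cli with that constraint.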

Actions #4

Updated by Lucas Di Pentima almost 6 years ago

Updates at 355173ba2 - branch 13803-oj-gem-malloc-bug
Test run: https://ci.curoverse.com/job/developer-run-tests/813/

  • Removed API server's dependency on arvados-cli
  • Updated Oj dependency on API server, workbench & arvados-cli to latest (3.6.4)
  • Updated Oj JSON mimicking by removing oj_mimic_json gem & adding an initializer
  • Updated time encoding precision format to keep using nanoseconds
  • Fixed SafeJSON.load() to return nil when the input is nil or an empty string, because a behavior change in the Oj gem caused test failures
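
The nil/empty-string guard described in the last item might look roughly like the sketch below. It is not the code from the branch: SafeJSONSketch is a hypothetical name, and the standard library's JSON.parse stands in for Oj so the example runs without the gem.

```ruby
require 'json'

# Hypothetical stand-in for SafeJSON (not the Arvados implementation).
module SafeJSONSketch
  # Return nil for nil or empty input instead of handing it to the parser,
  # so callers get the same result regardless of the parser's own behavior.
  def self.load(input)
    return nil if input.nil? || input.empty?
    JSON.parse(input)
  end
end

p SafeJSONSketch.load(nil)
p SafeJSONSketch.load("")
p SafeJSONSketch.load('{"a": 1}')
```

Putting the guard in the wrapper keeps callers independent of how a given parser version treats empty input.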
Actions #5

Updated by Tom Clegg almost 6 years ago

LGTM, thanks!

Actions #6

Updated by Lucas Di Pentima almost 6 years ago

  • Status changed from In Progress to Resolved
Actions #7

Updated by Tom Morris almost 6 years ago

  • Release set to 13