Bug #13803
Closed
Big manifest produces NoMemoryError on API server
Description
Manifests with certain characteristics (lots of files/streams) produce a NoMemoryError
on the API server even though the available RAM is not exhausted on the host.
One way to reproduce it is to run collections_performance_test.rb
after changing the make_manifest() call to:
make_manifest(streams: 10000, files_per_stream: 100, blocks_per_file: 1, bytes_per_block: 1, api_token: api_token(:active))
The command to run this test:
~/arvados$ WORKSPACE=$(pwd) ./build/run-tests.sh --temp $HOME/tmp --only services/api 'services/api_test=TESTOPTS=-n=/.*crud.cycle.*/'
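To get a rough feel for how the make_manifest() parameters above translate into manifest text, here is a minimal standalone sketch. It is not the test suite's helper: the name sketch_manifest and the file/stream naming are hypothetical, and it emits unsigned block locators, so the resulting sizes will be smaller than those of the signed manifests the API server actually handles.

```ruby
require 'digest'

# Hypothetical stand-in for the test helper make_manifest().
# Assumption: unsigned locators of the form "<md5>+<size>"; the real test
# uses signed locators, which are considerably longer.
def sketch_manifest(streams: 1, files_per_stream: 1, blocks_per_file: 1, bytes_per_block: 1)
  data = 'x' * bytes_per_block
  locator = "#{Digest::MD5.hexdigest(data)}+#{bytes_per_block}"
  file_size = blocks_per_file * bytes_per_block

  (0...streams).map do |s|
    # One locator per block, for every file in the stream.
    locators = Array.new(files_per_stream * blocks_per_file, locator)
    # One "position:size:name" token per file.
    files = (0...files_per_stream).map do |f|
      "#{f * file_size}:#{file_size}:file#{f}.txt"
    end
    "./stream#{s} #{locators.join(' ')} #{files.join(' ')}\n"
  end.join
end

manifest = sketch_manifest(streams: 10, files_per_stream: 100)
puts "manifest size: #{manifest.bytesize} bytes"
```

Scaling streams and files_per_stream up towards the values in this ticket shows how quickly the manifest text grows even when every block holds a single byte.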
Updated by Lucas Di Pentima over 6 years ago
The issue seems to depend on the manifest's size, regardless of its structure.
The following tests were run on a VirtualBox VM with 4 GB RAM. No RAM exhaustion was observed during the test runs.
streams | files/stream | blocks/file | bytes/block | manifest MiB | success? | notes |
100 | 10000 | 1 | 1 | 100 | no | SafeJSON.dump() immediately failed with NoMemoryError |
100 | 100 | 120 | 1 | 98 | no | SafeJSON.dump() immediately failed with NoMemoryError |
500000 | 1 | 2 | 1 | 95 | no | SafeJSON.dump() immediately failed with NoMemoryError |
300000 | 1 | 3 | 1 | 82 | no | CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422 |
1 | 1 | 1000000 | 1 | 82 | no | CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422 |
100 | 100 | 100 | 1 | 82 | no | CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422 |
100 | 7500 | 1 | 1 | 75 | no | CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422 |
100 | 7187 | 1 | 1 | 72 | no | CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422 |
1 | 687500 | 1 | 1 | 71 | no | CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422 |
100 | 7031 | 1 | 1 | 70 | no | CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422 |
100 | 6953 | 1 | 1 | 70 | no | CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422 |
100 | 6875 | 1 | 1 | 69 | yes | |
100 | 100 | 80 | 1 | 65 | yes | |
100 | 6250 | 1 | 1 | 62 | yes | |
300000 | 1 | 2 | 1 | 57 | yes | |
500000 | 1 | 1 | 1 | 54 | yes | |
200000 | 1 | 3 | 1 | 54 | yes | |
1 | 500000 | 1 | 1 | 52 | yes | |
100 | 5000 | 1 | 1 | 50 | yes | |
1 | 1 | 500000 | 1 | 41 | yes | |
100 | 1000 | 1 | 1 | 9 | yes | |
1000 | 100 | 1 | 1 | 9 | yes | |
Updated by Lucas Di Pentima over 6 years ago
It is definitely Oj.dump()'s fault.
Tested on the VM with 4 GB RAM, comparing oj gem versions 2.18.5 and 3.6.4:
json = Oj.dump({"data" => "1234567890" * 1024*1024*100})
With the version we're using (2.18.5), I get a NoMemoryError: failed to allocate memory
error. With the newer one, I can request 10 times the size and still have RAM to spare.
The odd thing is that oj 2.18.5 requests a large amount of memory but never uses it.
Upgrading the API server's Oj dependency is blocked by the arvados-cli gem, which requires ~> 2.0 in its .gemspec file.
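The effect of that ~> 2.0 constraint can be checked with RubyGems' own version classes. This is only an illustration of how the pessimistic operator treats the two Oj versions mentioned in this ticket, not a copy of the actual .gemspec:

```ruby
require 'rubygems'

# "~> 2.0" accepts any version >= 2.0 and < 3.0.
constraint = Gem::Requirement.new('~> 2.0')

old_oj = Gem::Version.new('2.18.5')
new_oj = Gem::Version.new('3.6.4')

puts "2.18.5 allowed: #{constraint.satisfied_by?(old_oj)}"  # true
puts "3.6.4 allowed:  #{constraint.satisfied_by?(new_oj)}"  # false
```

As long as arvados-cli declares this constraint and the API server depends on arvados-cli, Bundler cannot resolve Oj 3.x for the API server.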
Updated by Lucas Di Pentima over 6 years ago
Updates at 355173ba2 - branch 13803-oj-gem-malloc-bug
Test run: https://ci.curoverse.com/job/developer-run-tests/813/
- Removed API server's dependency on arvados-cli
- Updated Oj dependency on API server, workbench & arvados-cli to latest (3.6.4)
- Updated Oj JSON mimicking by removing the oj_mimic_json gem & adding an initializer
- Updated time encoding precision format to keep using nanoseconds
- Fixed SafeJSON.load() to return nil when the input is nil or an empty string, because a behavior change in the Oj gem caused test failures
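The SafeJSON.load() guard described in the last item can be sketched as follows. This is an assumption-laden illustration, not the committed code: the module name SafeJSONSketch is hypothetical, and the standard library's JSON.parse stands in for the Oj call so the example runs without the oj gem.

```ruby
require 'json'

# Hypothetical sketch of the nil/empty-string guard.
# Assumption: JSON.parse is used here only as a stand-in for Oj.
module SafeJSONSketch
  def self.load(s)
    # Short-circuit before the parser sees the input, so the result no
    # longer depends on how a given parser version handles empty input.
    return nil if s.nil? || s == ''
    JSON.parse(s)
  end
end

p SafeJSONSketch.load(nil)
p SafeJSONSketch.load('')
p SafeJSONSketch.load('{"uuid": "abc"}')
```

Handling the empty cases explicitly keeps callers that relied on the old nil result working across gem upgrades.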
Updated by Lucas Di Pentima over 6 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|55137e6828bf11f76c3f9ec61e4a76954f5d6fa1.