Bug #13803
closed
Big manifest produces NoMemoryError on API server
Added by Lucas Di Pentima over 6 years ago.
Updated over 6 years ago.
Release relationship: Auto
Description
Manifests with certain characteristics (lots of files/streams) produce a NoMemoryError
on the API server even though the available RAM on the host is not exhausted.
One way to reproduce it is to run collections_performance_test.rb
with the make_manifest() call modified to:
make_manifest(streams: 10000,
              files_per_stream: 100,
              blocks_per_file: 1,
              bytes_per_block: 1,
              api_token: api_token(:active))
The command to run this test:
~/arvados$ WORKSPACE=$(pwd) ./build/run-tests.sh --temp $HOME/tmp --only services/api 'services/api_test=TESTOPTS=-n=/.*crud.cycle.*/'
The issue seems to depend on the manifest's size, regardless of its structure.
The following tests were run on a VirtualBox VM with 4 GB RAM. No RAM exhaustion was observed during the test runs.
| streams | files/stream | blocks/file | bytes/block | manifest MiB | success? | notes |
| 100 | 10000 | 1 | 1 | 100 | no | SafeJSON.dump() immediately failed with NoMemoryError |
| 100 | 100 | 120 | 1 | 98 | no | SafeJSON.dump() immediately failed with NoMemoryError |
| 500000 | 1 | 2 | 1 | 95 | no | SafeJSON.dump() immediately failed with NoMemoryError |
| 300000 | 1 | 3 | 1 | 82 | no | CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422 |
| 1 | 1 | 1000000 | 1 | 82 | no | CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422 |
| 100 | 100 | 100 | 1 | 82 | no | CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422 |
| 100 | 7500 | 1 | 1 | 75 | no | CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422 |
| 100 | 7187 | 1 | 1 | 72 | no | CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422 |
| 1 | 687500 | 1 | 1 | 71 | no | CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422 |
| 100 | 7031 | 1 | 1 | 70 | no | CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422 |
| 100 | 6953 | 1 | 1 | 70 | no | CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest failed because of a 422 |
| 100 | 6875 | 1 | 1 | 69 | yes | |
| 100 | 100 | 80 | 1 | 65 | yes | |
| 100 | 6250 | 1 | 1 | 62 | yes | |
| 300000 | 1 | 2 | 1 | 57 | yes | |
| 500000 | 1 | 1 | 1 | 54 | yes | |
| 200000 | 1 | 3 | 1 | 54 | yes | |
| 1 | 500000 | 1 | 1 | 52 | yes | |
| 100 | 5000 | 1 | 1 | 50 | yes | |
| 1 | 1 | 500000 | 1 | 41 | yes | |
| 100 | 1000 | 1 | 1 | 9 | yes | |
| 1000 | 100 | 1 | 1 | 9 | yes | |
The fault is definitely in Oj.dump().
Comparing oj gem versions 2.18.5 and 3.6.4 on the VM with 4 GB RAM:
json = Oj.dump({"data" => "1234567890" * 1024*1024*100})
With the version we're using (2.18.5), I get a "NoMemoryError: failed to allocate memory"
error; with the newer one, I can request 10 times the size and still have RAM to spare.
The odd thing is that oj 2.18.5 requests a large amount of memory but never uses it.
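The comparison above can be wrapped in a small probe that reports whether encoding a payload of a given size raises NoMemoryError. This is only a sketch: try_dump and ENCODER are hypothetical names, and the fallback to the stdlib JSON encoder is an assumption added so the snippet runs even where the oj gem is not installed.

```ruby
require "json"

# Prefer Oj when the gem is installed; fall back to the stdlib encoder
# (an assumption of this sketch) so it still runs without the gem.
begin
  require "oj"
  ENCODER = ->(obj) { Oj.dump(obj, mode: :compat) }
rescue LoadError
  ENCODER = ->(obj) { JSON.generate(obj) }
end

# Hypothetical helper: encode a payload of roughly `mib` MiB and report
# whether the encoder ran out of memory while doing so.
def try_dump(mib, encoder: ENCODER)
  payload = { "data" => "1234567890" * (mib * 1024 * 1024 / 10) }
  json = encoder.call(payload)
  { ok: true, bytes: json.bytesize }
rescue NoMemoryError => e
  { ok: false, error: e.message }
end
```

Calling try_dump(1000) approximates the size used in the one-liner above; running it under each oj version should show which one fails to allocate.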
Upgrading the API server's Oj dependency is blocked by the arvados-cli gem, which requires ~> 2.0 in its .gemspec file.
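To illustrate why that constraint blocks the upgrade, the RubyGems API can check a version against a requirement directly. The helper name oj_allowed? is hypothetical; Gem::Requirement and Gem::Version are part of RubyGems.

```ruby
require "rubygems"

# Hypothetical helper: does `version` satisfy the given gemspec constraint?
# "~> 2.0" is the pessimistic operator: >= 2.0 and < 3.0.
def oj_allowed?(version, constraint = "~> 2.0")
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

puts oj_allowed?("2.18.5")  # true
puts oj_allowed?("3.6.4")   # false
```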
Updates at 355173ba2 - branch 13803-oj-gem-malloc-bug
Test run: https://ci.curoverse.com/job/developer-run-tests/813/
- Removed API server's dependency on arvados-cli
- Updated Oj dependency on API server, workbench & arvados-cli to latest (3.6.4)
- Updated Oj JSON mimicking by removing the oj_mimic_json gem & adding an initializer
- Updated time encoding precision format to keep using nanoseconds
- Fixed SafeJSON.load() to return nil when the input is nil or an empty string, because a behavior change in the Oj gem caused test failures
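The SafeJSON.load() change in the list above amounts to an explicit guard in front of the parser. The following is a minimal sketch of that idea, not the code from the branch: SafeJSONSketch is a hypothetical name, and the stdlib JSON parser stands in for Oj so the snippet is self-contained.

```ruby
require "json"

# Hypothetical stand-in for SafeJSON; the stdlib parser replaces Oj here.
module SafeJSONSketch
  def self.load(input)
    # Guard explicitly rather than relying on how the parser treats
    # nil or empty input, since that behavior changed between versions.
    return nil if input.nil? || input == ""
    JSON.parse(input)
  end
end
```

Callers that previously received nil for missing input keep the same result regardless of which parser version sits underneath.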
- Status changed from In Progress to Resolved