Bug #2934
Limit impact of chatty crunch job on websocket/Log mechanism (closed)
Description
- 64K log entries per job
- 64 MiB of log data per job
When one of these limits is reached, add one more entry, e.g. "Server configured limit reached (crunch_limit_log_events_per_job: 65536). Subsequent logs truncated."
Subsequent logs will still get stored in Keep, but won't be available through event bus.
These limits (and the current LOG_BUFFER_SIZE constant) should be configurable. Add to services/api/config/application.default.yml in the common section:

crunch_limit_log_events_per_job: 65536
crunch_limit_log_event_bytes_per_job: 67108864
crunch_limit_log_bytes_per_event: 4096

Then use Rails.configuration.crunch_limit_log_bytes_per_event instead of LOG_BUFFER_SIZE, etc.
Optional: Add crunch_limit_seconds_between_log_events (this is a minimum, currently hard-coded at 1, but we can probably support 0, 0.5, 5, etc., easily enough).
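The per-job bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the actual crunch-dispatch.rb code: LogLimiter and its filter method are hypothetical names, and the limits are passed in directly rather than read from Rails.configuration.

```ruby
# Hypothetical sketch of the per-job event/byte limits described in
# the story. Once either limit is exceeded, a single truncation notice
# is emitted and nothing further is forwarded to the event bus
# (subsequent logs would still be stored in Keep).
class LogLimiter
  def initialize(max_events:, max_bytes:)
    @max_events = max_events
    @max_bytes = max_bytes
    @events = 0
    @bytes = 0
    @truncated = false
  end

  # Returns the entries that should be forwarded to the event bus:
  # the entry itself while under the limits, a one-time truncation
  # notice when a limit is first exceeded, and nothing thereafter.
  def filter(entry)
    return [] if @truncated
    @events += 1
    @bytes += entry.bytesize
    if @events > @max_events || @bytes > @max_bytes
      @truncated = true
      return ["Server configured limit reached " \
              "(crunch_limit_log_events_per_job: #{@max_events}). " \
              "Subsequent logs truncated"]
    end
    [entry]
  end
end
```

Applying the byte limit across all events (rather than per event) is what gives a job room for the occasional long entry, as discussed below.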
Updated by Tim Pierce over 10 years ago
As of d7e23d9, crunch-dispatch.rb keeps event counts and byte counts in memory instead of reading them from disk.
Per discussion on IRC: we will almost certainly need to tune and tweak these parameters to find good settings. The ones that are defined in the story should be considered a starting point and not definitive. They're all configurable parameters.
We could implement byte limits per event rather than across all events (i.e. limit to 1 KiB per event rather than 64 MiB total for all logged events), but I don't see the benefit. Applying the limit as a cap on total bytes generated gives jobs the flexibility to generate a few log events that are longer than the average without truncating them, and still prevents jobs from running away and filling the entire disk.
Updated by Peter Amstutz over 10 years ago
- Suggest changing the read size on read_nonblock() in #read_pipes to Rails.configuration.crunch_log_bytes_per_event instead of hardcoding it at 2**20; otherwise something sufficiently spammy could write out (a lot) more than Rails.configuration.crunch_log_bytes_per_event bytes in a single event.
Updated by Tim Pierce over 10 years ago
Peter Amstutz wrote:
- Suggest changing the read size on read_nonblock() in #read_pipes to Rails.configuration.crunch_log_bytes_per_event instead of hardcoding it at 2**20; otherwise something sufficiently spammy could write out (a lot) more than Rails.configuration.crunch_log_bytes_per_event bytes in a single event.
crunch_log_bytes_per_event isn't an upper limit, it's a lower one (which is why I didn't add "limit" to the name) -- it's the lower threshold for flushing the event to the Log table, if events are getting created more often than once per second. So we expect and allow that some events will actually produce more than crunch_log_bytes_per_event bytes at a time. (I'm absolutely open to changing the name: crunch_log_min_bytes_per_event? crunch_log_byte_threshold_for_fast_events? I dunno.)
Updated by Peter Amstutz over 10 years ago
- If it's been at least N seconds (where currently N=1) then flush the buffer
- If there are at least M bytes (where M is from the configuration file) then flush the buffer
- There's also some line buffering going on, so I think it does generally force events to break evenly across lines.
My suggestion is simply that if you only read M bytes at a time, then you'll produce records that are at most (2*M-1) bytes; with the current 1 MiB read size you could get a 1 MiB record, which seems really unwieldy and unfriendly to downstream log consumers. However, I'm inclined to let this go and see how it performs in testing.
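The flush rules summarized above can be sketched as a simple predicate. This is an illustrative assumption about the logic, not the actual crunch-dispatch.rb code; SECONDS_BETWEEN_EVENTS and BYTES_PER_EVENT stand in for crunch_limit_seconds_between_log_events and crunch_log_bytes_per_event.

```ruby
# Hypothetical flush decision for the log buffer, per the rules above:
# flush when the buffer has aged at least N seconds, or when fast
# writers have accumulated at least M bytes.
SECONDS_BETWEEN_EVENTS = 1
BYTES_PER_EVENT = 4096

# Note that the byte value acts as a lower threshold, not a cap: a
# single large read can still produce an event bigger than
# BYTES_PER_EVENT, which is why the read size matters.
def flush?(buffer, last_flush_time, now)
  return false if buffer.empty?
  (now - last_flush_time) >= SECONDS_BETWEEN_EVENTS ||
    buffer.bytesize >= BYTES_PER_EVENT
end
```

If reads were capped at M bytes, the worst case between checks would be an almost-full buffer (M-1 bytes) plus one full read, hence the (2*M-1) bound mentioned above.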
Updated by Tim Pierce over 10 years ago
OK. I agree that sounds like a good idea, but would just as soon confirm whether it really is a problem first. (It's Been Working Okay So Far [tm])
Changes at 46ab868, per our offline discussion, add a new test fixture repository to test/test.git.tar rather than mucking with the repos that are already there. This still requires a small change to commits_controller_test.rb to account for the fact that there's a whole new repository in there.
Updated by Peter Amstutz over 10 years ago
Not a fatal error but still disconcerting:
[ 40/196] Arvados::V1::CommitsControllerTest#test_test_find_commit_range
fatal: bad object 31ce37fe365b3dc204300a3e4c396ad333ed0556
fatal: bad object 077ba2ad3ea24a929091a9e6ce545c93199b8e57
fatal: ambiguous argument 'tag1': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
fatal: bad object 077ba2ad3ea24a929091a9e6ce545c93199b8e57
fatal: bad object 077ba2ad3ea24a929091a9e6ce545c93199b8e57
fatal: bad object 077ba2ad3ea24a929091a9e6ce545c93199b8e57
fatal: bad object 077ba2ad3ea24a929091a9e6ce545c93199b8e57
Updated by Tim Pierce over 10 years ago
Peter Amstutz wrote:
Not a fatal error but still disconcerting:
[...]
Yes, these are expected -- the tests attempt to look up the specific revision 077ba2ad3ea24a929091a9e6ce545c93199b8e57 on every repository readable to users(:active), and now there are additional test fixture repositories readable to :active that do not include that commit. It's definitely disconcerting. Added documentation in 0143d26 to explain this output.
Updated by Tim Pierce over 10 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:95811b3a901df3ceab701fe18e9ec1bb614bef90.