Bug #4266

[Crunch] crunch-dispatch should process logs efficiently, particularly while throwing away excessive log data.

Added by Tom Clegg about 5 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Normal
Assigned To: Tom Clegg
Category: Crunch
Target version:
Start date: 10/20/2014
Due date:
% Done: 100%
Estimated time: (Total: 3.00 h)
Story points: 0.5

Description

Example, while tossing rate-limited logs:
  • 29768 root      20   0  432m 241m 6656 R  74.8  1.6 946:59.33 ruby                                                                  
    
For comparison, arv-put is receiving the same logs and doing fine. (It seems to be greedy about memory, though.)
  • 16756 crunch    20   0 1447m 1.2g 4472 S   0.3  8.4   0:34.07 arv-put
    

Subtasks

Task #4272: Review 4266-log-performance (Resolved, Tom Clegg)

Task #4267: Optimize ruby code (Resolved, Tom Clegg)

Associated revisions

Revision 06686c47
Added by Tom Clegg about 5 years ago

Merge branch '4266-log-performance' refs #4266

History

#1 Updated by Tom Clegg about 5 years ago

  • Status changed from New to In Progress

#2 Updated by Tom Clegg about 5 years ago

  • Description updated (diff)
  • Category set to Crunch
  • Assigned To set to Tom Clegg

#3 Updated by Tom Clegg about 5 years ago

4266-log-performance @ 2e76e6a

Laptop tests show crunch-dispatch.rb accepting ~200MiB/s at 80% CPU while the throttle is closed, with yes(1) using 20% CPU to send 100-byte lines. Much better than the 100% and 3% seen before.

#4 Updated by Brett Smith about 5 years ago

Reviewing 2e76e6a. I mostly have questions.

  • There's a lot of code cleanup here, which provides small improvements, like avoiding duplicate method calls and conditional checks, but it looks like most of the performance improvement comes from increasing the maximum read buffer size. Is that about right? (Thank you for commenting that so clearly.)
  • Why is this branch removing crunch_limit_log_events_per_job?
  • As a suggestion in the spirit of your code cleanup, the loop in read_pipes that reads lines from streambuf has a number of redundant checks. I think it could be both simplified and optimized along these lines:
    bufend = ''
    streambuf.each_line do |line|
      if not line.end_with? "\n" 
        bufend = line
        break
      # rate_limit returns true or false as to whether to actually log
      # the line or not.  It also modifies "line" in place to replace
      # it with an error if a logging limit is tripped.
      elsif rate_limit j, line
        $stderr.print "#{job_uuid} ! " unless line.index(job_uuid)
        $stderr.puts line
        pub_msg = "#{Time.now.ctime.to_s} #{line.strip}\n" 
        j[:stderr_buf_to_flush] << pub_msg
      end
    end
    streambuf.replace bufend
    

Thanks.

#5 Updated by Tom Clegg about 5 years ago

Brett Smith wrote:

Reviewing 2e76e6a. I mostly have questions.

  • There's a lot of code cleanup here, which provides small improvements, like avoiding duplicate method calls and conditional checks, but it looks like most of the performance improvement comes from increasing the maximum read buffer size. Is that about right? (Thank you for commenting that so clearly.)

There were a bunch of little things (like having one flag saying "not logging now"). I think the other big improvement (~3x?) was the "short circuit": skipping the "split buffer into N lines and call rate_limit() and write_log(j) N times" stuff during intervals when we're refusing to output anything anyway.
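
For context, the read-buffer change amounts to pulling bigger chunks per IO call so the per-read overhead is paid less often. A minimal sketch (the size constant is illustrative, not the actual value in crunch-dispatch):

```ruby
# Reading in bigger chunks: one large read can replace many small ones
# when data is plentiful, so loop and syscall overhead is paid less often.
READ_SIZE = 2**20  # 1 MiB; illustrative only

r, w = IO.pipe
w.write("x" * 4096)
w.close

chunks = []
while chunk = r.read(READ_SIZE)  # IO#read(len) returns nil at EOF
  chunks << chunk
end
```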

I ended up making a variety of readability and bug fixes too though.

  • Why is this branch removing crunch_limit_log_events_per_job?

I noticed a bug (which had been fixed before but reintroduced when read_pipes got refactored) that hitting crunch_limit_log_events_per_job would actually result in the silenced logs being buffered in RAM. While trying to identify a scenario where this could happen, I noticed that the relationship between the old events_per_job limit and the new bytes_per_job, bytes_per_interval, and lines_per_interval limits is hard to pin down, and unpredictable at best. I figured (and Ward didn't disagree) the new limits alone give enough control, and are easy to understand.
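
For illustration, interval-based throttling with separate byte and line limits could look roughly like this (constant names echo the config knobs; the logic is a sketch, not the actual crunch-dispatch code):

```ruby
# Sketch: per-interval byte and line limits. Once either limit is
# exceeded, the throttle closes until the next interval begins.
LOG_THROTTLE_BYTES_PER_INTERVAL = 65536
LOG_THROTTLE_LINES_PER_INTERVAL = 1024
LOG_THROTTLE_PERIOD = 60  # seconds; illustrative

def log_line_allowed?(j, line, now = Time.now)
  if now - j[:log_throttle_reset_time] >= LOG_THROTTLE_PERIOD
    # New interval: reset counters and reopen the throttle.
    j[:log_throttle_reset_time] = now
    j[:log_throttle_bytes_so_far] = 0
    j[:log_throttle_lines_so_far] = 0
    j[:log_throttle_is_open] = true
  end
  j[:log_throttle_bytes_so_far] += line.size
  j[:log_throttle_lines_so_far] += 1
  if j[:log_throttle_bytes_so_far] > LOG_THROTTLE_BYTES_PER_INTERVAL or
     j[:log_throttle_lines_so_far] > LOG_THROTTLE_LINES_PER_INTERVAL
    j[:log_throttle_is_open] = false
  end
  j[:log_throttle_is_open]
end
```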

  • As a suggestion in the spirit of your code cleanup, the loop in read_pipes that reads lines from streambuf has a number of redundant checks. I think it could be both simplified and optimized along these lines:
    [...]

Yes, that's easier to follow, thanks. Done.

Also noticed & fixed an OOM opportunity (no newline → infinite buffering).
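
A guard against that failure mode might look like this (a sketch; the constant and the truncation marker are illustrative):

```ruby
# Sketch: cap the buffered partial line so a stream that never emits a
# newline cannot grow memory without bound; keep the head and mark it
# as incomplete.
MAX_PARTIAL_LINE_BYTES = 2**16

def cap_partial_line(bufend)
  return bufend if bufend.size <= MAX_PARTIAL_LINE_BYTES
  bufend[0, MAX_PARTIAL_LINE_BYTES] + "[...]"
end
```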

Now at 6d3a888 de79b73

#6 Updated by Brett Smith about 5 years ago

Reviewing de79b73

Tom Clegg wrote:

Brett Smith wrote:

  • There's a lot of code cleanup here, which provides small improvements, like avoiding duplicate method calls and conditional checks, but it looks like most of the performance improvement comes from increasing the maximum read buffer size. Is that about right? (Thank you for commenting that so clearly.)

There were a bunch of little things (like having one flag saying "not logging now"). I think the other big improvement (~3x?) was the "short circuit": skipping the "split buffer into N lines and call rate_limit() and write_log(j) N times" stuff during intervals when we're refusing to output anything anyway.

Thanks. Looking at the short-circuit condition again, is it possible we'll double-count some buf bytes as skipped? It looks to me like if the throttle is closed, the "each buffer" loop will read a new buf, tally its bytes as skipped under if not j[:log_throttle_is_open], then append it to streambuf. Anything in there that isn't line-delimited will stay in streambuf, where it will be counted again under if not j[:log_throttle_is_open] next time. Am I missing something?

It feels like there should be a next under if not j[:log_throttle_is_open], although that might not actually be correct.

  • As a suggestion in the spirit of your code cleanup, the loop in read_pipes that reads lines from streambuf has a number of redundant checks. I think it could be both simplified and optimized along these lines:
    [...]

Yes, that's easier to follow, thanks. Done.

Sure. I wouldn't bring this up if the point of this branch wasn't optimization, but according to the docs, lines is just short for each_line.to_a. Why add the overhead of building an array that we only enumerate? If you want to specify the line separator, you can do that with each_line too.

Also noticed & fixed an OOM opportunity (no newline → infinite buffering).

There's a small typo at the end of the comments you added here: "some indication that it [sic] incomplete."

#7 Updated by Tom Clegg about 5 years ago

Brett Smith wrote:

Thanks. Looking at the short-circuit condition again, is it possible we'll double-count some buf bytes as skipped? It looks to me like if the throttle is closed, the "each buffer" loop will read a new buf, tally its bytes as skipped under if not j[:log_throttle_is_open], then append it to streambuf. Anything in there that isn't line-delimited will stay in streambuf, where it will be counted again under if not j[:log_throttle_is_open] next time. Am I missing something?

It feels like there should be a next under if not j[:log_throttle_is_open], although that might not actually be correct.

You're right, there should be a "next" there. Not only does that fix the accounting, it actually gives the kind of performance I was hoping for when I put it there: now crunch-dispatch uses about 1.5× as much CPU as yes in my test case (was 30× before this branch).
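
The fixed loop, sketched with illustrative names: with `next`, a chunk that arrives while the throttle is closed is tallied as skipped exactly once and never appended to streambuf, so its undelimited tail cannot be counted again on a later pass.

```ruby
# Sketch of the "each buffer" loop after the fix: `next` keeps
# throttled chunks out of streambuf, so their bytes are counted once.
def drain_chunks(j, chunks)
  chunks.each do |buf|
    if not j[:log_throttle_is_open]
      j[:log_throttle_bytes_skipped] += buf.size
      next  # without this, buf would also land in streambuf below
    end
    j[:streambuf] << buf
  end
end
```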

Sure. I wouldn't bring this up if the point of this branch wasn't optimization, but according to the docs, lines is just short for each_line.to_a. Why add the overhead of building an array that we only enumerate? If you want to specify the line separator, you can do that with each_line too.

Oops, I just missed copying that from your example. Fixed now. Looping with each_line is certainly better than constructing an array. I also used $/ instead of "\n" in the following line, since that's what each_line uses by default and is, in fact, what we mean by "\n". (Right?)
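
For reference, the difference in question (plain Ruby, nothing crunch-specific):

```ruby
s = "foo\nbar\nbaz"

# String#lines materializes the whole array up front...
arr = s.lines                     # ["foo\n", "bar\n", "baz"]

# ...while each_line yields one line at a time, with no intermediate array.
count = 0
s.each_line { |line| count += 1 }

# Both split on the input record separator $/, which defaults to "\n".
```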

There's a small typo at the end of the comments you added here: "some indication that it [sic] incomplete."

Fixed.

Now at 6ad9adc

#8 Updated by Brett Smith about 5 years ago

Tom Clegg wrote:

Looping with each_line is certainly better than constructing an array. I also used $/ instead of "\n" in the following line, since that's what each_line uses by default and is, in fact, what we mean by "\n". (Right?)

Yeah, that makes sense to me. 6ad9adc1 is good to merge, thanks.

#9 Updated by Tom Clegg about 5 years ago

  • Status changed from In Progress to Resolved
