Task #2325

2221-complete-docker code review

Added by Tim Pierce over 7 years ago. Updated over 7 years ago.

Status:
Closed
Priority:
Normal
Assigned To:
Tim Pierce
Category:
-
Start date:
03/14/2014
Due date:
% Done:

100%

Estimated time:
2.00 h

Related issues

Blocks Arvados - Feature #2487: Finish/merge last sprint's docker work.Resolved

History

#1 Updated by Tom Clegg over 7 years ago

Notes about crunch-job changes

  • In collate_output, don't you have to close $child_in in order for "arv keep put" to know it has read all of its input? fixed in arvados|3690225e29161191cc5daabe4a335a0ec5737d3d
  • L1074: print $child_in "XXX fetch_block($output) failed XXX\n"; ← this should probably be sent to Log() instead of getting dumped into the output manifest. (Even though that's been happening so far.)
  • L1082: $s->can_read(5); should be longer than 5s. Maybe 120s before giving up?
  • L1082: Log some sort of "timed out" error message if that happens.

#2 Updated by Anonymous over 7 years ago

  • Status changed from New to Resolved
  • Start date set to 03/14/2014
  • % Done changed from 0 to 100
  • Remaining (hours) changed from 2.0 to 0.0

Applied in changeset arvados|commit:cf8b27749cf44aee74914622dc8bc2a9690204dc.

#3 Updated by Tim Pierce over 7 years ago

  • Status changed from Resolved to In Progress

[oops, should not have closed this until code review is actually complete]

#4 Updated by Tom Clegg over 7 years ago

  • Target version changed from 2014-03-26 Debt service and dev painkillers to 2014-04-16 Dev tools and data/resource management

#5 Updated by Tom Clegg over 7 years ago

  • Target version changed from 2014-04-16 Dev tools and data/resource management to 2014-03-26 Debt service and dev painkillers
  • Remaining (hours) changed from 0.0 to 2.0

#6 Updated by Brett Smith over 7 years ago

  • Assigned To changed from Ward Vandewege to Tim Pierce

I just pushed 2221-complete-docker-brett. It includes a few small bugfixes, mostly aimed at ensuring a more consistent environment in the Docker images, accounting for more differences in the local environment. With these changes, I can follow the README to get the complete set of Docker images. Which is fantastic.

I like the changes to crunch-job; this is a little more comfortable for mere mortal Perl hackers, I think. Is it feasible to expressly close the filehandles we get from open2? I think this is a case where that could actually matter, depending on how much we end up collating and how often we call fetch_block. It would also be helpful to use \Q…\E (a.k.a. quotemeta) on variable arguments in `system calls`, to help prevent exploits.

Other than that, I just have a couple of minor style points. Near the bottom of build/config.rb there's a comparison against nil; I think the nil? method is preferable. There's also a couple of whitespace bugs; try git diff --check master... for a report.

Thanks again.

#7 Updated by Tim Pierce over 7 years ago

  • Assigned To changed from Tim Pierce to Brett Smith

PTAL, comments below:

Brett Smith wrote:

I just pushed 2221-complete-docker-brett. It includes a few small bugfixes, mostly aimed at ensuring a more consistent environment in the Docker images, accounting for more differences in the local environment. With these changes, I can follow the README to get the complete set of Docker images. Which is fantastic.

Nice changes, I've merged them back into 2221-complete-docker along with my changes for your comments here.

I like the changes to crunch-job; this is a little more comfortable for mere mortal Perl hackers, I think. Is it feasible to expressly close the filehandles we get from open2? I think this is a case where that could actually matter, depending on how much we end up collating and how often we call fetch_block. It would also be helpful to use \Q…\E (a.k.a. quotemeta) on variable arguments in `system calls`, to help prevent exploits.

I guess we can explicitly close the filehandles, but what's the win? Won't the kernel reclaim them anyway when the process terminates?

Re quotemeta: excellent suggestion, I've made this change for the code that I touched in this branch. More generally crunch-job's shell escapes should be more thoroughly audited, probably in issue #2350 (which I have for this sprint).

Other than that, I just have a couple of minor style points. Near the bottom of build/config.rb there's a comparison against nil; I think the nil? method is preferable. There's also a couple of whitespace bugs; try git diff --check master... for a report.

Works for me, added. I also simplified the permission changes in build_tools/config.rb per our discussion on IRC.

#8 Updated by Brett Smith over 7 years ago

  • Assigned To changed from Brett Smith to Tim Pierce

I guess we can explicitly close the filehandles, but what's the win? Won't the kernel reclaim them anyway when the process terminates?

Yes, but is it possible that we'll hit the limit before then? It's only 1024 by default, so I'm concerned that, especially on a job with a lot of subtasks where we call fetch_block a lot, we could use them all up before we finish the work.

Per discussion on IRC, I suggested but did not implement the style changes, so go ahead with those. Thanks again.

#9 Updated by Tim Pierce over 7 years ago

  • Assigned To changed from Tim Pierce to Brett Smith
  • Remaining (hours) changed from 2.0 to 0.5

Sorry for the misunderstanding. Addressed the style issues in commit bb511c0.

commit b1b9ffbb replaces the open2() call in fetch_block() with a simple open(). In collate_output(), we're already closing the input filehandle as soon as we know we're done with it, and terminating the process as soon as we're done collecting output, so I don't think there's a lot more optimization to do productively there. (Also, I sincerely hope that we'll replace this code with a native Keep SDK before we ever get close to running out of filehandles :-)

Brett Smith wrote:

I guess we can explicitly close the filehandles, but what's the win? Won't the kernel reclaim them anyway when the process terminates?

Yes, but is it possible that we'll hit the limit before then? It's only 1024 by default, so I'm concerned that, especially on a job with a lot of subtasks where we call fetch_block a lot, we could use them all up before we finish the work.

Per discussion on IRC, I suggested but did not implement the style changes, so go ahead with those. Thanks again.

#10 Updated by Brett Smith over 7 years ago

  • Assigned To changed from Brett Smith to Tim Pierce

The new open() call in fetch_blocks is missing its semicolon at the end of the line.

With that bugfix, I think it's good to merge. Thanks very much; I know we're both looking forward to building on this base. :)

#11 Updated by Tim Pierce over 7 years ago

Brett Smith wrote:

The new open() call in fetch_blocks is missing its semicolon at the end of the line.

With that bugfix, I think it's good to merge. Thanks very much; I know we're both looking forward to building on this base. :)

Embarrassing. Thanks, fixed.

#12 Updated by Tim Pierce over 7 years ago

  • Status changed from In Progress to Resolved
  • Remaining (hours) changed from 0.5 to 0.0

Applied in changeset arvados|commit:6d73acc845a2dd7413fdde0742473d83bf3d0719.

#13 Updated by Tim Pierce over 7 years ago

  • Status changed from Resolved to Closed
  • Target version changed from 2014-03-26 Debt service and dev painkillers to 2014-04-16 Dev tools and data/resource management

Also available in: Atom PDF