Bug #7181

[Crunch] crunch-job installer script should handle egg_info errors caused because Git is not installed

Added by Tom Clegg over 5 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Crunch
Target version:
Start date:
09/01/2015
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
0.5

Description

Problem:
  • If git is not installed on a compute node, the "egg_info" part of crunch-job's "install" script fails; fails to detect that the problem was that "git" was missing, and therefore fails to fall back to writing its own setup.cfg; and fails to provide any diagnostic info.

Current code:

  open(my $egg_info_pipe, "-|",
       "python2.7 \Q$python_dir/setup.py\E --quiet egg_info 2>&1 >/dev/null");
  my @egg_info_errors = <$egg_info_pipe>;
  close($egg_info_pipe);
  if ($?) {
    if (@egg_info_errors and ($egg_info_errors[-1] =~ /\bgit\b/)) {

Fix:
  • Remove --quiet flag
  • Extend the $egg_info_errors condition at the bottom of this block to catch the case where Git is not installed, and go into the following block to write a build tag to setup.cfg. (To figure out what this stderr looks like, take a .tar.gz of the Python SDK, and run python setup.py egg_info from it on a host that does not have Git installed.)

Subtasks

Task #7347: Review 7181-crunch-missing-gitResolvedPeter Amstutz

Associated revisions

Revision 99349abd
Added by Peter Amstutz over 5 years ago

Merge branch '7181-crunch-missing-git' closes #7181

History

#1 Updated by Brett Smith over 5 years ago

I'm always for improving the logging, but the main point of this code is to make sure that the SDK can be installed even if it has been extracted directly from a Git repository. Git should not be necessary to do this. I think this code should also check for this condition (either with can_run("git") or by introspecting the error message), and running its current "set a build tag" branch when it exists.

#2 Updated by Tom Clegg over 5 years ago

To clarify: The apparent intent of the existing code ("use git if it's installed, but don't require git") is correct. The idea here is just to fix the following two bugs in the implementation:
  1. Git is required. We never wanted git to be required, but it is, because the "if there is no git..." code path is impossible to reach, because >/dev/null ensures $egg_info_errors[-1] is always undef, which never matches /\bgit\b/. I expect removing >/dev/null will make the /\bgit\b/ regexp work, and thereby make it possible to install the SDK even though git is not installed on the compute node.
  2. If the setup.py command fails and the "if there is no git..." code path doesn't get invoked, the (presumably) non-git-related error is hard to diagnose. Even when we delete >/dev/null and stop throwing away the setup.py output completely, the --quiet flag will remove some information that is likely to be helpful. The --quiet flag would seem reasonable if it avoided clutter in normal logs, but it doesn't: if setup.py succeeds, the entire output is suppressed, quiet or not. So the question is "when setup.py fails in a way that causes the whole job to fail, do we want to see shorter setup.py logs, or longer ones?" and if the answer is "longer" then we should remove the --quiet flag.

The can_run("git") approach sounds even better than correcting the existing introspection -- assuming "can't run git" is the only git error we need to work around this way.

#3 Updated by Brett Smith over 5 years ago

Tom Clegg wrote:

To clarify: The apparent intent of the existing code ("use git if it's installed, but don't require git") is correct.

The can_run("git") approach sounds even better than correcting the existing introspection -- assuming "can't run git" is the only git error we need to work around this way.

Sorry, that's not quite what the current code is meant to do.

The Python SDK's setup.py says, "If a build tag is not set, generate one from Git." Even when Git is installed on the compute node (as it is on our clusters), this fails because the Python SDK was package with git archive, and no .git directory exists. The intent of this code is, "If setup.py fails because it couldn't generate a build tag from Git, set one ourselves."

Right now it properly handles the case where setup.py fails because .git is missing. I'm proposing that it be extended to handle the case where setup.py fails because Git is not installed.

The changes to make setup.py less quiet may help with that. It may be easier to introspect the error after they're done.

(Before you ask: right now, I'm not aware of any way that we could get to the start of this code and have a build tag already set. I was trying to be future-proof…)

#4 Updated by Brett Smith over 5 years ago

Also,

  1. Git is required. We never wanted git to be required, but it is, because the "if there is no git..." code path is impossible to reach, because >/dev/null ensures $egg_info_errors[-1] is always undef, which never matches /\bgit\b/.

Sorry, this isn't quite right. The full chain of redirects is 2>&1 >/dev/null. This means @egg_info_errors has whatever the process writes to stderr. The match works for the ".git does not exist" case described above.

#5 Updated by Brett Smith over 5 years ago

  • Target version changed from Arvados Future Sprints to 2015-09-30 sprint

#6 Updated by Brett Smith over 5 years ago

  • Description updated (diff)

#7 Updated by Brett Smith over 5 years ago

  • Assigned To set to Brett Smith

#8 Updated by Brett Smith over 5 years ago

  • Assigned To changed from Brett Smith to Peter Amstutz

#9 Updated by Brett Smith over 5 years ago

  • Subject changed from [Crunch] Fix error reporting for setup.py for arvados_sdk_version to [Crunch] crunch-job installer script should handle egg_info errors caused because Git is not installed

#10 Updated by Peter Amstutz over 5 years ago

python2.7 $python_dir/setup.py egg_info 2>&1 >/dev/null

I had to spend some quality time with the bash docs to understand the shell redirection going on here, so for the record:

2>&1 means "modify file descriptor 2 so that it writes to the same pipe as file description 1"
>/dev/null means "modify file descriptor 1 to write to /dev/null"

So the effect is that writes to FD 2 (stderr) go to the original stdout, and writes to FD 1 (stdout) go to /dev/null.

These are processed in sequence, so >/dev/null 2>&1 would result in all output going to /dev/null.

#11 Updated by Peter Amstutz over 5 years ago

7181-crunch-missing-git is ready for review.

Testing procedure:

  1. Make a deliberately broken "x-git-broken" branch of Arvados which has "xgit" instead of "git" in gittaggers (because selectively breaking git in the install script is tricky because other parts of the install rely on git)
  2. Make a pipeline with arvados_sdk_version set to "x-git-broken"
  3. Check that crunch-job is able to install the arvados sdk for both "master" and "x-git-broken" (without the fix: "x-git-broken" will fail)

#12 Updated by Nico César over 5 years ago

I look at the code in fc5005a..5e8edef.

The testing procedure is a good one, but is also incredible painful. My opinion is to merge this.

#13 Updated by Peter Amstutz over 5 years ago

  • Status changed from New to Resolved

Applied in changeset arvados|commit:99349abd0ee7347b5bac3d4a9638853c6d4b97ab.

Also available in: Atom PDF