Project

General

Profile

Actions

Bug #7228

closed

[Crunch] crunch-dispatch should not create tmp files that break API server

Added by Tom Clegg over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Crunch
Target version:
Story points:
0.5

Description

Problem

We have seen crunch-dispatch break API server as follows:
  • Run as root, as described in docs
  • Call some part of the API server's code base that uses Rails.cache
  • Create files and directories in {Rails.root}/tmp/cache with owner=root and permissions that prohibit www-data from writing

After this has happened, API server (running as www-data) crashes when trying to update cached data.

This isn't very common because API server usually creates/updates a given cache item before crunch-dispatch does. But when it does happen, it's bad: for example, new groups can't be created because the group cache can't be updated.

The condition can be fixed temporarily by running arvados-api-server-upgrade.sh (it does chown -R on the tmp dir, among other things). However, this doesn't prevent it from happening again.

The real solution is #5162: refactor crunch-dispatch as an API client so it can't touch the API server Rails project at all.

Ideas

In the meantime, there might be an effective workaround, like running crunch-dispatch with umask=002 and the same GID as the API server process.

(Running crunch-dispatch with the same UID as the API server process would fix the cache permission issue, but at the cost of introducing other problems: crunch-dispatch needs to use sudo, and giving www-data passwordless sudo undermines the security benefit of running the web service as non-root in the first place.)

Immediate fix

  • arvados-api-server-upgrade.sh already makes $WWW_OWNER the owner of tmp/ recursively. Extend it to chmod tmp/cache/ 2775.
  • Extend crunch-dispatch to run with a 002 umask. The only other file it opens is its own lockfile, and it sets a specific 0644 mode for that, so this should only affect Rails cache files.
  • Test using the procedure in note-5.
  • Make sure the arvados-dev branch gets merged before the arvados branch, so we build a new package that includes both the new upgrade script and the new crunch-dispatch.

Subtasks 2 (0 open2 closed)

Task #7345: Review arvados branch 7228-crunch-dispatch-umaskResolvedPeter Amstutz09/07/2015Actions
Task #7374: Review 7228-group-writable-tmp-cacheResolvedPeter Amstutz09/07/2015Actions

Related issues 1 (0 open1 closed)

Related to Arvados - Bug #5162: [Crunch] crunch-dispatch should use the API instead of running in the API server's rails environment.ClosedActions
Actions #1

Updated by Brett Smith over 9 years ago

  • Category set to Crunch

Tom Clegg wrote:

In the meantime, there might be an effective workaround, like running crunch-dispatch with umask=002 and the same GID as the API server process.

This seems to have some promise. Cache files currently in production have mode 0644, where www-data's umask is 022. It looks like the Rails cache code is not imposing a mode stricter than the umask.

(Running crunch-dispatch with the same UID as the API server process would fix the cache permission issue, but at the cost of introducing other problems: crunch-dispatch needs to use sudo, and giving www-data passwordless sudo undermines the security benefit of running the web service as non-root in the first place.)

What if we took the other approach, and ran both as the crunch user? That's easy to rig up in Nginx: chown crunch: /var/www/arvados-api/current/config.ru.

Actions #2

Updated by Brett Smith over 9 years ago

  • Target version set to Arvados Future Sprints
Actions #3

Updated by Brett Smith over 9 years ago

Brett Smith wrote:

Tom Clegg wrote:

In the meantime, there might be an effective workaround, like running crunch-dispatch with umask=002 and the same GID as the API server process.

This seems to have some promise. Cache files currently in production have mode 0644, where www-data's umask is 022. It looks like the Rails cache code is not imposing a mode stricter than the umask.

I dug into this to verify it. Here's Rails' code to write a cache file. Here's the atomic_write method called there. atomic_write specifically uses the default ownership and permissions of files in that directory. So setting tmp/cache setgid and umask 002 should be sufficient to fix the issue.

For what it's worth: the tracebacks we get from this refer to groups_for_user files. These are caches specifically created in the User model, calling methods on Rails cache. So yet another option might be to set up a different cache (another FileStore backed by a different directory?) when we're running crunch-dispatch, although I'm less sure how to do that nicely.

Actions #4

Updated by Brett Smith over 9 years ago

  • Target version changed from Arvados Future Sprints to 2015-09-30 sprint
Actions #5

Updated by Brett Smith over 9 years ago

Manual test procedure:

  • Start an API server without crunch-dispatch. Put at least one job in its queue.
  • Stop the API server.
  • Empty Rails' tmp/cache directory.
  • Start crunch-dispatch.
  • Confirm that cache files created by crunch-dispatch are writable by the Web server user (www-data on Debian).
Actions #6

Updated by Brett Smith over 9 years ago

  • Story points set to 0.5
Actions #7

Updated by Brett Smith over 9 years ago

  • Assigned To set to Brett Smith
Actions #8

Updated by Brett Smith over 9 years ago

  • Assigned To changed from Brett Smith to Peter Amstutz
Actions #9

Updated by Brett Smith over 9 years ago

  • Description updated (diff)
Actions #10

Updated by Nico César over 9 years ago

on arvados dev reviewing b3badf7..e48701c only 1 line change in

jenkins/arvados-api-server-extras/arvados-api-server-upgrade.sh

LGTM!

Actions #11

Updated by Peter Amstutz over 9 years ago

Test process:

  1. Temporarily add "User.first.group_permissions" to #run method in crunch-dispatch to cause a cache to be written (I couldn't figure out the conditions that cause crunch-dispatch to create cache files consistently on startup)
  2. Run sudo -u root -g peter crunch-dispatch.rb
  3. This causes a groups_for_user file to be created.
  4. Without umask:
    -rw-r--r-- 1 root peter  530 Sep 23 12:40 groups_for_user_bseta-tpzed-000000000000000
    

    API server breaks because it tries to delete groups_for_user_bseta-tpzed-000000000000000 and it can't
  5. With umask:
    -rw-rw-r-- 1 root peter  531 Sep 23 12:41 groups_for_user_bseta-tpzed-000000000000000
    

    API server works because it is able to delete groups_for_user_bseta-tpzed-000000000000000
Actions #12

Updated by Peter Amstutz over 9 years ago

Branch is 7228-crunch-dispatch-umask

Actions #13

Updated by Nico César over 9 years ago

reviewed 27cb821..4a1fdbc

LGTM

Actions #14

Updated by Peter Amstutz over 9 years ago

  • Status changed from New to Resolved

Applied in changeset arvados|commit:acd1241cec9260d54c2dca55785e309644334c41.

Actions

Also available in: Atom PDF