Feature #7328

closed

[Crunch] [UX] Standard excepthook to help debug job problems

Added by Bryan Cosca over 8 years ago. Updated about 1 year ago.

Status: Rejected
Priority: Normal
Assigned To: -
Category: Crunch
Target version: -
Story points: 1.0
Release:
Release relationship: Auto

Description

Original report

For example, a simple "ls" of the directory the user is currently in or inspecting would help. Working with collections is a new concept, so users can easily run into file-not-found errors. Also, if they are piping commands and lose track halfway through, it would help to get more insight into what is being created, deleted, and so on.

Implementation

Overview: Add an excepthook function to the Python SDK that saves the contents of the $TASK_WORK directory as the task's output and marks the task as failed. The SDK automatically installs this function as sys.excepthook when it is running under a job that has a debugging runtime constraint set.

  • Add a new function to the arvados.crunch Python SDK module.
    • It will be installed as sys.excepthook, so it must have the same signature: (exc_type, exc_value, traceback)
    • It creates a new collection from the contents of the $TASK_WORK directory, and updates arvados.current_task() so that the task's output is the manifest of that collection, and success is false.
      • run-command already has most of the code to do this.
    • No matter what happens in the above step(s), the last thing it does is always call sys.__excepthook__ to get the usual exception handling behavior. Ensure this happens using a try/finally block.
  • Add this code or the functional equivalent to __init__.py:
    if os.environ.get('TASK_UUID'):
        try:
            _want_debughook = current_job()['runtime_constraints']['debug_exceptions']
        except (errors.ApiError, KeyError):
            # The job doesn't exist or doesn't define the constraint.
            pass
        else:
            if _want_debughook:
                sys.excepthook = arvados.crunch.your_new_function
            del _want_debughook
    
  • Add documentation for the new runtime constraint to the Job schema page.
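
The try/finally requirement above could be sketched roughly as follows. This is only an illustration, not the SDK's code: make_debug_excepthook, save_work_dir, and fallback are hypothetical names, and save_work_dir stands in for the step that would save $TASK_WORK as a collection and mark the task failed.

```python
import sys


def make_debug_excepthook(save_work_dir, fallback=None):
    """Build an excepthook that saves debug output, then defers to the default hook.

    save_work_dir is a hypothetical zero-argument callable (an assumption for
    this sketch) representing "save $TASK_WORK as the task output and set
    success to false". fallback defaults to sys.__excepthook__.
    """
    def debug_excepthook(exc_type, exc_value, traceback):
        try:
            save_work_dir()
        finally:
            # Always run the usual exception handling, even if saving failed.
            (fallback or sys.__excepthook__)(exc_type, exc_value, traceback)

    return debug_excepthook


if __name__ == "__main__":
    # Install the hook with a placeholder save step, then trigger it.
    sys.excepthook = make_debug_excepthook(
        lambda: sys.stderr.write("saving work directory (placeholder)\n")
    )
    raise RuntimeError("example failure")
```

Note that if the save step itself raises, the default hook still runs first, and the save error then propagates out of the hook, where the interpreter reports it as an error in sys.excepthook.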