Feature #4561: [SDKs] Refactor run-command so it can be used as an SDK by scripts in a git tree - Arvados


Updated by Tom Clegg almost 10 years ago

Currently you have two main options for running a job: 
 # Put the program you need in a docker image. Use run-command from the arvados tree to wrap it as a crunch job. 
 # Write a native crunch script in your git tree. 

 The first option forces you to save a new docker image in order to run a new version of your program. Unacceptable! 

 There are several features of run-command that make it convenient and attractive to beginners. However, it forces you to use a totally different approach to developing and running scripts, an approach which prevents you from doing some important things: 
 * Keeping your code in revision control, 
 * Using the same code in more than one pipeline template, 
 * Writing jobs in the programming language of your choice. 

 There is no migration path from a simple run-command job to a non-trivial program, so the developer is forced to choose: live with run-command's custom JSON-based programming language, or abandon the existing pipelines and all of run-command's advantages, and rewrite everything in a normal language like Python. 

 This can be addressed by refactoring run-command as a set of utilities and features, rather than a programming language that can only be used inside crunch jobs. 
 * The "run-command language" JSON programming language should be presented as an interpreter, runnable like other interpreters (@#!whatever-the-language-is-to-be-called@).
 * Convenience features like "store output dir contents in Keep and set success=true at end of task" should be ported to other SDKs too (most obviously bash), so authors can migrate from the JSON language to a normal language. 
 * It should be possible to provide a crunch script in _any_ language by copying the script itself into the job record. (This makes it possible to run jobs without touching git.) 
 ** Since the run-command language is already JSON-encoded, a run-command program should be provided in a serialized attribute like @script_parameters@. (Other languages are just text, but that can also fit in a serialized field.)
 ** The name of the "script" attribute already suggests that you can put a script in it. This could change type (from varchar(255) to text) and become a serialized field capable of containing a string or a hash. This means we can't rely on "script" to be a short name suitable for displaying in UI (a problem we already have with run-command jobs: the program has no name, so we display the name of the language instead). 
 ** We could support passing the name of an interpreter in "script" (e.g., "run-command" or "python") and passing the program itself in @script_parameters[stdin]@. We would have to treat string and hash/array cases differently: if stdin is a string, pass the string value, but if stdin is a hash or array, pass its JSON encoding. 
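Presenting the JSON language as an interpreter has one wrinkle. When an executable file starts with a @#!@ line, the kernel runs the named interpreter with the script's path as an argument, so the interpreter reads the @#!@ line as part of the program. JSON has no comment syntax, so that line must be skipped before parsing. A minimal Python sketch under that assumption; @load_json_program@ and @main@ are hypothetical names, not part of run-command or the Arvados SDK.

```python
import json
import sys


def load_json_program(text):
    """Parse a JSON program, ignoring a leading '#!' interpreter line.

    The kernel runs an executable script that starts with '#!interpreter' by
    invoking that interpreter with the script's path as an argument, so the
    interpreter reads the '#!' line too. JSON has no comment syntax, so the
    line has to be stripped before parsing.
    """
    if text.startswith("#!"):
        # Keep everything after the first newline (nothing if there is none).
        newline = text.find("\n")
        text = "" if newline == -1 else text[newline + 1:]
    return json.loads(text)


def main(argv):
    # argv[1] is the script path supplied by the kernel; with no argument,
    # read the program from standard input instead.
    if len(argv) > 1:
        with open(argv[1]) as f:
            program = load_json_program(f.read())
    else:
        program = load_json_program(sys.stdin.read())
    # A real interpreter would execute the program here; this sketch only
    # parses it and echoes it back.
    print(json.dumps(program, sort_keys=True))
    return 0
```

An installed interpreter's entry point would call @sys.exit(main(sys.argv))@. A program file whose first line is @#!/usr/bin/env whatever-the-language-is-to-be-called@ could then be marked executable and run directly, like a Python or bash script.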
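As one illustration of porting a convenience feature into an ordinary SDK utility, here is a minimal Python sketch of "store output dir contents in Keep and set success=true at end of task" written as a decorator. The storage and task-update steps are passed in as @store_dir@ and @set_output@ callables. Those names, @finishes_task@, and the fakes in the example are hypothetical stand-ins, not real Arvados SDK calls.

```python
import functools
import os
import tempfile


def finishes_task(store_dir, set_output):
    """Decorator: run the task body, then store its output dir and mark success.

    store_dir(path) -> locator    hypothetical: save a directory in Keep
    set_output(locator, success)  hypothetical: update the current task record
    """
    def decorate(body):
        @functools.wraps(body)
        def wrapper(*args, **kwargs):
            outdir = tempfile.mkdtemp(prefix="task-output-")
            # The body writes its results into outdir. If it raises, the
            # exception propagates and no output or success flag is recorded.
            body(outdir, *args, **kwargs)
            locator = store_dir(outdir)
            set_output(locator, success=True)
            return locator
        return wrapper
    return decorate


# Example with stand-in callables; a real version would call the SDK.
recorded = []


def fake_store_dir(path):
    return "fake-locator:" + ",".join(sorted(os.listdir(path)))


def fake_set_output(locator, success):
    recorded.append((locator, success))


@finishes_task(fake_store_dir, fake_set_output)
def count_words(outdir, text):
    with open(os.path.join(outdir, "count.txt"), "w") as f:
        f.write(str(len(text.split())))
```

Because the end-of-task behavior lives in a library helper rather than in a language feature, the same task body can move from a small job into a larger program unchanged. This sketch creates a fresh temporary directory; in a real task the output directory would presumably come from the task environment.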
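The string-versus-hash rule for @script_parameters[stdin]@ is small enough to sketch directly. Below is a minimal Python version, assuming @script@ names an interpreter executable on the worker's @PATH@; @encode_stdin@ and @run_job_script@ are hypothetical names.

```python
import json
import subprocess


def encode_stdin(value):
    """Turn script_parameters["stdin"] into the text fed to the interpreter.

    A string (e.g. a Python or bash program) is passed through unchanged; a
    hash or array (e.g. a run-command program) is passed as its JSON encoding.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    raise TypeError("stdin must be a string, hash, or array, not "
                    + type(value).__name__)


def run_job_script(script, script_parameters):
    # Hypothetical runner: "script" names an interpreter executable, and the
    # program arrives on that interpreter's standard input.
    return subprocess.run(
        [script],
        input=encode_stdin(script_parameters["stdin"]),
        capture_output=True,
        text=True,
        check=True,
    )
```

The type of the stored value decides the encoding, not its content: a string that happens to contain JSON text is still passed through verbatim.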

 * It should be possible to access run-command's features from a script that lives in your git tree.
 ** Currently, this can be done awkwardly by copying some version of run-command into your own git tree.
 ** With #4027, we can make run-command's features available through the SDK.
 ** Then we just need @#!whatever-the-language-is-called@ to work, or some other way to invoke run-command from the installed SDK package rather than requiring it to be in @$CRUNCH_SRC/crunch_scripts/@.
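Invoking run-command from the installed SDK could look something like this: look for a copy installed alongside the SDK first, and fall back to the current @$CRUNCH_SRC/crunch_scripts/@ copy only if none is installed. A minimal Python sketch; the module name @run_command_sdk_module@ and the function @find_run_command@ are hypothetical, since the package layout is not decided here.

```python
import importlib.util
import os
import sys


def find_run_command(module="run_command_sdk_module", env=None):
    """Return an argv prefix for launching run-command, or None if not found.

    Prefers a copy installed with the SDK; falls back to the current layout,
    where run-command must be copied into the job's git tree. The default
    module name is a hypothetical placeholder.
    """
    env = os.environ if env is None else env
    # 1. Installed with the SDK package: run it as a module, so the job's git
    #    tree no longer needs its own copy.
    if importlib.util.find_spec(module) is not None:
        return [sys.executable, "-m", module]
    # 2. Fallback: a copy in $CRUNCH_SRC/crunch_scripts/.
    crunch_src = env.get("CRUNCH_SRC")
    if crunch_src:
        path = os.path.join(crunch_src, "crunch_scripts", "run-command")
        if os.path.isfile(path):
            return [path]
    return None
```

Keeping the fallback lets existing jobs that ship their own copy keep working while new scripts rely on the installed package.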
