Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14120 2014-08-26T15:44:01Z Ward Vandewege ward@curii.com
<ul><li><strong>Story points</strong> set to <i>2.0</i></li></ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14190 2014-08-27T14:06:50Z Ward Vandewege ward@curii.com
<ul><li><strong>Target version</strong> changed from <i>Arvados Future Sprints</i> to <i>2014-09-17 sprint</i></li></ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14283 2014-08-27T15:57:36Z Peter Amstutz peter.amstutz@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/14283/diff?detail_id=12969">diff</a>)</li></ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14284 2014-08-27T16:09:33Z Peter Amstutz peter.amstutz@curii.com
<ul><li><strong>Story points</strong> changed from <i>2.0</i> to <i>3.0</i></li></ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14286 2014-08-27T16:12:40Z Peter Amstutz peter.amstutz@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/14286/diff?detail_id=12973">diff</a>)</li></ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14299 2014-08-27T16:20:15Z Tim Pierce twp@curoverse.com
<ul><li><strong>Assigned To</strong> set to <i>Tim Pierce</i></li></ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14300 2014-08-27T16:21:45Z Tim Pierce twp@curoverse.com
<ul><li><strong>Subject</strong> changed from <i>[SDKs] Copy objects from one arvados instance to another</i> to <i>[SDKs] A pipeline from one Arvados instance can be run on another instance</i></li></ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14301 2014-08-27T16:22:29Z Tom Clegg tom@curii.com
<ul><li><strong>Subject</strong> changed from <i>[SDKs] A pipeline from one Arvados instance can be run on another instance</i> to <i>[SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another</i></li></ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14302 2014-08-27T16:23:15Z Tom Clegg tom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/14302/diff?detail_id=12989">diff</a>)</li></ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14377 2014-08-28T14:53:47Z Tim Pierce twp@curoverse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/14377/diff?detail_id=13086">diff</a>)</li><li><strong>Category</strong> set to <i>SDKs</i></li></ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14475 2014-08-29T10:25:53Z Tim Pierce twp@curoverse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/14475/diff?detail_id=13119">diff</a>)</li></ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14476 2014-08-29T10:33:27Z Tim Pierce twp@curoverse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/14476/diff?detail_id=13120">diff</a>)</li></ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14479 2014-08-29T10:43:00Z Tim Pierce twp@curoverse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/14479/diff?detail_id=13127">diff</a>)</li></ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14481 2014-08-29T11:59:44Z Tim Pierce twp@curoverse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/14481/diff?detail_id=13128">diff</a>)</li></ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14509 2014-08-29T17:42:23Z Tim Pierce twp@curoverse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/14509/diff?detail_id=13163">diff</a>)</li></ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14569 2014-09-03T14:38:13Z Peter Amstutz peter.amstutz@curii.com
<ul></ul><p>In-progress review <a class="changeset" title="3699: support pipeline templates arv-copy can work on pipeline templates." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/1f3035cfe645741753e45ff8d3cf43a3fc5b2385">1f3035c</a></p>
<ol>
<li>Consider implementing part of the 'arvados.command' module (similarly to arv-put) and/or putting as much functionality into the SDK as possible.</li>
<li>Especially would like to see api_for_instance() in the SDK </li>
<li>I think Tom said that the uuid type map is available in the discovery document?</li>
<li>Want a --project-uuid option to specify the destination project (i.e. owner_uuid). We could get clever and determine the source and destination instances based on the instance portion of the uuid.</li>
<li>Want to log each thing copied along with the new UUID on the destination system.</li>
</ol> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14570 2014-09-03T14:52:09Z Tim Pierce twp@curoverse.com
<ul></ul><p>Peter Amstutz wrote:</p>
<blockquote>
<p>In-progress review <a class="changeset" title="3699: support pipeline templates arv-copy can work on pipeline templates." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/1f3035cfe645741753e45ff8d3cf43a3fc5b2385">1f3035c</a></p>
<ol>
<li>Consider implementing part of the 'arvados.command' module (similarly to arv-put) and/or putting as much functionality into the SDK as possible.</li>
<li>Especially would like to see api_for_instance() in the SDK</li>
</ol>
</blockquote>
<p>I'd like to do this too -- the immediate goal of course is to have arv-copy work on the command line, but to the extent I can structure this to be a wrapper around a sensible set of SDK functions, I will.</p>
<p>We could extend <code>arvados.api()</code> to take an "instance" or "config_file" argument -- e.g. <code>arvados.api('v1', instance='qr1hi')</code> would load configuration settings from <code>~/.config/arvados/qr1hi.conf</code>.</p>
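<p>A minimal sketch of what such a per-instance lookup might involve, assuming each <code>~/.config/arvados/NAME.conf</code> file holds plain <code>KEY=value</code> lines. The helper name <code>load_instance_config</code> and its <code>config_dir</code> parameter are hypothetical and not part of the SDK:</p>
<pre>
import os


def load_instance_config(instance, config_dir=None):
    """Read KEY=value settings for one instance (hypothetical helper)."""
    if config_dir is None:
        # Assumed default location, as described in the paragraph above.
        config_dir = os.path.join(os.path.expanduser("~"), ".config", "arvados")
    path = os.path.join(config_dir, instance + ".conf")
    settings = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and anything that is not KEY=value.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings
</pre>
<p>The returned settings (for example <code>ARVADOS_API_HOST</code> and <code>ARVADOS_API_TOKEN</code>) could then be handed to whatever builds the API client for that instance.</p>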
<blockquote>
<ol>
<li>I think Tom said that the uuid type map is available in the discovery document?</li>
</ol>
</blockquote>
<p>He did say this. I did not find it there. I'll try again.</p>
<blockquote>
<ol>
<li>Want a --project-uuid option to specify the destination project (i.e. owner_uuid). We could get clever and determine the source and destination instances based on the instance portion of the uuid.</li>
</ol>
</blockquote>
<p>A --project-uuid option is a good idea, but even if we guess the instance from the uuid, we'll still need to find the appropriate API token for it. I'm not sure that buys us anything useful.</p>
<blockquote>
<ol>
<li>Want to log each thing copied along with the new UUID on the destination system.</li>
</ol>
</blockquote>
<p>Can do.</p> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14571 2014-09-03T14:52:24Z Tim Pierce twp@curoverse.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14572 2014-09-03T15:32:32Z Tim Pierce twp@curoverse.com
<ul></ul><p>Tim Pierce wrote:</p>
<blockquote>
<p>Peter Amstutz wrote:</p>
<blockquote>
<ol>
<li>I think Tom said that the uuid type map is available in the discovery document?</li>
</ol>
</blockquote>
<p>He did say this. I did not find it there. I'll try again.</p>
</blockquote>
<p>Okay, found it:</p>
<pre>
apischema = src._schema.schemas
obj_class = [k for k in apischema if apischema[k].get('uuidPrefix') == 'j7d0g']
if obj_class:
    return obj_class[0]
</pre> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14755 2014-09-11T09:49:43Z Peter Amstutz peter.amstutz@curii.com
<ul></ul><ol>
<li>Add 'copy' subcommand to 'arv' frontend.</li>
<li>Give a friendlier error message, preferably explaining that the user needs to create a file and directing the user to the "manage_account" page to get the credentials:<br /><pre>
$ arv-copy <a href="https://arvadosapi.com/4n8aq-d1hrv-51w0b47yd8hnt05">4n8aq-d1hrv-51w0b47yd8hnt05</a> 4n8aq 4xphq
Traceback (most recent call last):
File "/home/peter/work/arvados/sdk/cli/bin/arv-copy", line 4, in <module>
main()
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 64, in main
src_arv = api_for_instance(args.source_arvados)
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 95, in api_for_instance
cfg = arvados.config.load(config_file)
File "/home/peter/work/arvados/sdk/python/arvados/config.py", line 31, in load
with open(config_file, "r") as f:
IOError: [Errno 2] No such file or directory: '/home/peter/.config/arvados/4n8aq.conf'
</pre></li>
<li>We ought to allow users to paste this text as-is from the "manage_account" page (this confusion with settings.conf has been reported by actual users):<br /><pre>
HISTIGNORE=$HISTIGNORE:'export ARVADOS_API_TOKEN=*'
export ARVADOS_API_TOKEN=3h3xxxxxxxxxxxxxxxxxxxxxxxxxxr8jra3eabb
export ARVADOS_API_HOST=localhost:3001
export ARVADOS_API_HOST_INSECURE=true
</pre></li>
<li>There should probably be a way to specify an alternate search directory instead of only looking in '$HOME/.config/arvados' for the case of automated jobs that don't have a home directory.</li>
<li>This shouldn't fail: (trying to copy a pipeline instance)<br /><pre>
$ arv-copy <a href="https://arvadosapi.com/4n8aq-d1hrv-51w0b47yd8hnt05">4n8aq-d1hrv-51w0b47yd8hnt05</a> 4n8aq 4xphq
Traceback (most recent call last):
File "/home/peter/work/arvados/sdk/cli/bin/arv-copy", line 4, in <module>
main()
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 76, in main
src=src_arv, dst=dst_arv)
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 185, in copy_pipeline_instance
for dep in job['dependencies']:
TypeError: 'NoneType' object is not iterable
</pre></li>
<li>Next I tried copying a pipeline template. Did this succeed? It doesn't appear to have copied the collections or git repository.<br /><pre>
$ arv-copy <a href="https://arvadosapi.com/4n8aq-p5p6p-myx6p0vq84irkes">4n8aq-p5p6p-myx6p0vq84irkes</a> 4n8aq 4xphq
{u'kind': u'arvados#pipelineTemplate', u'uuid': u'<a href="https://arvadosapi.com/4xphq-p5p6p-l84js0szbtttuxk">4xphq-p5p6p-l84js0szbtttuxk</a>', u'modified_at': u'2014-09-11T13:27:06Z', u'created_at': u'2014-09-11T13:27:06Z', u'description': None, u'modified_by_client_uuid': u'<a href="https://arvadosapi.com/4xphq-ozdt8-7sfww9tghj44cc3">4xphq-ozdt8-7sfww9tghj44cc3</a>', u'owner_uuid': u'<a href="https://arvadosapi.com/4xphq-tpzed-d6gnynp5uioqnxo">4xphq-tpzed-d6gnynp5uioqnxo</a>', u'href': u'/pipeline_templates/4xphq-p5p6p-l84js0szbtttuxk', u'etag': u'110ka4zc55m2uch7qeapvu27h', u'components': {u'hasher2': {u'nondeterministic': True, u'repository': u'peter2', u'script': u'hash', u'script_parameters': {u'input': {u'output_of': u'hasher'}}, u'runtime_constraints': {}, u'output_name': u'funky hash man', u'script_version': u'fc45cbfa6fa0d33f7304b5c86a96449b52a68976'}, u'hasher': {u'nondeterministic': True, u'repository': u'peter2', u'script': u'hash', u'script_parameters': {u'input': u'1235f41348b10eaff7d622dba7bd4a9f+83'}, u'runtime_constraints': {}, u'output_name': False, u'script_version': u'fc45cbfa6fa0d33f7304b5c86a96449b52a68976'}}, u'modified_by_user_uuid': u'<a href="https://arvadosapi.com/4xphq-tpzed-d6gnynp5uioqnxo">4xphq-tpzed-d6gnynp5uioqnxo</a>', u'name': u'hash copy'}
</pre></li>
<li>Next I tried --recursive on the same pipeline template. The create() call needs an "ensure_unique_name=true" argument.<br /><pre>
$ arv-copy --recursive <a href="https://arvadosapi.com/4n8aq-p5p6p-myx6p0vq84irkes">4n8aq-p5p6p-myx6p0vq84irkes</a> 4n8aq 4xphq
Traceback (most recent call last):
File "/home/peter/work/arvados/sdk/cli/bin/arv-copy", line 4, in <module>
main()
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 78, in main
result = copy_pipeline_template(args.object_uuid, src=src_arv, dst=dst_arv)
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 247, in copy_pipeline_template
return dst.pipeline_templates().create(body=old_pt).execute()
File "/usr/local/lib/python2.7/dist-packages/oauth2client/util.py", line 132, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/apiclient/http.py", line 723, in execute
raise HttpError(resp, content, uri=self.uri)
arvados.errors.ApiError: <HttpError 422 when requesting https://4xphq.arvadosapi.com/arvados/v1/pipeline_templates?alt=json returned "#<PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "pipeline_template_owner_uuid_name_unique"
DETAIL: Key (owner_uuid, name)=(<a href="https://arvadosapi.com/4xphq-tpzed-d6gnynp5uioqnxo">4xphq-tpzed-d6gnynp5uioqnxo</a>, hash copy) already exists.
>">
</pre></li>
<li>I tried copying a collection. Finally something works? Hard to tell in between the debug spew.<br /><pre>
$ arv-copy <a href="https://arvadosapi.com/4n8aq-4zz18-0qooqug1wpfu51o">4n8aq-4zz18-0qooqug1wpfu51o</a> 4n8aq 4xphq
2014-09-11 09:37:22 arvados.arv-copy[12460] DEBUG: copying block db937a55ffd607b7a2238220bed2b0c8+71+Af8e91662692c162d9ff18e126d9e679dd90003b1@54241a92
DEBUG:arvados.arv-copy:copying block db937a55ffd607b7a2238220bed2b0c8+71+Af8e91662692c162d9ff18e126d9e679dd90003b1@54241a92
2014-09-11 09:37:22 arvados.arv-copy[12460] INFO: Retrieved 71 bytes
INFO:arvados.arv-copy:Retrieved 71 bytes
2014-09-11 09:37:23 arvados.arv-copy[12460] DEBUG: saving <a href="https://arvadosapi.com/4n8aq-4zz18-0qooqug1wpfu51o">4n8aq-4zz18-0qooqug1wpfu51o</a> manifest: . db937a55ffd607b7a2238220bed2b0c8+71+Af8e91662692c162d9ff18e126d9e679dd90003b1@54241a92 0:71:md5sum.txt
DEBUG:arvados.arv-copy:saving <a href="https://arvadosapi.com/4n8aq-4zz18-0qooqug1wpfu51o">4n8aq-4zz18-0qooqug1wpfu51o</a> manifest: . db937a55ffd607b7a2238220bed2b0c8+71+Af8e91662692c162d9ff18e126d9e679dd90003b1@54241a92 0:71:md5sum.txt
{u'kind': u'arvados#collection', u'uuid': u'<a href="https://arvadosapi.com/4xphq-4zz18-le36ua0jiuggyq7">4xphq-4zz18-le36ua0jiuggyq7</a>', u'modified_at': u'2014-09-11T13:37:23Z', u'created_at': u'2014-09-11T13:37:23Z', u'description': None, u'modified_by_client_uuid': u'<a href="https://arvadosapi.com/4xphq-ozdt8-7sfww9tghj44cc3">4xphq-ozdt8-7sfww9tghj44cc3</a>', u'manifest_text': u'. db937a55ffd607b7a2238220bed2b0c8+71+A4ac65be7f7119422ccbd422c120cecf03b013fc2@54241a93 0:71:md5sum.txt\n', u'owner_uuid': u'<a href="https://arvadosapi.com/4xphq-tpzed-d6gnynp5uioqnxo">4xphq-tpzed-d6gnynp5uioqnxo</a>', u'properties': None, u'portable_data_hash': u'9c4ad41a0f62aeafdf95f4df222a251b+54', u'href': u'/collections/4xphq-4zz18-le36ua0jiuggyq7', u'etag': u'aj1zyfo65uf8w39g22a79aprt', u'modified_by_user_uuid': u'<a href="https://arvadosapi.com/4xphq-tpzed-d6gnynp5uioqnxo">4xphq-tpzed-d6gnynp5uioqnxo</a>', u'name': None}
</pre></li>
<li>On further inspection, the collection name doesn't appear to be copied. (Remember to pass "ensure_unique_name=true" to the create() call.)</li>
</ol> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14873 2014-09-15T14:51:00Z Tim Pierce twp@curoverse.com
<ul></ul><p>New revision at <a class="changeset" title="3699: copy collection properties, name, etc. copy_collection now copies the original collection ..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/963c9e8f87f09ff798e23e0c9ddca6eb7bbec796">963c9e8</a>. There has been a lot of diff churn here, along with merges from master (mostly around making sure that the names of repos and collections are changed in consistent ways when copying instances recursively), so you will probably find it easiest to <code>diff master...HEAD</code> from scratch.</p>
Updates:
<ul>
<li>More careful about copying collections to dst, making sure not to re-fetch blocks if the collection exists at the destination</li>
<li>Copies all collections that match the regex for a collection UUID or hash, anywhere in the source object</li>
<li>Copying git repositories: pushes explicitly to a branch named for the source git URL (e.g. <code>git push dst git_git_qr1hi_arvadosapi_com_twp_git</code>).</li>
<li>More explicit about success or failure at the end</li>
<li>Less debug output</li>
</ul>
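<p>The "copies all collections that match the regex" step can be sketched roughly as follows. This is a minimal illustration, not the code in the revision: the two patterns and the helper name <code>find_collection_refs</code> are assumptions (the <code>4zz18</code> infix is taken from the collection UUIDs shown earlier in this thread), and it is written for Python 3 strings:</p>
<pre>
import re

# Assumed patterns: a collection UUID and a bare content hash with a size.
COLLECTION_UUID_RE = re.compile(r"^[a-z0-9]{5}-4zz18-[a-z0-9]{15}$")
PORTABLE_DATA_HASH_RE = re.compile(r"^[0-9a-f]{32}\+[0-9]+$")


def find_collection_refs(obj, found=None):
    """Walk nested dicts and lists, collecting strings that look like collections."""
    if found is None:
        found = set()
    if isinstance(obj, str):
        if COLLECTION_UUID_RE.match(obj) or PORTABLE_DATA_HASH_RE.match(obj):
            found.add(obj)
    elif isinstance(obj, dict):
        for value in obj.values():
            find_collection_refs(value, found)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            find_collection_refs(item, found)
    return found
</pre>
<p>Run over the <code>components</code> of a pipeline template, a walk like this would pick up script parameters such as <code>1235f41348b10eaff7d622dba7bd4a9f+83</code> while ignoring 40-character git commit hashes.</p>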
<p>The git commands still spit to stdout -- we can try to do something to make that quieter, but I found it useful since the git repo copying was one of the most annoying things to debug. Overall the verbosity of the command is diminished a lot.</p> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14903 2014-09-16T10:43:37Z Peter Amstutz peter.amstutz@curii.com
<ul></ul><p>1. Not addressed<br />2. Not addressed<br />3. Not addressed<br />4. Not addressed</p>
I'm copying from my local development instance, so the configuration is probably broken, but this error message is totally unhelpful in telling me what went wrong:<br /><pre>
2014-09-16 09:25:01 arvados.arv-copy[26729] DEBUG: src_git_url: git@git.4n8aq.arvadosapi.com:peter2.git
Traceback (most recent call last):
File "/home/peter/work/arvados/sdk/cli/bin/arv-copy", line 4, in <module>
main()
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 78, in main
recursive=args.recursive)
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 151, in copy_pipeline_instance
recursive=True)
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 199, in copy_pipeline_template
copy_git_repos(pt, src, dst, dst_git_repo)
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 249, in copy_git_repos
copy_git_repo(repo, src, dst, dst_repo)
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 348, in copy_git_repo
.format(dst_git_repo, r['items_available']))
Exception: cannot identify source repo None; 0 repos found
</pre>
<ul>
<li>On further investigation, there is a copy-and-paste error: it says "source" in both Exceptions in copy_git_repos().</li>
<li>It appears that '--dst-git-repo' does not have a default value (such as choosing the first writable git repo in the destination list) so it is actually required on the command line, but this is not enforced.</li>
<li>With some tinkering, I was able to copy a pipeline instance successfully. However, while arv-copy correctly updated the 'repository' portion of the component, it did not update the 'script_version' to point to the appropriate branch.</li>
</ul>
Next, I tried to copy another pipeline with associated collections:<br /><pre>
$ arv-copy --dst-git-repo peter <a href="https://arvadosapi.com/4n8aq-d1hrv-0pxhk17pu24ac5j">4n8aq-d1hrv-0pxhk17pu24ac5j</a> 4n8aq 4xphq
2014-09-16 10:03:18 arvados.arv-copy[28644] DEBUG: copying block 752192751a8a72ae6ac8b0fdb58d37df+5000+Ab060805321f7358d04a045e537c1a8631d16e700@542ab826
DEBUG:arvados.arv-copy:copying block 752192751a8a72ae6ac8b0fdb58d37df+5000+Ab060805321f7358d04a045e537c1a8631d16e700@542ab826
2014-09-16 10:03:18 arvados.arv-copy[28644] INFO: Retrieved 5000 bytes
INFO:arvados.arv-copy:Retrieved 5000 bytes
2014-09-16 10:03:19 arvados.arv-copy[28644] DEBUG: saving 1235f41348b10eaff7d622dba7bd4a9f+83 manifest: . 752192751a8a72ae6ac8b0fdb58d37df+5000+Ab060805321f7358d04a045e537c1a8631d16e700@542ab826 0:5000:<a href="https://arvadosapi.com/4n8aq-8i9sb-c1utvb26t29nbq4">4n8aq-8i9sb-c1utvb26t29nbq4</a>.log.txt
DEBUG:arvados.arv-copy:saving 1235f41348b10eaff7d622dba7bd4a9f+83 manifest: . 752192751a8a72ae6ac8b0fdb58d37df+5000+Ab060805321f7358d04a045e537c1a8631d16e700@542ab826 0:5000:<a href="https://arvadosapi.com/4n8aq-8i9sb-c1utvb26t29nbq4">4n8aq-8i9sb-c1utvb26t29nbq4</a>.log.txt
Traceback (most recent call last):
File "/home/peter/work/arvados/sdk/cli/bin/arv-copy", line 4, in <module>
main()
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 78, in main
recursive=args.recursive)
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 151, in copy_pipeline_instance
recursive=True)
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 198, in copy_pipeline_template
pt = copy_collections(pt, src, dst)
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 224, in copy_collections
return {v: copy_collections(obj[v], src, dst) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 224, in <dictcomp>
return {v: copy_collections(obj[v], src, dst) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 224, in copy_collections
return {v: copy_collections(obj[v], src, dst) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 224, in <dictcomp>
return {v: copy_collections(obj[v], src, dst) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 224, in copy_collections
return {v: copy_collections(obj[v], src, dst) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 224, in <dictcomp>
return {v: copy_collections(obj[v], src, dst) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 224, in copy_collections
return {v: copy_collections(obj[v], src, dst) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 224, in <dictcomp>
return {v: copy_collections(obj[v], src, dst) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 219, in copy_collections
newc = copy_collection(obj, src, dst)
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 314, in copy_collection
del c['owner_uuid']
KeyError: 'owner_uuid'
</pre>
<ul>
<li>On investigation, this appears to be due to copying collections by content hash (which does not return 'owner_uuid') instead of by uuid.</li>
</ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14912 2014-09-16T11:08:27Z Tim Pierce twp@curoverse.com
<ul></ul><p>Peter Amstutz wrote:</p>
<blockquote>
<p>1. Not addressed</p>
</blockquote>
<p><a class="changeset" title="3699: bug fixes and feedback * added 'arv copy' front end to sdk/cli/bin/arv * can supply --recu..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/34aac296f4a0d2df0e369a9169924ef7849d6e85">34aac296</a> doesn't address this? I'm confused.</p>
<blockquote>
<p>2. Not addressed</p>
</blockquote>
<p>Adding a better error text for a missing arvados conf file, stand by.</p>
<blockquote>
<p>3. Not addressed</p>
</blockquote>
<p>I'm sympathetic to the confusion, but open to ideas about how we could effectively use the cut-and-pasted text. The core problem here is that arv-copy is the first tool that needs to know how to authenticate to multiple Arvados instances simultaneously, so the environment variables simply aren't going to be useful.</p>
<p>If the user cut-and-pastes the authentication environment variables from a source Arvados, and then cut-and-pastes the auth variables from the destination, which takes precedence?</p>
<blockquote>
<p>4. Not addressed</p>
</blockquote>
<p>Open to ideas here as well. How about permitting both <code>--src=qr1hi --dst=4xphq</code> and <code>--src=$HOME/my-qr1hi.conf --dst=$HOME/4xphq.txt</code>? i.e. if the src or dst arvados names start with a slash, treat that as an absolute path.</p>
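<p>A minimal sketch of that rule, assuming the per-instance files live in <code>~/.config/arvados</code> by default. The function name <code>config_path_for</code> and the <code>config_dir</code> override are hypothetical:</p>
<pre>
import os


def config_path_for(name, config_dir=None):
    """Map a --src/--dst value to a config file path (hypothetical helper)."""
    if name.startswith("/"):
        # A leading slash means the caller gave an absolute path; use it as-is.
        return name
    if config_dir is None:
        config_dir = os.path.join(os.path.expanduser("~"), ".config", "arvados")
    # Otherwise treat the value as an instance name such as "qr1hi".
    return os.path.join(config_dir, name + ".conf")
</pre>
<p>An explicit <code>config_dir</code> would also cover automated jobs that have no home directory.</p>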
<blockquote>
I'm copying from my local development instance, so the configuration is probably broken, but this error message is totally unhelpful in telling me what went wrong:<br />[...]
<ul>
<li>On further investigation, there is a copy-and-paste error: it says "source" in both Exceptions in copy_git_repos().</li>
<li>It appears that '--dst-git-repo' does not have a default value (such as choosing the first writable git repo in the destination list) so it is actually required on the command line, but this is not enforced.</li>
<li>With some tinkering, I was able to copy a pipeline instance successfully. However, while arv-copy correctly updated the 'repository' portion of the component, it did not update the 'script_version' to point to the appropriate branch.</li>
</ul>
</blockquote>
<p>Correct: --dst-git-repo is required. I'll have the Python argparse enforce it on the command line.</p>
<p>I investigated ways to identify a default git repo for the destination but didn't find any I liked. I'd prefer "choose a git repo randomly from ones you own" but repository ownership doesn't seem well defined. I'm uneasy about "choose a writable git repo randomly" if it means that you end up stuffing a lot of copied repository data into someone else's repository that you've been given write access to.</p>
<blockquote>
Next, I tried to copy another pipeline with associated collections:<br />[...]
<ul>
<li>On investigation, this appears to be due to copying collections by content hash (which does not return 'owner_uuid') instead of by uuid.</li>
</ul>
</blockquote>
<p>Ah, I slipped in the <code>del c['owner_uuid']</code> before submitting but didn't test it on that particular case -- I didn't realize that collections could be returned without an owner_uuid. Fixing.</p> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14914 2014-09-16T11:25:47Z Tim Pierce twp@curoverse.com
<ul></ul><p>Peter Amstutz wrote:</p>
<blockquote>
<ul>
<li>With some tinkering, I was able to copy a pipeline instance successfully. However, while arv-copy correctly updated the 'repository' portion of the component, it did not update the 'script_version' to point to the appropriate branch.</li>
</ul>
</blockquote>
<p>So the script_version needs to be updated even though the specified commit hash exists in both repositories under the same name?</p>
<p>(On reflection, that is obviously going to be true when script_version references a branch on the source. Ugh. But is it appropriate to rename when script_version references a hash?)</p> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14916 2014-09-16T11:36:18Z Tim Pierce twp@curoverse.com
<ul></ul><p>New rev at <a class="changeset" title="3699: code review * Issue helpful error message when config file cannot be opened * Require --ds..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/04a4fa5843c3511260a750065cf79203ae1663ee">04a4fa5</a></p>
<ul>
<li>Issue helpful error message when config file cannot be opened</li>
<li>Require <code>--dst-git-repo</code> argument</li>
<li>Allow collections without owner_uuid (i.e. when retrieved by data hash rather than uuid)</li>
<li>Corrected "source"/"destination" error message in copy_git_repo</li>
</ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=14995 2014-09-17T09:53:01Z Tim Pierce twp@curoverse.com
<ul></ul><p>New revision at <a class="changeset" title="3699: bugfix (renamed repository_map -> local_repo_dir)" href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/9cff4a0bf758ebb2e1a63df6a25c83f11752f8d8">9cff4a0</a> renames symbolic names found in <code>script_version</code> and <code>supplied_script_version</code> fields to the commit hashes they resolve to.</p> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=15005 2014-09-17T11:00:25Z Peter Amstutz peter.amstutz@curii.com
<ul></ul><ul>
<li>Default logging level is still "DEBUG", that needs to be changed</li>
<li>It should list which collections are being copied and "bytes uploaded" progress.</li>
<li>Collections are copied without names which makes it impossible to figure out which is which</li>
<li>Should be --project-uuid not --project_uuid</li>
<li>I tried copying <br /><a class="external" href="https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-f9wf1btyvsevep8">https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-f9wf1btyvsevep8</a> <br />and got <br /><a class="external" href="https://workbench.9tee4.arvadosapi.com/pipeline_instances/9tee4-d1hrv-a0ecr7hb7sreu74">https://workbench.9tee4.arvadosapi.com/pipeline_instances/9tee4-d1hrv-a0ecr7hb7sreu74</a><br />then I did a clone and run, the result doesn't work:<br /><a class="external" href="https://workbench.9tee4.arvadosapi.com/pipeline_instances/9tee4-d1hrv-72jjxiig9twrsio">https://workbench.9tee4.arvadosapi.com/pipeline_instances/9tee4-d1hrv-72jjxiig9twrsio</a><br />I suspect not all the collection data transferred over.</li>
</ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=15052 2014-09-17T15:09:52Z Peter Amstutz peter.amstutz@curii.com
<ul><li><strong>Target version</strong> changed from <i>2014-09-17 sprint</i> to <i>2014-10-08 sprint</i></li></ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=16061 2014-10-07T15:37:08Z Tim Pierce twp@curoverse.com
<ul></ul><p>Peter Amstutz wrote:</p>
<blockquote>
<ul>
<li>Default logging level is still "DEBUG", that needs to be changed</li>
</ul>
</blockquote>
<p>Debug logging is now off by default. It can be enabled with a <code>-v/--verbose</code> flag.</p>
<blockquote>
<ul>
<li>It should list which collections are being copied and "bytes uploaded" progress.</li>
</ul>
</blockquote>
<p>I'll work on enabling a pretty progress report (like with arv-put) for non-verbose mode. For now, you get ugly line-by-line status reports with --verbose.</p>
<blockquote>
<ul>
<li>Collections are copied without names which makes it impossible to figure out which is which</li>
</ul>
</blockquote>
<p>It's not clear to me why this was happening (or whether it is still a problem). We are not removing the name field from the collection record when copying it to the destination. I'll try to reproduce.</p>
<blockquote>
<ul>
<li>Should be --project-uuid not --project_uuid</li>
</ul>
</blockquote>
<p>Fixed.</p>
<blockquote>
<ul>
<li>I tried copying <br /><a class="external" href="https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-f9wf1btyvsevep8">https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-f9wf1btyvsevep8</a> <br />and got <br /><a class="external" href="https://workbench.9tee4.arvadosapi.com/pipeline_instances/9tee4-d1hrv-a0ecr7hb7sreu74">https://workbench.9tee4.arvadosapi.com/pipeline_instances/9tee4-d1hrv-a0ecr7hb7sreu74</a><br />then I did a clone and run, the result doesn't work:<br /><a class="external" href="https://workbench.9tee4.arvadosapi.com/pipeline_instances/9tee4-d1hrv-72jjxiig9twrsio">https://workbench.9tee4.arvadosapi.com/pipeline_instances/9tee4-d1hrv-72jjxiig9twrsio</a><br />I suspect not all the collection data transferred over.</li>
</ul>
</blockquote>
<p>Fixed a bug in collection copying (it was copying only the first block on each manifest line, oops). Added a <code>--force</code> flag, to copy collection blocks even if the collection record already exists at the destination (to fix broken collections like these).</p>
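<p>For illustration, a minimal sketch of walking every block locator on a manifest line instead of stopping at the first one. The helper name <code>block_locators</code> and the locator regex are assumptions made for this example, not the code in <code>copy.py</code>, and the sample manifest is made up.</p>

```python
import re

# Assumption: a block locator is a 32-hex-digit MD5 followed by "+"-separated hints.
LOCATOR = re.compile(r'^[0-9a-f]{32}(\+\S+)*$')

def block_locators(manifest_text):
    """Yield every block locator in a manifest, not just the first per line."""
    for line in manifest_text.splitlines():
        tokens = line.split()
        # tokens[0] is the stream name; locators follow until the first file token.
        for token in tokens[1:]:
            if not LOCATOR.match(token):
                break
            yield token

# Made-up two-block manifest line for demonstration.
manifest = (
    ". 12f45b121fde0cc5a80656050c2a5acc+29926740 "
    "c313fb45a7d5f7580ba1eebb1e071b46+25104384 "
    "0:55031124:reads.sam\n"
)
print(list(block_locators(manifest)))
```

<p>Each yielded locator would then be fetched from the source and written to the destination, so multi-block streams arrive complete.</p>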
<p>Trivial pipelines can be copied and run. I'll try reproducing the broken pipelines you hit and see if they work now.</p> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to anotherhttps://dev.arvados.org/issues/3699?journal_id=161252014-10-07T21:38:11ZTim Piercetwp@curoverse.com
<ul></ul><p>Peter Amstutz wrote:</p>
<blockquote>
<ul>
<li>Collections are copied without names which makes it impossible to figure out which is which</li>
</ul>
</blockquote>
<p>I haven't been able to reproduce this problem, and copying collections now does preserve the name (e.g. <a class="external" href="https://workbench.4xphq.arvadosapi.com/collections/4xphq-4zz18-c5yckkig86ln3zc">https://workbench.4xphq.arvadosapi.com/collections/4xphq-4zz18-c5yckkig86ln3zc</a> which was just copied from qr1hi).</p> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to anotherhttps://dev.arvados.org/issues/3699?journal_id=162052014-10-08T18:10:15ZTim Piercetwp@curoverse.com
<ul><li><strong>Target version</strong> changed from <i>2014-10-08 sprint</i> to <i>2014-10-29 sprint</i></li></ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to anotherhttps://dev.arvados.org/issues/3699?journal_id=162322014-10-08T19:14:21ZTim Piercetwp@curoverse.com
<ul><li><strong>Story points</strong> changed from <i>3.0</i> to <i>1.0</i></li></ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to anotherhttps://dev.arvados.org/issues/3699?journal_id=165472014-10-17T15:16:50ZTim Piercetwp@curoverse.com
<ul></ul><p>Ready for re-review at <a class="changeset" title="3699: report on collection copying progress" href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/41c5cc5a3731417a31c8db685e78cb795bbbe91b">41c5cc5</a>: collection copying now gets you a nice report by default (can disable with <code>--no-progress</code>).</p>
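<p>A rough sketch of how one line of such a report could be formatted. The function name <code>progress_line</code> is hypothetical, and the layout is only modeled on the <code>0M / 28M 0.0%</code> lines that appear in transcripts in this thread; the real formatter may differ.</p>

```python
def progress_line(label, bytes_written, bytes_expected):
    """Format a one-line progress report, e.g. 'name: 0M / 28M 0.0%'."""
    # Guard against empty collections to avoid dividing by zero.
    fraction = float(bytes_written) / bytes_expected if bytes_expected else 0.0
    return "{}: {}M / {}M {:.1%}".format(
        label, bytes_written >> 20, bytes_expected >> 20, fraction)

print(progress_line("qr1hi-4zz18-0gobfjfihm0bi1p", 0, 29926740))
print(progress_line("qr1hi-4zz18-0gobfjfihm0bi1p", 29926740, 29926740))
```

<p>Calling it once more after the last block has been written is what lets the final line show the completed total.</p>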
<p>Since this has had master merged back into it, there's a huge amount of diff churn -- I recommend <code>git difftool 9cff4a0..HEAD sdk/python/arvados/commands/copy.py</code> to view changes to <code>copy.py</code> just since your last review (which is where almost all of the changes have gone since then).</p> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to anotherhttps://dev.arvados.org/issues/3699?journal_id=166042014-10-17T19:20:38ZTim Piercetwp@curoverse.com
<ul></ul><p>Try again at revision <a class="changeset" title="3699: allow script_version to be a branch Fix copy_git_repo to behave properly when the source s..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/b81c434401a503746ec54e53bf7058cf42beaa2f">b81c434</a>: <code>copy_git_repo</code> should now correctly handle different branches in the source repository, and resolve the script_versions appropriately in the copied pipeline. Successfully copied <a href="https://arvadosapi.com/qr1hi-d1hrv-fr2cgdn3q50f4ae">qr1hi-d1hrv-fr2cgdn3q50f4ae</a> to 4xphq.</p> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to anotherhttps://dev.arvados.org/issues/3699?journal_id=166262014-10-17T20:53:21ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><ul>
<li>Help text says "Copy a pipeline instance from one Arvados instance to another", but should mention you can copy templates, collections</li>
<li>Needs to be smarter about finding content hashes in script_parameters. For example, <a href="https://arvadosapi.com/4xphq-d1hrv-t4soj38x1s5bbfs">4xphq-d1hrv-t4soj38x1s5bbfs</a> has a content hash embedded in a string: "$(file 3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq)" </li>
<li>Use logger consistently, e.g. use <code>logger.info()</code> instead of <code>print >>sys.stderr</code></li>
<li>Prefer <code>isinstance(obj, basestring)</code> instead of <code>type(obj) in [str, unicode]</code></li>
<li><code>ensure_unique_name</code> is a parameter of the create method, not an object field. I think you want to be using <code>dst.pipeline_instances().create(body=pi, ensure_unique_name=True).execute()</code> (you should test this)</li>
<li>Failed: <code>arv-copy --verbose --src qr1hi --dst 4xphq --dst-git-repo peter <a href="https://arvadosapi.com/qr1hi-d1hrv-44qifjjtuoh2xcw">qr1hi-d1hrv-44qifjjtuoh2xcw</a></code><br /><pre>
2014-10-17 16:51:49 arvados.arv-copy[17845] DEBUG: src_git_url: git@git.qr1hi.arvadosapi.com:arvados.git
2014-10-17 16:51:49 arvados.arv-copy[17845] DEBUG: dst_git_push_url: git@git.4xphq.arvadosapi.com:peter.git
Cloning into bare repository '/tmp/tmpQfF59g'...
remote: Counting objects: 47209, done.
remote: Compressing objects: 100% (17572/17572), done.
remote: Total 47209 (delta 33108), reused 38969 (delta 26834)
Receiving objects: 100% (47209/47209), 7.33 MiB | 3.01 MiB/s, done.
Resolving deltas: 100% (33108/33108), done.
Checking connectivity... done.
Everything up-to-date
2014-10-17 16:51:55 arvados.arv-copy[17845] DEBUG: Copying collection <a href="https://arvadosapi.com/qr1hi-4zz18-0gobfjfihm0bi1p">qr1hi-4zz18-0gobfjfihm0bi1p</a>
2014-10-17 16:51:55 arvados.arv-copy[17845] DEBUG: Copying block 12f45b121fde0cc5a80656050c2a5acc (29926740 bytes)
<a href="https://arvadosapi.com/qr1hi-4zz18-0gobfjfihm0bi1p">qr1hi-4zz18-0gobfjfihm0bi1p</a>: 0M / 28M 0.0%
2014-10-17 16:52:09 arvados.arv-copy[17845] DEBUG: saving <a href="https://arvadosapi.com/qr1hi-4zz18-0gobfjfihm0bi1p">qr1hi-4zz18-0gobfjfihm0bi1p</a> manifest: . 12f45b121fde0cc5a80656050c2a5acc+29926740+Ab2a278ea3fdeaae02437bf153d3916b22c956b3c@5453f679 0:29926740:HWI-ST1027_129_D0THKACXX.1_1.sam
2014-10-17 16:52:09 arvados.arv-copy[17845] DEBUG: Copying collection <a href="https://arvadosapi.com/qr1hi-4zz18-ah4fm98e4osc5ua">qr1hi-4zz18-ah4fm98e4osc5ua</a>
2014-10-17 16:52:09 arvados.arv-copy[17845] DEBUG: Copying block c313fb45a7d5f7580ba1eebb1e071b46 (25104384 bytes)
<a href="https://arvadosapi.com/qr1hi-4zz18-ah4fm98e4osc5ua">qr1hi-4zz18-ah4fm98e4osc5ua</a>: 0M / 23M 0.0%
2014-10-17 16:52:22 arvados.arv-copy[17845] DEBUG: saving <a href="https://arvadosapi.com/qr1hi-4zz18-ah4fm98e4osc5ua">qr1hi-4zz18-ah4fm98e4osc5ua</a> manifest: . c313fb45a7d5f7580ba1eebb1e071b46+25104384+A53a28ae3ebcc10f0ca106e38d785ce61623d4d26@5453f686 0:12552192:HWI-ST1027_129_D0THKACXX.1_1.fastq 12552192:12552192:HWI-ST1027_129_D0THKACXX.1_2.fastq
Traceback (most recent call last):
File "/home/peter/work/arvados/sdk/cli/bin/arv-copy", line 4, in <module>
main()
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 103, in main
args)
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 196, in copy_pipeline_instance
pi = copy_collections(pi, src, dst, args)
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 272, in copy_collections
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 272, in <dictcomp>
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 272, in copy_collections
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 272, in <dictcomp>
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 272, in copy_collections
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 272, in <dictcomp>
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 272, in copy_collections
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 272, in <dictcomp>
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 267, in copy_collections
newc = copy_collection(obj, src, dst, args)
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 438, in copy_collection
return dst.collections().create(body=c).execute()
File "/usr/local/lib/python2.7/dist-packages/oauth2client/util.py", line 132, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/apiclient/http.py", line 723, in execute
raise HttpError(resp, content, uri=self.uri)
arvados.errors.ApiError: <HttpError 422 when requesting https://4xphq.arvadosapi.com/arvados/v1/collections?alt=json returned "Portable data hash does not match hash of manifest_text">
</pre></li>
</ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to anotherhttps://dev.arvados.org/issues/3699?journal_id=166582014-10-20T18:23:10ZTim Piercetwp@curoverse.com
<ul></ul><p>Nice finds. New version at <a class="changeset" title="3699: collection copying bug fixes From code review #3699-35: * Updated help text * Find collect..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/cdaf5c71016d2cad54d54e9b4b87bafe4554d376">cdaf5c7</a>.</p>
<p>Peter Amstutz wrote:</p>
<blockquote>
<ul>
<li>Help text says "Copy a pipeline instance from one Arvados instance to another", but should mention you can copy templates, collections</li>
</ul>
</blockquote>
<p>Updated.</p>
<blockquote>
<ul>
<li>Needs to be smarter about finding content hashes in script_parameters. For example, <a href="https://arvadosapi.com/4xphq-d1hrv-t4soj38x1s5bbfs">4xphq-d1hrv-t4soj38x1s5bbfs</a> has a content hash embedded in a string: "$(file 3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq)"</li>
</ul>
</blockquote>
<p>Ugh, good catch. Fixed, but it required substantially rewriting <code>copy_collections</code> stuff so that the command line arguments can be intelligently rewritten. (As a bonus, <code>copy_collections</code> now also keeps track of which collections have been copied in this session, so it can avoid repeatedly asking <em>dst</em> whether this collection exists.)</p>
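<p>Roughly, the idea can be sketched as follows (Python 3 here, although the thread's code is Python 2.7). The names <code>rewrite_hashes</code>, <code>copy_one</code> and <code>seen</code> are hypothetical, and the regex for a content hash is an assumption; this is not the actual <code>copy_collections</code> implementation.</p>

```python
import re

# Assumption: a content hash looks like <32 hex digits>+<size>.
PDH = re.compile(r'\b[0-9a-f]{32}\+\d+')

def rewrite_hashes(obj, copy_one, seen=None):
    """Recursively copy every collection referenced anywhere inside obj."""
    if seen is None:
        seen = {}

    def repl(match):
        src = match.group(0)  # re.sub hands repl a match object, not a string
        if src not in seen:
            seen[src] = copy_one(src)  # hypothetical: returns the destination id
        return seen[src]

    if isinstance(obj, str):
        return PDH.sub(repl, obj)
    if isinstance(obj, dict):
        return {k: rewrite_hashes(v, copy_one, seen) for k, v in obj.items()}
    if isinstance(obj, list):
        return [rewrite_hashes(v, copy_one, seen) for v in obj]
    return obj

calls = []

def fake_copy(src):
    calls.append(src)
    return src  # a content hash stays the same at the destination

param = "$(file 3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq)"
params = {"input": {"value": param}, "again": [param]}
print(rewrite_hashes(params, fake_copy) == params, len(calls))
```

<p>The <code>seen</code> dictionary is what avoids asking the destination about the same collection twice within one session.</p>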
<blockquote>
<ul>
<li>Use logger consistently, e.g. use <code>logger.info()</code> instead of <code>print >>sys.stderr</code></li>
</ul>
</blockquote>
<p>Done.</p>
<blockquote>
<ul>
<li>Prefer <code>isinstance(obj, basestring)</code> instead of <code>type(obj) in [str, unicode]</code></li>
</ul>
</blockquote>
<p>Thanks for that. Done.</p>
<blockquote>
<ul>
<li><code>ensure_unique_name</code> is a parameter of the create method, not an object field. I think you want to be using <code>dst.pipeline_instances().create(body=pi, ensure_unique_name=True).execute()</code> (you should test this)</li>
</ul>
</blockquote>
<p>Tested and confirmed -- done.</p>
<blockquote>
<ul>
<li>Failed: <code>arv-copy --verbose --src qr1hi --dst 4xphq --dst-git-repo peter <a href="https://arvadosapi.com/qr1hi-d1hrv-44qifjjtuoh2xcw">qr1hi-d1hrv-44qifjjtuoh2xcw</a></code><br />[...]</li>
</ul>
</blockquote>
<p>The bug here was that the source manifest had no trailing newline, but arv-copy was mistakenly adding one in the destination manifest. Fixed.</p> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to anotherhttps://dev.arvados.org/issues/3699?journal_id=166592014-10-20T18:34:42ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><pre>
$ arv-copy --verbose --src qr1hi --dst 4xphq --dst-git-repo peter <a href="https://arvadosapi.com/qr1hi-d1hrv-44qifjjtuoh2xcw">qr1hi-d1hrv-44qifjjtuoh2xcw</a>
2014-10-20 14:33:57 arvados.arv-copy[3159] DEBUG: src_git_url: git@git.qr1hi.arvadosapi.com:arvados.git
2014-10-20 14:33:57 arvados.arv-copy[3159] DEBUG: dst_git_push_url: git@git.4xphq.arvadosapi.com:peter.git
Cloning into bare repository '/tmp/tmpPmFdsM'...
remote: Counting objects: 47209, done.
remote: Compressing objects: 100% (17573/17573), done.
remote: Total 47209 (delta 33108), reused 38973 (delta 26833)
Receiving objects: 100% (47209/47209), 7.33 MiB | 4.14 MiB/s, done.
Resolving deltas: 100% (33108/33108), done.
Checking connectivity... done.
Everything up-to-date
Traceback (most recent call last):
File "/home/peter/work/arvados/sdk/cli/bin/arv-copy", line 4, in <module>
main()
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 107, in main
args)
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 200, in copy_pipeline_instance
pi = copy_collections(pi, src, dst, args)
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 294, in copy_collections
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 294, in <dictcomp>
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 294, in copy_collections
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 294, in <dictcomp>
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 294, in copy_collections
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 294, in <dictcomp>
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 291, in copy_collections
obj = arvados.util.collection_uuid_pattern.sub(copy_collection_fn, obj)
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 280, in copy_collection_fn
dst_col = copy_collection(src_id, src, dst, args)
File "/home/peter/work/arvados/sdk/python/arvados/commands/copy.py", line 387, in copy_collection
c = src.collections().get(uuid=obj_uuid).execute()
File "/usr/local/lib/python2.7/dist-packages/oauth2client/util.py", line 132, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/apiclient/http.py", line 723, in execute
raise HttpError(resp, content, uri=self.uri)
arvados.errors.ApiError: <HttpError 404 when requesting https://qr1hi.arvadosapi.com/arvados/v1/collections/%3C_sre.SRE_Match%20object%20at%200x7ffed8ae5920%3E?alt=json returned "Path not found">
</pre> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to anotherhttps://dev.arvados.org/issues/3699?journal_id=166632014-10-20T19:20:22ZTim Piercetwp@curoverse.com
<ul></ul><p>Revision at <a class="changeset" title="3699: bug fix The re.sub 'repl' function takes a MatchObject as argument, not a string. Oops. A..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/8bd432b7fe67766d6f92902e20b5e63c9f18146d">8bd432b</a>:</p>
<p><em>The re.sub 'repl' function takes a MatchObject as argument, not a string. Oops.</em></p>
<p><em>Also we need to do manifest.splitlines(True) in order to be able to tell whether the manifest ends with a newline in the first place.</em></p>
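<p>To illustrate the second point: <code>splitlines(True)</code> keeps each line's ending, so an unterminated last line can be told apart from a terminated one. The helper <code>rebuild_manifest</code> and its <code>rewrite_line</code> callback are hypothetical names for this sketch only.</p>

```python
def rebuild_manifest(manifest_text, rewrite_line):
    """Rewrite each manifest line while preserving the original line endings."""
    out = []
    for line in manifest_text.splitlines(True):  # True means keep line endings
        body = line.rstrip("\n")
        ending = line[len(body):]  # "\n", or "" for an unterminated last line
        out.append(rewrite_line(body) + ending)
    return "".join(out)

same = lambda text: text
print(repr(rebuild_manifest(". abc+3 0:3:f", same)))
print(repr(rebuild_manifest(". abc+3 0:3:f\n", same)))
```

<p>With plain <code>splitlines()</code> both inputs produce the same list of lines, which is how an extra newline could creep into the destination manifest.</p>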
<p>Used the code at this head to clone several arv-run pipelines successfully (the most complex pipelines I could find that could be copied in a reasonable amount of time for testing)</p> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to anotherhttps://dev.arvados.org/issues/3699?journal_id=166682014-10-20T19:57:25ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><ul>
<li>I used arv-copy and got a pipeline: <a href="https://arvadosapi.com/4xphq-d1hrv-c25qsfov6u70jt1">4xphq-d1hrv-c25qsfov6u70jt1</a> It has been given an unhelpful generic name (granted the source pipeline name is null) and a spurious "None" in its description:<br /><pre>
New pipeline instance
Pipeline copied from <a href="https://arvadosapi.com/qr1hi-d1hrv-44qifjjtuoh2xcw">qr1hi-d1hrv-44qifjjtuoh2xcw</a>
None
</pre></li>
</ul>
<ul>
<li>I don't know if there's a good way to efficiently search pipeline templates on the destination, but after copying the same stuff over and over again I'm up to at least 6 duplicates of the same template: "Tutorial align using bwa mem copied from <a href="https://arvadosapi.com/qr1hi-p5p6p-itzkwxblfermlwv">qr1hi-p5p6p-itzkwxblfermlwv</a> (5)"</li>
</ul>
<ul>
<li>I crashed it again. At this point, the configuration file for "peter" doesn't exist yet.<br /><pre>
$ arv-copy --src qr1hi --dst peter --dst-git-repo peter <a href="https://arvadosapi.com/qr1hi-d1hrv-44qifjjtuoh2xcw">qr1hi-d1hrv-44qifjjtuoh2xcw</a>
Traceback (most recent call last):
File "/usr/lib/python2.7/logging/__init__.py", line 859, in emit
msg = self.format(record)
File "/usr/lib/python2.7/logging/__init__.py", line 732, in format
return fmt.format(record)
File "/usr/lib/python2.7/logging/__init__.py", line 471, in format
record.message = record.getMessage()
File "/usr/lib/python2.7/logging/__init__.py", line 335, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Logged from file copy.py, line 559
</pre></li>
</ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to anotherhttps://dev.arvados.org/issues/3699?journal_id=166692014-10-20T20:08:18ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><ul>
<li>This is not terribly reassuring: <br /><pre>
<a href="https://arvadosapi.com/qr1hi-4zz18-0gobfjfihm0bi1p">qr1hi-4zz18-0gobfjfihm0bi1p</a>: 0M / 28M 0.0%
b9edd3ac5dd0717f6ca587c3b2ec9885+83: 0M / 0M 0.0%
</pre><br />(presumably the blocks are actually being sent, but there's no extra print statement for 100% when it's done with a collection)</li>
</ul> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to anotherhttps://dev.arvados.org/issues/3699?journal_id=166732014-10-20T20:39:16ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><pre>
$ arv-copy --src qr1hi --dst 4n8aq --dst-git-repo peter2 <a href="https://arvadosapi.com/qr1hi-d1hrv-44qifjjtuoh2xcw">qr1hi-d1hrv-44qifjjtuoh2xcw</a>
Cloning into bare repository '/tmp/tmpMIOc6_'...
remote: Counting objects: 47209, done.
remote: Compressing objects: 100% (17576/17576), done.
remote: Total 47209 (delta 33107), reused 38974 (delta 26830)
Receiving objects: 100% (47209/47209), 7.33 MiB | 4.58 MiB/s, done.
Resolving deltas: 100% (33107/33107), done.
Checking connectivity... done.
Total 0 (delta 0), reused 0 (delta 0)
To /home/peter/work/arvados_prod_repos/peter2
* [new branch] git_git_qr1hi_arvadosapi_com_arvados_git_3cc80b447efcaf416ea4d6857d6d40583e462ff8 -> git_git_qr1hi_arvadosapi_com_arvados_git_3cc80b447efcaf416ea4d6857d6d40583e462ff8
<a href="https://arvadosapi.com/qr1hi-4zz18-0gobfjfihm0bi1p">qr1hi-4zz18-0gobfjfihm0bi1p</a>: 0M / 28M 0.0%
b9edd3ac5dd0717f6ca587c3b2ec9885+83: 0M / 0M 0.0%
142e99c2dec346e621fd3eeb30a63387+1050: 1408M / 1442M 97.6%
2014-10-20 16:14:43 arvados.arv-copy[8356] INFO:
2014-10-20 16:14:43 arvados.arv-copy[8356] INFO: Success: created copy with uuid <a href="https://arvadosapi.com/4n8aq-d1hrv-1gswgofj4qon38c">4n8aq-d1hrv-1gswgofj4qon38c</a>
</pre>
<p>It should have copied "<a href="https://arvadosapi.com/qr1hi-4zz18-ah4fm98e4osc5ua">qr1hi-4zz18-ah4fm98e4osc5ua</a>" ("sample" in "script_parameters") but it was skipped for some reason. However, the fact that collections are showing up in script_parameters at all is a much bigger problem: <a class="issue tracker-1 status-3 priority-4 priority-default closed parent" title="Bug: [API] Disallow collection UUIDs in script_parameters, only allow portable data hash (Resolved)" href="https://dev.arvados.org/issues/4269">#4269</a></p> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to anotherhttps://dev.arvados.org/issues/3699?journal_id=166742014-10-20T20:40:17ZTim Piercetwp@curoverse.com
<ul></ul><p>At commit <a class="changeset" title="3699: bug fixes * abort() should not crash the program because it's calling logging.info wrong..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/81c3241d08ced11ea118c7f68db62032ad5bc469">81c3241</a>:</p>
<p>Peter Amstutz wrote:</p>
<blockquote>
<ul>
<li>I used arv-copy and got a pipeline: <a href="https://arvadosapi.com/4xphq-d1hrv-c25qsfov6u70jt1">4xphq-d1hrv-c25qsfov6u70jt1</a> It has been given an unhelpful generic name (granted the source pipeline name is null) and a spurious "None" in its description:<br />[...]</li>
</ul>
</blockquote>
<p>Fixed the "None" description (this just has to be a little more clever than <code>pi.get('description', '')</code>).</p>
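<p>A small illustration of why the default argument is not enough: <code>dict.get</code> only falls back to the default when the key is missing, not when it is present with a null value. The helper name <code>description_of</code> is made up for this example.</p>

```python
def description_of(pi):
    # `or` also covers the case where the key exists but holds None.
    return pi.get('description') or ''

pi = {'name': None, 'description': None}
print(repr(pi.get('description', '')))  # None
print(repr(description_of(pi)))         # ''
```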
<p>I don't know if there's a better solution than an unhelpful generic name, when copying a pipeline that already doesn't have a name. Happy to take suggestions there :-)</p>
<blockquote>
<ul>
<li>I don't know if there's a good way to efficiently search pipeline templates on the destination, but after copying the same stuff over and over again I'm up to at least 6 duplicates of the same template: "Tutorial align using bwa mem copied from <a href="https://arvadosapi.com/qr1hi-p5p6p-itzkwxblfermlwv">qr1hi-p5p6p-itzkwxblfermlwv</a> (5)"</li>
</ul>
</blockquote>
<p>This is based on explicit (albeit verbal) direction from Tom: if a user runs arv-copy twice on the same pipeline, arv-copy copies all of the source objects to the destination, period. Collections and git repositories are not duplicated because content-addressed storage essentially prevents it, but we are deliberately not attempting to be clever about reusing templates that have already been copied.</p>
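<p>As a sketch of why content addressing prevents duplicates: a collection's content hash depends only on its manifest text, so the same data gets the same identifier on any instance. The example assumes the hash is the MD5 of the manifest with permission signatures stripped, plus its length; the function name, the signature regex and the second signature value are assumptions made for illustration.</p>

```python
import hashlib
import re

# Assumption: permission signatures look like "+A<hex>@<hex>" and are not hashed.
SIGNATURE = re.compile(r'\+A[0-9a-f]+@[0-9a-f]+')

def portable_data_hash(manifest_text):
    stripped = SIGNATURE.sub('', manifest_text).encode('utf-8')
    return "{}+{}".format(hashlib.md5(stripped).hexdigest(), len(stripped))

src = (". 12f45b121fde0cc5a80656050c2a5acc+29926740"
       "+Ab2a278ea3fdeaae02437bf153d3916b22c956b3c@5453f679"
       " 0:29926740:HWI-ST1027_129_D0THKACXX.1_1.sam")
# Same data, different (made-up) signature, as another instance would issue.
dst = (". 12f45b121fde0cc5a80656050c2a5acc+29926740"
       "+A0123456789abcdef0123456789abcdef01234567@5453ffff"
       " 0:29926740:HWI-ST1027_129_D0THKACXX.1_1.sam")

print(portable_data_hash(src) == portable_data_hash(dst))
print(portable_data_hash(src) == portable_data_hash(src + "\n"))
```

<p>The second comparison also shows why a stray trailing newline is enough to produce a hash mismatch.</p>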
<blockquote>
<ul>
<li>I crashed it again. At this point, the configuration file for "peter" doesn't exist yet.<br />[...]</li>
</ul>
</blockquote>
<p>In that case, it will abort anyway, but this is definitely a bug in the abort code. Fixed:<br /><pre>
(arv3699)hitchcock:/home/twp/arvados/sdk/python% arv-copy --src qr1hi --dst zzzzzz --dst-git-repo twp <a href="https://arvadosapi.com/qr1hi-p5p6p-itzkwxblfermlwv">qr1hi-p5p6p-itzkwxblfermlwv</a>
2014-10-20 16:26:44 arvados.arv-copy[1887] INFO: arv-copy: Could not open config file /home/twp/.config/arvados/zzzzzz.conf: [Errno 2] No such file or directory: '/home/twp/.config/arvados/zzzzzz.conf'
You must make sure that your configuration tokens
for Arvados instance zzzzzz are in /home/twp/.config/arvados/zzzzzz.conf and that this
file is readable.
</pre></p>
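<p>For reference, a minimal sketch of an <code>abort</code> helper that avoids the <code>TypeError</code> in the earlier traceback. The exact signature and log level are assumptions; the point is only that the message must be matched by a <code>%s</code> placeholder when it is passed as a logging argument.</p>

```python
import logging
import sys

logger = logging.getLogger('arvados.arv-copy')

def abort(msg, code=1):
    # A call like logger.info("arv-copy:", msg) has no placeholder for msg,
    # which makes the logging module fail while formatting the record.
    logger.info("arv-copy: %s", msg)
    sys.exit(code)
```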
<blockquote>
<p>(presumably the blocks are actually being sent, but there's no extra print statement for 100% when it's done with a collection)</p>
</blockquote>
<p>Yeah, that does look alarming. Added a statement to finish out the progress report after the copy is done. (It still uses the actual bytes_written/bytes_expected numbers, so it actually will report less than 100% if those numbers don't match up for some reason.)</p> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to anotherhttps://dev.arvados.org/issues/3699?journal_id=167012014-10-21T16:09:10ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>File</strong> <a href="/attachments/401">arv-copy-perf.png</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/401/arv-copy-perf.png">arv-copy-perf.png</a> added</li></ul><p>I'm copying a pipeline with a 5990M collection. I noticed this code:</p>
<pre>
data = src_keep.get(word)
dst_locator = dst_keep.put(data)
</pre><br />See attached image, there's a very clear falloff between blocks -- doing this sequentially isn't optimal. Download and upload could proceed concurrently. Also, I suspect we could get better utilization if we downloaded 2 blocks at a time. But in the interests of getting arv-copy out the door we probably shouldn't do anything about it now. Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to anotherhttps://dev.arvados.org/issues/3699?journal_id=167032014-10-21T16:54:54ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><pre>
arv-copy --src qr1hi --dst 4n8aq --dst-git-repo peter2 <a href="https://arvadosapi.com/qr1hi-d1hrv-xbcpup0o8hexwwn">qr1hi-d1hrv-xbcpup0o8hexwwn</a>
</pre>
<p>Everything succeeded! Yay!</p>
<p>Went to re-run the pipeline locally:</p>
<pre>
Error creating job for component run_lobSTR: Docker image locator not found for bcosc/lobstr
</pre>
<p>Boo! It needs to either create docker_image_repo+tag links for the Docker image, or rewrite the <code>docker_image</code> field of <code>runtime_constraints</code> in the pipeline to use a Docker image hash or an Arvados collection hash.</p> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to anotherhttps://dev.arvados.org/issues/3699?journal_id=167552014-10-22T17:19:07ZTim Piercetwp@curoverse.com
<ul></ul><p>Peter Amstutz wrote:</p>
<blockquote>
<p>See attached image, there's a very clear falloff between blocks -- doing this sequentially isn't optimal. Download and upload could proceed concurrently. Also, I suspect we could get better utilization if we downloaded 2 blocks at a time. But in the interests of getting arv-copy out the door we probably shouldn't do anything about it now.</p>
</blockquote>
<p>Agreed on both counts. I'll file a ticket to investigate implementing concurrency for arv-copy. arvados.keep.KeepClient already runs requests on threads internally -- it might be as simple as allowing the caller to tell KeepClient to return immediately and not wait for the request to finish?</p>
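<p>One possible shape for this, sketched without touching KeepClient at all: a background thread downloads ahead into a small bounded queue while the main thread uploads. <code>get_block</code> and <code>put_block</code> are hypothetical stand-ins for <code>src_keep.get</code> and <code>dst_keep.put</code>, and <code>depth</code> is an illustrative parameter; this is an untested idea, not a proposal for the final design.</p>

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the download stream

def copy_blocks(locators, get_block, put_block, depth=2):
    """Overlap downloads and uploads; returns put_block results in order."""
    pending = queue.Queue(maxsize=depth)  # at most `depth` blocks held in memory
    errors = []

    def downloader():
        try:
            for locator in locators:
                pending.put(get_block(locator))
        except Exception as error:
            errors.append(error)
        finally:
            pending.put(_DONE)

    # Daemon thread so a failed upload cannot leave the process hanging.
    worker = threading.Thread(target=downloader, daemon=True)
    worker.start()
    results = []
    while True:
        data = pending.get()
        if data is _DONE:
            break
        results.append(put_block(data))
    worker.join()
    if errors:
        raise errors[0]
    return results

print(copy_blocks(["a", "b", "c"], lambda loc: loc.encode(), lambda d: d.decode().upper()))
```

<p>The bounded queue keeps memory use predictable, which matters when each block can be tens of megabytes.</p>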
<blockquote>
<p>Boo! It needs to either create docker_image_repo+tag links for the Docker image, or rewrite the <code>docker_image</code> field of <code>runtime_constraints</code> in the pipeline to use a Docker image hash or an Arvados collection hash.</p>
</blockquote>
<p>Good catch. I think we should do as little rewriting as possible -- the copied pipeline should be identical to the original whenever possible. I'll have it create the Docker tag links.</p> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to anotherhttps://dev.arvados.org/issues/3699?journal_id=167722014-10-22T20:36:29ZTim Piercetwp@curoverse.com
<ul></ul><p>Peter Amstutz wrote:</p>
<blockquote>
<p>Boo! It needs to either create docker_image_repo+tag links for the Docker image, or rewrite the <code>docker_image</code> field of <code>runtime_constraints</code> in the pipeline to use a Docker image hash or an Arvados collection hash.</p>
</blockquote>
<p>Now available: <a class="changeset" title="3699: copy docker image links copy_docker_images and copy_docker_image copy any docker image col..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/d3dbc2c0557801f0e269a035e257beb09fd53618">d3dbc2c</a> includes <code>copy_docker_images</code> to copy any Docker images named by image+tag in the source pipeline, and create <code>docker_image_repo+tag</code> and <code>docker_image_hash</code> links at the destination. I think this solution is necessary: even if we rewrite the pipeline instance, the template will be unusable without docker image links.</p>
<p>What I'm not sure about is how to choose which link is the correct one, since apparently <code>docker_image_repo+tag</code> links do not enforce uniqueness on name, to wit:<br /><pre>
>>> import arvados.commands.copy
>>> api = arvados.commands.copy.api_for_instance('qr1hi')
>>> api.links().list(filters=[
['link_class','=','docker_image_repo+tag'],
['name','=','arvados/jobs:latest']
]).execute()['items_available']
9
</pre></p>
<p>What should I be doing here?</p> Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to anotherhttps://dev.arvados.org/issues/3699?journal_id=168042014-10-23T17:43:41ZTim Piercetwp@curoverse.com
<ul></ul><p>Okay: at <a class="changeset" title="3699: figure out correct docker image to fetch Use arvados.commands.keepdocker.list_images_in_ar..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/e8451457c28477a5c3716e878d09d0be147186d6">e845145</a> this code uses <code>arvados.commands.keepdocker.list_images_in_arv</code> to identify which docker image to pull from the source. A pipeline copied with this code produces appropriate links for arv-keepdocker to find in the destination:</p>
<pre>
(arv3699)hitchcock:/home/twp/arvados/sdk/python% ARVADOS_API_HOST=4xphq.arvadosapi.com ARVADOS_API_TOKEN=****** arv-keepdocker | grep lobstr
bcosc/lobstr latest ea32030ce02e <a href="https://arvadosapi.com/4xphq-4zz18-x72yrfegn8iwmmw">4xphq-4zz18-x72yrfegn8iwmmw</a> Thu Oct 23 17:09:57 2014
</pre>
<p>Try again?</p>
Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=16855 2014-10-24T15:10:27Z Peter Amstutz peter.amstutz@curii.com
<p>When you're copying a pipeline instance that's been run already, the docker image will have been resolved into an explicit keep locator in the docker_image_locator field. Arv-copy needs to copy that one and not just the latest image with the same name.</p>
Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=16857 2014-10-24T15:41:05Z Tim Pierce twp@curoverse.com
<p>Peter Amstutz wrote:</p>
<blockquote>
<p>When you're copying a pipeline instance that's been run already, the docker image will have been resolved into an explicit keep locator in the docker_image_locator field. Arv-copy needs to copy that one and not just the latest image with the same name.</p>
</blockquote>
<p>That's a deliberate decision:</p>
<ul>
<li>The collection identified in the docker_image_locator field will be copied anyway by <code>copy_collections</code>.</li>
<li>If the docker image+tag still resolves to that image when the pipeline is copied, the appropriate links will be added on the destination to make sure that is still true.</li>
<li>If the docker image has been updated on the source, so that docker image+tag no longer resolves to the docker_image_locator named in the pipeline, arv-copy will copy both docker images -- the docker_image_locator and the new docker image+tag.</li>
</ul>
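<p>The three cases above can be sketched roughly as follows. This is an illustration only, not the arv-copy implementation: <code>images_to_copy</code>, <code>pinned_locator</code> and <code>resolved_locator</code> are hypothetical names, and it assumes the name links on the destination are pointed at whatever image+tag currently resolves to on the source.</p>

```python
def images_to_copy(pinned_locator, resolved_locator):
    """Decide which docker image collections to copy.

    Hypothetical helper for illustration; not part of arv-copy.

    pinned_locator:   the docker_image_locator recorded in the pipeline
                      (None if the pipeline has not been run yet).
    resolved_locator: the collection that image+tag resolves to on the
                      source right now (None if it does not resolve).
    """
    to_copy = []
    for locator in (pinned_locator, resolved_locator):
        # Copy each distinct image once; skip missing values.
        if locator is not None and locator not in to_copy:
            to_copy.append(locator)
    # Assumption: destination links for image+tag follow the current
    # resolution on the source, so a rerun behaves the same way there.
    return {"copy": to_copy, "link_target": resolved_locator}


if __name__ == "__main__":
    # Tag still resolves to the pinned image: one collection to copy.
    print(images_to_copy("aaa+1", "aaa+1"))
    # Image was updated on the source: both collections are copied.
    print(images_to_copy("aaa+1", "bbb+2"))
```

<p>When the two locators match, only one collection is copied; when they differ, both are copied, so the copied pipeline keeps its recorded image and a rerun still picks up the newer one.</p>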
<p>The objective is to make sure that the current state of the pipeline is copied as exactly as possible, including "what will happen if I rerun this pipeline with new options."</p>
<p>Let me know if you think this reasoning is unsound.</p>
Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=16878 2014-10-24T17:25:08Z Tim Pierce twp@curoverse.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li><li><strong>% Done</strong> changed from <i>86</i> to <i>100</i></li></ul><p>Applied in changeset arvados|commit:35ade8a042094a27e2ca5cfd5e9754aa3513410c.</p>
Arvados - Idea #3699: [SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another https://dev.arvados.org/issues/3699?journal_id=16883 2014-10-24T17:29:57Z Peter Amstutz peter.amstutz@curii.com
<p>I transferred an entire pipeline instance from qr1hi and it ran on my workstation with no changes. Merge this thing!</p>