Bug #4520

[SDKs] arv copy should respect --project-uuid when saving destination Collection

Added by Sarah Guthrie about 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
SDKs
Target version:
Start date:
11/13/2014
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
0.5

Description

arv copy --src settings --dst 9tee4 --project-uuid 9tee4-j7d0g-f75lcwo78ifvnl2 qr1hi-4zz18-85b713kdm9kle3r
(where settings pointed to qr1hi configuration file on lightning-dev3)
saved the new collection in the Home project instead of the project pointed to by the project-uuid.

The same behavior occurred when copying a 100MB collection and a pipeline template.

If --project-uuid only applies to pipeline instances, better -h documentation would be helpful


Subtasks

Task #5238: Review 4520-arv-copy-project-uuidResolvedPeter Amstutz

Task #5114: FixResolvedPeter Amstutz


Related issues

Has duplicate Arvados - Bug #5294: [SDKs] Make arv-copy useful for copying a Docker imageClosed02/23/2015

Associated revisions

Revision f0fe7273
Added by Peter Amstutz almost 6 years ago

Merge branch '4520-arv-copy-project-uuid' closes #4520

History

#1 Updated by Tim Pierce about 6 years ago

  • Category set to SDKs
  • Target version set to Bug Triage

#2 Updated by Tim Pierce about 6 years ago

  • Subject changed from arv copy did not pay attention to --project-uuid given to [SDKs] arv copy did not pay attention to --project-uuid given

#3 Updated by Brett Smith almost 6 years ago

  • Subject changed from [SDKs] arv copy did not pay attention to --project-uuid given to [SDKs] arv copy does not respect --project-uuid when copying Collection

This might be related to #3847.

#4 Updated by Tom Clegg almost 6 years ago

  • Subject changed from [SDKs] arv copy does not respect --project-uuid when copying Collection to [SDKs] arv copy should respect --project-uuid when saving destination Collection
  • Story points set to 0.5

#5 Updated by Tom Clegg almost 6 years ago

  • Target version changed from Bug Triage to Arvados Future Sprints

#6 Updated by Tom Clegg almost 6 years ago

  • Target version changed from Arvados Future Sprints to 2015-02-18 sprint

#7 Updated by Radhika Chippada almost 6 years ago

  • Assigned To set to Radhika Chippada

#8 Updated by Radhika Chippada almost 6 years ago

  • Assigned To deleted (Radhika Chippada)

#9 Updated by Peter Amstutz almost 6 years ago

  • Assigned To set to Peter Amstutz

#10 Updated by Peter Amstutz almost 6 years ago

  • Assigned To deleted (Peter Amstutz)

#11 Updated by Peter Amstutz almost 6 years ago

  • Assigned To set to Peter Amstutz

#12 Updated by Peter Amstutz almost 6 years ago

Notes:

It should now correctly respect --project-uuid for collections, templates, pipeline instances, and docker image metadata.

If a collection already exists on the target system, it will create a new collection in the target project using the existing manifest_text from the existing collection.

#13 Updated by Peter Amstutz almost 6 years ago

  • Status changed from New to In Progress

#14 Updated by Peter Amstutz almost 6 years ago

  • Target version changed from 2015-02-18 sprint to 2015-03-11 sprint

#15 Updated by Brett Smith almost 6 years ago

Reviewing 74784b3.

  • The code to take a source collection and turn it into a collection on dst (including necessary munging) seems long enough to be worth DRYing up at this point.
  • Why did the log level of the "Skipping message" collection upgrade from DEBUG to INFO? It seems like this could occasionally make the program noticeably more verbose, and I'm not sure how it helps the user to know this.

While I was reading this I noticed some other bugs in the code you're touching. If you're up for fixing them, I'd say go for it. If not, we should file separate bugs about them.

  • The collection create calls don't respect args.retries.
  • The docker_image_hash link is copied incorrectly. The name should be Docker's native "image ID," not the portable data hash of the corresponding collection.

Thanks.

#16 Updated by Peter Amstutz almost 6 years ago

Brett Smith wrote:

Reviewing 74784b3.

  • The code to take a source collection and turn it into a collection on dst (including necessary munging) seems long enough to be worth DRYing up at this point.

Done.

  • Why did the log level of the "Skipping message" collection upgrade from DEBUG to INFO? It seems like this could occasionally make the program noticeably more verbose, and I'm not sure how it helps the user to know this.

Took that out entirely and revamped the copying the logic to actually check the owner_uuid, name and portable_data_hash of existing collection at the destination before deciding whether to make a copy.

While I was reading this I noticed some other bugs in the code you're touching. If you're up for fixing them, I'd say go for it. If not, we should file separate bugs about them.

  • The collection create calls don't respect args.retries.

Fixed.

  • The docker_image_hash link is copied incorrectly. The name should be Docker's native "image ID," not the portable data hash of the corresponding collection.

Fixed. Also looks for (and copies) Docker metadata any time it is copying a collection and not only when copying pipelines.

Bonus assertion added to keep.py because it turns out arv-copy was passing unicode strings to KeepClient.put() (whoops!)

Now at a47dc2b

#17 Updated by Brett Smith almost 6 years ago

Reviewing a47dc2b

Peter Amstutz wrote:

  • The docker_image_hash link is copied incorrectly. The name should be Docker's native "image ID," not the portable data hash of the corresponding collection.

Fixed. Also looks for (and copies) Docker metadata any time it is copying a collection and not only when copying pipelines.

Thanks, that basically covers #5294.

Bonus assertion added to keep.py because it turns out arv-copy was passing unicode strings to KeepClient.put() (whoops!)

Please change this condition to isinstance(data, str) to accept subclasses, and add a test.

The new line args.project_uuid = dst_arv.users().current().execute()["uuid"] should respect args.retries.

Thanks.

#18 Updated by Peter Amstutz almost 6 years ago

Brett Smith wrote:

Please change this condition to isinstance(data, str) to accept subclasses, and add a test.

Fixed.

The new line args.project_uuid = dst_arv.users().current().execute()["uuid"] should respect args.retries.

Fixed.

Now at 7b0d393

#19 Updated by Brett Smith almost 6 years ago

Peter Amstutz wrote:

Brett Smith wrote:

Please change this condition to isinstance(data, str) to accept subclasses, and add a test.

Fixed.

The new line args.project_uuid = dst_arv.users().current().execute()["uuid"] should respect args.retries.

Fixed.

Both of these changes are good. Unfortunately, the change to make CollectionWriter.manifest_text() encode the return string isn't going to fly. The entire interface to CollectionWriter is already bytes-based, so users need to pass in encoded strings for filenames already. For anybody already passing in non-ASCII, this change will basically attempt a double-encode and break:

>>> import arvados
>>> cw = arvados.CollectionWriter()
>>> with cw.open(u'♥'.encode('utf-8')) as cf:
...     pass
...
>>> cw.manifest_text()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "arvados/collection.py", line 588, in manifest_text
    return manifest.encode("utf-8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 41: ordinal not in range(128)

If you're okay reverting that change, I think this is good to merge. If there's some nuance I'm missing, let's talk about it. Thanks.

#20 Updated by Peter Amstutz almost 6 years ago

  • Don't encode manifest_text by default to avoid double-encode faults.
  • Coerce unicode strings to ascii in put(). Will raise an error if it is not a valid conversion.
  • Use result.content (returns literal result bytes) instead result.text (returns unicode) when processing Keep responses.
  • Updated test.

#21 Updated by Brett Smith almost 6 years ago

Peter Amstutz wrote:

  • Updated test.

Please assert that the first put returns the right hash (so we're sure it encodes correctly), and ditch the noop foo_locator assignments here. Then go ahead and merge. Thanks.

#22 Updated by Peter Amstutz almost 6 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:f0fe7273c1851cb93e9edd58c0b60d3590b222ed.

Also available in: Atom PDF