Feature #5011

[SDKs] arv-put accepts --replication argument (saved with collection record)

Added by Tom Clegg over 5 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
SDKs
Target version:
Start date:
01/30/2015
Due date:
% Done:

100%

Estimated time:
(Total: 2.00 h)
Story points:
0.5

Subtasks

Task #5121: Add replication=X kwarg to CollectionWriter and friendsResolvedTom Clegg

Task #5120: Add --replication arg to arv-putResolvedTom Clegg

Task #5122: Update docsResolvedTom Clegg

Task #5123: Review 5011-arv-put-replicationResolvedTom Clegg

Task #5124: Use server-provided default replication level as default in arv-put and CollectionWriter classesResolvedTom Clegg

Task #5206: Review 5011-thread-safe-testResolvedTom Clegg


Related issues

Blocked by Arvados - Story #3410: [API] Rename collection "desired replication" attributes, add model constraintsResolved07/29/2014

Associated revisions

Revision 64c70939
Added by Tom Clegg over 5 years ago

Merge branch '5011-arv-put-replication' closes #5011

Revision e1999050
Added by Tom Clegg over 5 years ago

Merge branch '5011-thread-safe-test' refs #5011

Revision d87717b4
Added by Tom Clegg over 5 years ago

Merge branch '3410-replication-attrs' closes #3410 refs #5011

History

#1 Updated by Brett Smith over 5 years ago

  • Subject changed from [SDKs] arv-get accepts --replication argument (saved with collection record) to [SDKs] arv-put accepts --replication argument (saved with collection record)

#2 Updated by Ward Vandewege over 5 years ago

  • Target version changed from Arvados Future Sprints to 2015-02-18 sprint

#3 Updated by Tom Clegg over 5 years ago

  • Category set to SDKs
  • Assigned To set to Tom Clegg

#4 Updated by Brett Smith over 5 years ago

Reviewing 655b69e. Unfortunately, I got these test failures:

======================================================================
FAIL: test_ArvPutSignedManifest (tests.test_arv_put.ArvPutIntegrationTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/brett/repos/arvados/sdk/python/tests/test_arv_put.py", line 507, in test_ArvPutSignedManifest
    self.assertEqual(p.returncode, 0)
AssertionError: 1 != 0

======================================================================
FAIL: test_put_collection_with_default_redundancy (tests.test_arv_put.ArvPutIntegrationTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/brett/repos/arvados/sdk/python/tests/test_arv_put.py", line 539, in test_put_collection_with_default_redundancy
    collection = self.run_and_find_collection("", [])
  File "/home/brett/repos/arvados/sdk/python/tests/test_arv_put.py", line 528, in run_and_find_collection
    self.assertEqual(1, len(collection_list))
AssertionError: 1 != 0

======================================================================
FAIL: test_put_collection_with_high_redundancy (tests.test_arv_put.ArvPutIntegrationTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/brett/repos/arvados/sdk/python/tests/test_arv_put.py", line 535, in test_put_collection_with_high_redundancy
    collection = self.run_and_find_collection("", ['--replication', '4'])
  File "/home/brett/repos/arvados/sdk/python/tests/test_arv_put.py", line 528, in run_and_find_collection
    self.assertEqual(1, len(collection_list))
AssertionError: 1 != 0

======================================================================
FAIL: test_put_collection_with_name_and_no_project (tests.test_arv_put.ArvPutIntegrationTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/brett/repos/arvados/sdk/python/tests/test_arv_put.py", line 553, in test_put_collection_with_name_and_no_project
    ['--portable-data-hash', '--name', link_name])
  File "/home/brett/repos/arvados/sdk/python/tests/test_arv_put.py", line 528, in run_and_find_collection
    self.assertEqual(1, len(collection_list))
AssertionError: 1 != 0

======================================================================
FAIL: test_put_collection_with_named_project_link (tests.test_arv_put.ArvPutIntegrationTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/brett/repos/arvados/sdk/python/tests/test_arv_put.py", line 563, in test_put_collection_with_named_project_link
    '--project-uuid', self.PROJECT_UUID])
  File "/home/brett/repos/arvados/sdk/python/tests/test_arv_put.py", line 528, in run_and_find_collection
    self.assertEqual(1, len(collection_list))
AssertionError: 1 != 0

======================================================================
FAIL: test_put_collection_with_unnamed_project_link (tests.test_arv_put.ArvPutIntegrationTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/brett/repos/arvados/sdk/python/tests/test_arv_put.py", line 544, in test_put_collection_with_unnamed_project_link
    ['--portable-data-hash', '--project-uuid', self.PROJECT_UUID])
  File "/home/brett/repos/arvados/sdk/python/tests/test_arv_put.py", line 528, in run_and_find_collection
    self.assertEqual(1, len(collection_list))
AssertionError: 1 != 0

----------------------------------------------------------------------

This showed up in the run, which seems like a likely culprit:

Traceback (most recent call last):
  File "/home/brett/repos/arvados/sdk/python/arvados/commands/put.py", line 518, in <module>
    main()
  File "/home/brett/repos/arvados/sdk/python/arvados/commands/put.py", line 483, in main
    'redundancy': replication,
NameError: global name 'replication' is not defined

Beyond that:

  • I think it would be nicer to use None as a default value for replication in both for CollectionWriter and --replication (i.e., don't set a default). None more clearly signals to users "we'll figure out our own default value," bringing it in line with the documentation. It will also help expose bugs in case we ever fail to set that default: later code that might work with 0 will probably raise an exception with None.
  • It looks like we should just have the signatures for local_store_put and local_store_get match put and get, respectively. This means copies should go in front of num_retries in local_store_put, and local_store_get shouldn't have a copies argument (that makes no sense). It would be extra-nice to update the comment above local_store_put to talk about this general rule.
  • Do you want to go all the way, and write a proper docstring for CollectionWriter.finish() from the comment?
  • I thought disk_services_available looked awfully familiar—there's a lot of overlap between setup methods in CollectionTestMixin and KeepClientServiceTestCase. Is it reasonable to DRY this up, perhaps in arvados_testutil?

Thanks.

#5 Updated by Tom Clegg over 5 years ago

Brett Smith wrote:

Reviewing 655b69e. Unfortunately, I got these test failures:

[...]

This showed up in the run, which seems like a likely culprit:

[...]

My bad, unpushed commits. :(

  • I think it would be nicer to use None as a default value for replication in both for CollectionWriter and --replication (i.e., don't set a default). None more clearly signals to users "we'll figure out our own default value," bringing it in line with the documentation. It will also help expose bugs in case we ever fail to set that default: later code that might work with 0 will probably raise an exception with None.

Done.

  • It looks like we should just have the signatures for local_store_put and local_store_get match put and get, respectively. This means copies should go in front of num_retries in local_store_put, and local_store_get shouldn't have a copies argument (that makes no sense). It would be extra-nice to update the comment above local_store_put to talk about this general rule.

Done. (And moved into docstring.)

  • Do you want to go all the way, and write a proper docstring for CollectionWriter.finish() from the comment?

Sure thing, done.

  • I thought disk_services_available looked awfully familiar—there's a lot of overlap between setup methods in CollectionTestMixin and KeepClientServiceTestCase. Is it reasonable to DRY this up, perhaps in arvados_testutil?

Done, in a new arvados_testutil.ApiClientMock class.

Now at cbf80c0

#6 Updated by Brett Smith over 5 years ago

Reviewing cbf80c0. Thank you very much for DRYing up the test infrastructure. This is much nicer.

  • Please add unit tests for replication_desired. At least one for case where redundancy is defined, and another when it isn't.
  • The comment at the top of Collection.attributes_required_columns would benefit from being made more general, since the code is too.
  • Please add replication_desired to the API schema documentation. It would be ideal to add a note or two to explain how it relates to the redundancy attributes, too, mentioning what's deprecated and so on.
  • I feel like the CollectionWriter.finish docstring is a little too scary. Yes, users should be aware that it doesn't make an Arvados collection object, but using this method to save task output is (currently) normal and expected. Suggest documenting the distinction but back off language like "beware" and "special cases." As a suggestion: "This is primarily used to save and record task output. In all other cases, you should make a collection instead…"
  • When you instantiate a KeepClient with a local store, it actually switches out its own put and get methods for the local_store versions. The docstrings might paint a fuller picture by describing this relationship rather than saying it's just "for use in test cases;" the __init__ docstring already points out that using a local store is primarily for testing.
  • Please make docstring formatting follow PEP 257. In particular, one-line docstrings should get closing quotes on the same line, while multiline docstrings should end with closing quotes on a line by themselves, with no blank lines on either side.

Thanks again.

#7 Updated by Tom Clegg over 5 years ago

Brett Smith wrote:

  • Please add unit tests for replication_desired. At least one for case where redundancy is defined, and another when it isn't.

Added.

Also added functional tests to protect/document Python SDK's usage.

  • The comment at the top of Collection.attributes_required_columns would benefit from being made more general, since the code is too.

Updated comments.

  • Please add replication_desired to the API schema documentation. It would be ideal to add a note or two to explain how it relates to the redundancy attributes, too, mentioning what's deprecated and so on.

Can we leave this for #3410?

Meanwhile I realized that forward compatibility in the SDK (for a few days) would be better than backward compatibility in the API (for longer than that), so I did this

            replication_attr = 'replication_desired'
            if api_client._schema.schemas['Collection']['properties'].get(replication_attr, None) is None:
                # API calls it 'redundancy' until #3410.
                replication_attr = 'redundancy'
  • I feel like the CollectionWriter.finish docstring is a little too scary. Yes, users should be aware that it doesn't make an Arvados collection object, but using this method to save task output is (currently) normal and expected. Suggest documenting the distinction but back off language like "beware" and "special cases." As a suggestion: "This is primarily used to save and record task output. In all other cases, you should make a collection instead…"

Updated. I left out "all" from "all other cases", though. Fear of absolutes.

  • When you instantiate a KeepClient with a local store, it actually switches out its own put and get methods for the local_store versions. The docstrings might paint a fuller picture by describing this relationship rather than saying it's just "for use in test cases;" the __init__ docstring already points out that using a local store is primarily for testing.

Updated:

-        """A stub for put(), for use in test cases.
+        """A stub for put().
+
+        This method is used in place of the real put() method in a
+        KeepClient constructed with local_store=True.

  • Please make docstring formatting follow PEP 257. In particular, one-line docstrings should get closing quotes on the same line, while multiline docstrings should end with closing quotes on a line by themselves, with no blank lines on either side.

Fascinating, sort of. PEP-257 says "Blank lines should be removed from the beginning and end of the docstring." But this is in a paragraph describing what processing tools should do, not(?) code authors. Emacs python-mode defaults to ending with a blank line (which it calls "pep-257" style). But now I found (setq python-fill-docstring-style 'pep-257-nn) and stuck it in ~/.emacs and Coding Standards so I don't have to [fail to] remember to remove that trailing blank line after every M-q. Phew.

Now at 71a556d

#8 Updated by Brett Smith over 5 years ago

Tom Clegg wrote:

  • Please add replication_desired to the API schema documentation. It would be ideal to add a note or two to explain how it relates to the redundancy attributes, too, mentioning what's deprecated and so on.

Can we leave this for #3410?

Sure.

Meanwhile I realized that forward compatibility in the SDK (for a few days) would be better than backward compatibility in the API (for longer than that), so I did this

  • It would be simpler to write if replication_attr not in api_client._schema.schemas['Collection']['properties'].
  • We're already doing this elsewhere, but I'll point out that _schema is pretty clearly marked as being not public. It's handy, but every time we do this, we risk making more work for ourselves to keep up with future googleapiclient updates. (No, I don't see a public interface to achieve the same goal.)
  • When you instantiate a KeepClient with a local store, it actually switches out its own put and get methods for the local_store versions. The docstrings might paint a fuller picture by describing this relationship rather than saying it's just "for use in test cases;" the __init__ docstring already points out that using a local store is primarily for testing.

Updated:

The local_store argument is expected to be a directory path string, not a boolean.

  • Please make docstring formatting follow PEP 257. In particular, one-line docstrings should get closing quotes on the same line, while multiline docstrings should end with closing quotes on a line by themselves, with no blank lines on either side.

Fascinating, sort of. PEP-257 says "Blank lines should be removed from the beginning and end of the docstring." But this is in a paragraph describing what processing tools should do, not(?) code authors.

You're right, sorry about that. I think I sort of assumed this from the example multi-line docstring.

My points here are all comparatively minor, so I think you're good to merge with or without addressing them. Thanks.

#9 Updated by Anonymous over 5 years ago

  • Status changed from New to Resolved
  • % Done changed from 60 to 100

Applied in changeset arvados|commit:64c70939c414881de61ac65512701d0ba4068786.

#10 Updated by Tom Clegg over 5 years ago

  • Status changed from Resolved to In Progress

#11 Updated by Brett Smith over 5 years ago

Reviewing 5011-thread-safe-test at 3068c62

Instead of writing our own threadsafe_iter, would it be possible to just stuff the mock responses in a Queue.Queue, and then set the mock's side_effect to that_queue.get?

#12 Updated by Tom Clegg over 5 years ago

Brett Smith wrote:

Reviewing 5011-thread-safe-test at 3068c62

Instead of writing our own threadsafe_iter, would it be possible to just stuff the mock responses in a Queue.Queue, and then set the mock's side_effect to that_queue.get?

Yes! In fact, I started out trying to use a Queue, but it didn't work out, and the iter seemed simple enough. But I tried Queue again and it worked better this time.

Now, in 50df495:
  •     queue = Queue.Queue()
        for val in items:
            queue.put(val)
        return lambda *args, **kwargs: queue.get(block=False)
    

Now at d5809a1

#13 Updated by Brett Smith over 5 years ago

Tom Clegg wrote:

Brett Smith wrote:

Instead of writing our own threadsafe_iter, would it be possible to just stuff the mock responses in a Queue.Queue, and then set the mock's side_effect to that_queue.get?

Yes! In fact, I started out trying to use a Queue, but it didn't work out, and the iter seemed simple enough. But I tried Queue again and it worked better this time.

Sweet, thank you for indulging me. I agree threadsafe_iter was pretty straightforward, but still, less code is less code. d5809a1e is good to merge, thanks.

#14 Updated by Tom Clegg over 5 years ago

  • Status changed from In Progress to Resolved

Also available in: Atom PDF