Idea #3410
closed[API] Rename collection "desired replication" attributes, add model constraints
Description
Each collection record should have a field corresponding to the level of replication that the owner desires.
It's okay to leave this field blank, in which case the default value will be assumed.
The default replication level for collections should be a systemwide default which the api server can return.
Implementation:- Rename redundancy -> replication_desired
- Rename redundancy_confirmed -> replication_confirmed
- Rename redundancy_confirmed_at -> replication_confirmed_at
- Drop redundancy_confirmed_by_client_uuid column
- Delete redundancy_status method from models/collection.rb
- Prevent non-privileged users from updating replication_confirmed* columns
- When a collection's manifest_text changes in a way that introduces new block hashes (that weren't in the old manifest_text), reset redundancy_confirmed* columns to NULL
- Add a config entry in services/api/config/application.default.yml,
default_collection_replication: 2
- Expose the config entry in the discovery document (as defaultCollectionReplication at top level?)
Updated by Ward Vandewege over 10 years ago
- Target version set to 2014-08-27 Sprint
Updated by Ward Vandewege over 10 years ago
- Subject changed from Store desired replication level in collection to [API] Store desired replication level in collection
Updated by Tom Clegg over 10 years ago
- Target version changed from 2014-08-27 Sprint to Arvados Future Sprints
Updated by Tom Clegg over 10 years ago
- Subject changed from [API] Store desired replication level in collection to [API] Store desired replication level in collections table
- Description updated (diff)
- Assigned To deleted (
Peter Amstutz)
Updated by Ward Vandewege almost 10 years ago
- Target version changed from Arvados Future Sprints to 2015-02-18 sprint
Updated by Peter Amstutz almost 10 years ago
What's the difference between redundancy_confirmed
and redundancy_confirmed_at
? If redundancy_confirmed
is just a boolean, then only redundancy_confirmed_at
is needed?
Updated by Tom Clegg almost 10 years ago
- Subject changed from [API] Store desired replication level in collections table to [API] Rename collection "desired replication" attributes, add model constraints
Updated by Tom Clegg almost 10 years ago
Peter Amstutz wrote:
What's the difference between
redundancy_confirmed
andredundancy_confirmed_at
? Ifredundancy_confirmed
is just a boolean, then onlyredundancy_confirmed_at
is needed?
replication_confirmed
is the number of copies.
Updated by Tom Clegg almost 10 years ago
- Assigned To changed from Peter Amstutz to Tom Clegg
Updated by Radhika Chippada almost 10 years ago
- I am understanding this implementation as follows. For each collection, replication_desired can be set (by admin); or the default will be used. Whatever is this value (default or per collection), we will initiate the replication process when the replication_confirmed != replication_desired. My comments are based on this understanding. Please correct if this is not the case.
- I think the name “replication_current” might have been clearer over “replication_confirmed”.
- when replication_desired is changed, do we need to reset replication_confirmed and replication_confirmed_at? I am thinking this can be set by admin after the initial collection creation and replication_confirmed etc are completed. Or can be updated after setting to a value previously. In either case, I am assuming we would need to replicate additional copies or remove replications as needed. Does this get initiated because replication_desired!= replication_confirmed or do we need to reset these attributes to initiate the replication process?
- maybe_clear_redundancy_confirmed - should this be using the word “replication” in place of “redundancy”?
- ensure_permission_to_save method
Why do we want non-admin users be able to clear replication_confirmed* when they cannot set them? Should the following be:
if (not current_user.andand.is_admin and (replication_confirmed_at_changed? or replication_confirmed_changed?) and not (replication_confirmed_at.nil? and replication_confirmed.nil?)) this: if (not current_user.andand.is_admin and (replication_confirmed_at_changed? or replication_confirmed_changed?))
- config/application.default.yml
can we update comment — “Set replication level for collections whose replication_desired attribute is nil” => “Default replication level for collections. This will be used when replication_desired attribute is nil.”
- migration script -> comment in up method
“Removing that column dropped the index. Let's put it back” => can you please say “search index” instead of “index”?
- test "replication_desired reports #{ask or 'default'} if redundancy is #{ask}" — should this be using the word “redundancy” in here? I think it should be "replication_desired reports #{ask or 'default'} if desired is #{ask}”?
- test "replication_confirmed* cannot be set by non-admin user" do
Can you update this comment: “# Cannot set both at once.” => “Cannot set both at once either.”
- test “don't clear replication_confirmed* when just deleting a data block” seems to have done what you intended. Verified as below in the test:
Initial manifest: . acbd18db4cc2f85cedef654fccc4a4d8+3 37b51d194a7513e45b56f6524f2d51f2+3 0:3:foo 3:6:bar Updated manifest: . 37b51d194a7513e45b56f6524f2d51f2+3 0:3:foo
- In the fixtures, can we make the names more readable? ex:
- “replication wantnull havenull” => “replication want-null have-null”
- “replication want2 have2” => “replication want-2 have-2” or “replication want 2 and have 2”
Updated by Tom Clegg almost 10 years ago
Radhika Chippada wrote:
- I am understanding this implementation as follows. For each collection, replication_desired can be set (by admin); or the default will be used. Whatever is this value (default or per collection), we will initiate the replication process when the replication_confirmed != replication_desired. My comments are based on this understanding. Please correct if this is not the case.
The gist of it is that the user specifies a desired replication level, while the data manager gives feedback about the current replication state as often as practicable, and makes additional copies whenever needed. This way the user can check whether the desired level has been achieved, detect when replication falls (even temporarily) below the desired level, and be assured that the data manager process isn't simply stopped or broken.
Given that storage devices and nodes can fail (and return to service) without notice, an assurance about the current replication level is somewhere between impractical and unbelievable. However, a claim about the most recent verification event is practical and believable.
Likewise, the process of comparing desired and actual states has to be done regularly whether or not anyone changes replication_desired on any collections. Data manager could prioritize cases where the latest confirmed level was lower than the current desired level (or was null), though.
- I think the name “replication_current” might have been clearer over “replication_confirmed”.
I think "current" implies (incorrectly, see above) that the value represents the state at the time of the collections.get
request. Also, "confirmed" highlights the distinction between "specified/desired" (which is always "current") and a confirmation event (which happened in the past). I'm certainly willing to consider other terms ("verified"?), but I do think whatever we choose should match the timestamp field in an obvious way ("replication_current_at"
doesn't quite sound right).
- when replication_desired is changed, do we need to reset replication_confirmed and replication_confirmed_at? I am thinking this can be set by admin after the initial collection creation and replication_confirmed etc are completed. Or can be updated after setting to a value previously. In either case, I am assuming we would need to replicate additional copies or remove replications as needed. Does this get initiated because replication_desired!= replication_confirmed or do we need to reset these attributes to initiate the replication process?
I think if a user increases replication_desired from 2 to 3, the information about when replication was last confirmed (and whether it was confirmed to be 1 or 2) is still correct, and still just as useful.
It is possible (but not certain) that additional copies of data blocks will be made in order to bring actual replication from 2 to 3.
- maybe_clear_redundancy_confirmed - should this be using the word “replication” in place of “redundancy”?
Ah, yes. Fixed in 08c575d.
- ensure_permission_to_save method
Why do we want non-admin users be able to clear replication_confirmed* when they cannot set them? Should the following be:
[...]
I figured this restriction would amount to an inconvenience at most: a non-admin user can achieve the same result by adding an arbitrary data block to the manifest (causing the confirmed
fields to reset to nil) and then removing it.
- config/application.default.yml
can we update comment — “Set replication level for collections whose replication_desired attribute is nil” => “Default replication level for collections. This will be used when replication_desired attribute is nil.”
Done.
- migration script -> comment in up method
“Removing that column dropped the index. Let's put it back” => can you please say “search index” instead of “index”?
Done.
- test "replication_desired reports #{ask or 'default'} if redundancy is #{ask}" — should this be using the word “redundancy” in here? I think it should be "replication_desired reports #{ask or 'default'} if desired is #{ask}”?
Hm... I already did this in d62b733, which adds a bunch of other tests too. Which version were you looking at?
- test "replication_confirmed* cannot be set by non-admin user" do
Can you update this comment: “# Cannot set both at once.” => “Cannot set both at once either.”
Done.
- test “don't clear replication_confirmed* when just deleting a data block” seems to have done what you intended. Verified as below in the test:
[...]
I don't quite follow you here. I agree that it seems to have done what I intended. (Hopefully it's not unique in this respect?)
- In the fixtures, can we make the names more readable? ex:
- “replication wantnull havenull” => “replication want-null have-null”
- “replication want2 have2” => “replication want-2 have-2” or “replication want 2 and have 2”
Changed to want=2, etc.
Now at 1f7a6b5
Updated by Radhika Chippada almost 10 years ago
- I understand replication strategy better after your first response above. Also agree with your comments about "confirmed" versus "current."
- I was looking at d302307. And, I missed the sdk updates due to this. A couple more comments now that I looked at them (below).
- commands/put.py
- This comment is still a bit confusing to me.
“args.replication is how many copies we instruct Arvados to maintain (by passing it in collections().create() after all data is written).” Are you saying “args.replication is how many copies we instruct Arvados to maintain after all data is written (by passing it in collections().create()).”? - This comment can be a bit more elaborate?
“If args.replication is given as None, it should stay None, but we still need to write a suitable number of copies to Keep.” Something like “If args.replication is given as None, replication_desired would be none. However, we will write a default number of copies to Keep …” ?
- This comment is still a bit confusing to me.
- Is this still staying in put.py?
if api_client._schema.schemas['Collection']['properties'].get(replication_attr, None) is None: # API calls it 'redundancy' until #3410.
replication_attr = 'redundancy'
- Regarding test “don't clear replication_confirmed* when just deleting a data block”
You had a comment in the test “We really deleted a block there, right?” I was responding to that comment with my observation that the test is indeed doing what it intends to do.
- I did notice that you had not yet merged latest master (and full text search). I am glad you merged and took care of updating that index as well.
- ensure_permission_to_save
Ah. I see the problem now. If we do what I suggested the scenario "clear replication_confirmed* when introducing a new block in manifest" fails which is not acceptable.
- I noticed this issue with the db migration script. When we rollback, the redundancy_confirmed_by_client_uuid is added back and is the last column in the table. Hence the search_index needs to list it as the last column.
[ 24/260] ArvadosModelTest#test_search_index_exists_on_models_that_go_into_projects = 0.05 s 1) Failure: ArvadosModelTest#test_search_index_exists_on_models_that_go_into_projects [/home/radhika/arvados/services/api/test/unit/arvados_model_test.rb:144]: collections has no search index with columns ["owner_uuid", "modified_by_client_uuid", "modified_by_user_uuid", "portable_data_hash", "uuid", "name", "file_names", "redundancy_confirmed_by_client_uuid"]. Instead found search index with columns ["owner_uuid", "modified_by_client_uuid", "modified_by_user_uuid", "portable_data_hash", "redundancy_confirmed_by_client_uuid", "uuid", "name", "file_names"]
Updated by Tom Clegg almost 10 years ago
Radhika Chippada wrote:
“args.replication is how many copies we instruct Arvados to maintain (by passing it in collections().create() after all data is written).” Are you saying “args.replication is how many copies we instruct Arvados to maintain after all data is written (by passing it in collections().create()).”?
Changed to:
# write_copies diverges from args.replication here.
# args.replication is how many copies we will instruct Arvados to
# maintain (by passing it in collections().create()) after all
# data is written -- and if None was given, we'll use None there.
# Meanwhile, write_copies is how many copies of each data block we
# write to Keep, which has to be a number.
#
# If we simply changed args.replication from None to a default
# here, we'd end up erroneously passing the default replication
# level (instead of None) to collections().create().
- Is this still staying in put.py?
if api_client._schema.schemas['Collection']['properties'].get(replication_attr, None) is None:# API calls it 'redundancy' until #3410. replication_attr = 'redundancy'
Yes, at least for a little while: it allows new Python clients to work against old API servers, e.g., a compute node or shell VM updates to this version before its API server does. I updated the comment from "calls" to "called", though, since this is #3410...
- Regarding test “don't clear replication_confirmed* when just deleting a data block”
You had a comment in the test “We really deleted a block there, right?” I was responding to that comment with my observation that the test is indeed doing what it intends to do.
Ah, I see. I meant it as "this is what the next line does" but it can just as well scan as a question to amuse the reader. I've rephrased it:
- # We really deleted a block there, right?
+
+ # Confirm that we did just remove a block from the manifest (if
+ # not, this test would pass without testing the relevant case):
- I noticed this issue with the db migration script. When we rollback, the redundancy_confirmed_by_client_uuid is added back and is the last column in the table. Hence the search_index needs to list it as the last column.
[...]
Indeed! Updated the index. Also updated the test to be less sensitive, since (AFAIK) it doesn't actually matter whether the columns are given in the same order in the table and the index -- only that the set of columns is correct. Right?
Now at 11e1ee (!)
Updated by Tom Clegg almost 10 years ago
- Status changed from New to In Progress
Updated by Anonymous almost 10 years ago
- Status changed from In Progress to Resolved
- % Done changed from 67 to 100
Applied in changeset arvados|commit:d87717b4ec885059183ef6d7fa6780c343338455.