Feature #4253
closed[API] Users can create their own arvados-hosted git repositories
Description
Summary: Similar to github. Each user gets one username, and owns the {git}/username/* namespace.
Assigning usernames¶
Add "username" attribute to users table.- Can be null.
- Unique.
- Starts with alpha. Contains only alphanum.
- Only admin can change. (Could change in future, but simplifies for now.)
- Migration: for active users, copy VM login name if there is one, otherwise assign one based on email address. For inactive users, leave null.
- Add "username" to user profile setup page. Allow user to change own username from null to non-null. Stay on profile page until a unique username is successfully saved.
- Unique across all federated sites.
Repositories¶
Repository names- Repository with name=bar, owned by user with username=foo, is "{base}/foo/bar.git"
- name cannot be null.
- {owner_uuid, name} is unique.
- name starts with alpha, contains only alphanum.
- Migrate existing repositories to owner_uuid=system_user_uuid. These will continue to be available at "{base}/foo.git".
- Repository table "name" column has "foo/bar" (model needs to ensure the "foo" part stays synchronized with owner_uuid->username)
User can create a repository with owner_uuid==current_user.uuid but not with any other owner_uuid (unless current_user.is_admin)Owner_uuid can be set/changed to any user writable by current_user, which is only current_user itself for a non-admin in a typical setup. This part is the same as standard permission behavior.- owner_uuid must refer to a user, not a group (even if current_user.is_admin)
- Permission summary provided to gitolite grants repository owner RW+ permission
- Export repository names as "username/reponame" (assuming this is convenient for gitolite permission-sync script).
- Update gitolite permission-sync script.
- On manage account page (at least for now)
- Add new repository -> dialog box (or inline, if easier)
- Show note that repositories cannot be renamed
- Show note that it will take a minute or two before "git clone" will work (aside/future: can Workbench get feedback from the git system somehow so it can say "pending" on new repos?)
- (Probably) future: Delete repository -> confirm with alert box (cannot be undone). Depends on gitolite moving the old repo out of the way in case a new one is created with the same name.
- Can we store repositories themselves by uuid instead of name? Perhaps symlink names to uuids. This could help with "rename" and "delete" (e.g., delete any symlinks that aren't mentioned in the current permission list; never delete actual git repos).
Other stuff to test¶
- Submit jobs & pipelines using repository="username/reponame"
Open questions¶
- Any difficult bits on the gitolite/git-daemon side?
Updated by Tim Pierce about 10 years ago
- Subject changed from ordinary users should be able to create their own repositories to [API] ordinary users should be able to create their own repositories
Updated by Tom Clegg almost 10 years ago
- Target version changed from Arvados Future Sprints to 2015-03-11 sprint
Updated by Tom Clegg almost 10 years ago
- Subject changed from [API] ordinary users should be able to create their own repositories to [DRAFT] [API] ordinary users should be able to create their own repositories
- Story points changed from 1.0 to 2.0
Updated by Tom Clegg almost 10 years ago
- Subject changed from [DRAFT] [API] ordinary users should be able to create their own repositories to [API] ordinary users should be able to create their own repositories
- Description updated (diff)
Updated by Tom Clegg almost 10 years ago
- Subject changed from [API] ordinary users should be able to create their own repositories to [API] Users can create their own arvados-hosted git repositories
Updated by Tom Clegg almost 10 years ago
- Assigned To deleted (
Tom Clegg) - Target version changed from 2015-03-11 sprint to 2015-04-01 sprint
Updated by Brett Smith almost 10 years ago
- Status changed from New to In Progress
Updated by Brett Smith almost 10 years ago
- Description updated (diff)
(10:28:02 AM) Me: Real quick, on 4253: how do you feel about generating usernames from VM login permissions rather than repository names?
(10:28:31 AM) Me: If the user can manage multiple repositories, there's no way to tell which one is their personal one, so trying to disambiguate could be… some code.
(10:28:44 AM) Tom Clegg: that's true of VM login permissions too
(10:29:12 AM) Tom Clegg: but yeah, whichever way is easier (given the actual data in our systems) sounds great
(10:29:33 AM) Me: Yeah, that's my feeling.
(10:29:55 AM) Me: For VMs, even if there are multiple permissions, on our existing systems they usually have the same login name, so we can figure that's the name to use.
(10:30:06 AM) Me: And if we just punt completely in case of conflict, it'll affect fewer people.
(10:30:39 AM) Tom Clegg: that sounds good
Updated by Brett Smith almost 10 years ago
4253-user-usernames-wip is up for review. It implements the "Assign usernames" part of this story, and then uses that username for existing setup methods. Even though the exact repository setup logic will change again soon in a follow-up branch to implement the rest of the story, it was still good to get VM logins on the same page, and set a good baseline for future work.
Updated by Peter Amstutz almost 10 years ago
Disallowing dots (periods), dashes, and underscores in usernames seems a bit harsh. Is there a particular reasoning behind this beyond "because the story said so"?
A few more auto_setup_name_blacklist entries I would suggest: "admin", "administrator", "daemon", "sys", "mail", "postgres", "sshd"
I don't understand this:
pattern = "%s%s" % [quoted_name, "_" * suffix_len]
I'm confused. Won't this result in the pattern
basename_ basename__ basename___ basename____ basename_____ basename______
When the sequence of names is actually
basename1 basename2 basename3 basename4 basename5 basename6
It looks like you are doing something a bit more subtle here, but I don't entirely understand what it is. Something like "if there is already a user11, don't create user1, create user2 instead" ?
Shouldn't auto_setup_new_user
create the new repository so that it is owned by the user, instead of using can_manage
link? (Perhaps this change is happening as part of the rest of the story?)
FWIW, in order for the username to be unique across all federated sites, this logic will eventually have to be moved to the SSO server.
Your commit message has a note "Use the new generated username to avoid a weird disconnect between that and these related objects." That implies there should be a migration but I see nothing like that. Can you explain better what happens to old (now nonconforming) user names? (it appears that the main distinction is that old usernames permitted dot, dash and underscore.)
Updated by Brett Smith almost 10 years ago
Peter Amstutz wrote:
Disallowing dots (periods), dashes, and underscores in usernames seems a bit harsh. Is there a particular reasoning behind this beyond "because the story said so"?
"Because the story said so" isn't merely a sufficient reason to write code a certain way; it is the best reason. Our review documentation says, "When merging, both the developer and the reviewer should be convinced that… the code does what the story specified." If I decided unilaterally that more characters should be allowed than the story specifies, you should bust my chops for that.
If you don't like the design in the story, bringing that up during code review is almost the worst possible time. It means that I'm spending time answering questions that you really ought to be asking Tom. And if you convince folks that you're right, it means that I've wasted time writing code to a spec that we never actually wanted.
But to answer your real question: the story is specified this way in order to minimize the chances that we'll have incompatibilities with other git tools we might want to use in the future. It seems likely that most tools will allow some set of these punctuation characters, but very possibly reserve one or two of them. Until our git story is a little more developed in general, we want to keep our options open. I agree it's strict, but them's the breaks.
A few more auto_setup_name_blacklist entries I would suggest: "admin", "administrator", "daemon", "sys", "mail", "postgres", "sshd"
The default blacklist was created a while ago (during the automatic user setup story, I think) and I don't see any immediate need to revisit it in this story (I'm not touching it in this branch). Please file a separate ticket for this.
I don't understand this:
pattern = "%s%s" % [quoted_name, "_" * suffix_len]I'm confused. Won't this result in the pattern
[...]
Yes. We use those patterns to search for a free username without doing an SQL query for each username individually. When the pattern is username_
, we're searching username1 through username9. If the loop finds that one of those is free, it stops and returns that name. Otherwise, the pattern becomes username__
to search username10 through username99, and so on.
The code uses these constrained pattern to effectively force the numeric part after username
to be sorted numerically. If we searched for username%
, then username10
would be listed before username2
and complicate the search.
Shouldn't
auto_setup_new_user
create the new repository so that it is owned by the user, instead of usingcan_manage
link? (Perhaps this change is happening as part of the rest of the story?)
I have that change in a later branch, yes. This branch is changing as little as necessary in the auto-setup logic.
Your commit message has a note "Use the new generated username to avoid a weird disconnect between that and these related objects." That implies there should be a migration but I see nothing like that. Can you explain better what happens to old (now nonconforming) user names? (it appears that the main distinction is that old usernames permitted dot, dash and underscore.)
The old username creation logic had very different rules: it would look at the username part of the user's e-mail address. If that was not an acceptable username (which had a broader definition, as you've noticed), it would refuse to do auto-setup. If it was OK, then the name was used in two places: the name of the user's git repository, and their username for VM login. The disconnect I was trying to address was having these two sets of rules; it seemed better to follow one set of username rules and use it for everything.
For existing users, their repositories will be migrated in a later branch following the rules under "Repository names." The VM logins can stay as they are: they don't have the same compatibility concerns discussed above, and avoiding messing with users' client configuration seems better than having perfect consistency.
Thanks.
Updated by Peter Amstutz almost 10 years ago
I'm getting an error running the migration. Maybe what's happening is that after ActiveRecord::RecordNotUnique is raised and rescued, the transaction is no longer valid?
$ rake db:migrate == AddUsernameToUsers: migrating ============================================= -- add_column(:users, :username, :string, {:null=>true}) -> 0.0039s -- remove_index(:users, {:name=>"users_search_index"}) -> 0.0048s rake aborted! StandardError: An error has occurred, this and all later migrations canceled: PG::InFailedSqlTransaction: ERROR: current transaction is aborted, commands ignored until end of transaction block : UPDATE "users" SET "username" = 'tom2', "updated_at" = '2015-03-25 13:40:44.285749' WHERE "users"."id" = 76/home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/connection_adapters/postgresql_adapter.rb:1163:in `async_exec' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/connection_adapters/postgresql_adapter.rb:1163:in `exec_no_cache' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/connection_adapters/postgresql_adapter.rb:671:in `block in exec_delete' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/connection_adapters/abstract_adapter.rb:280:in `block in log' /home/peter/.rvm/gems/ruby-2.1.1/gems/activesupport-3.2.17/lib/active_support/notifications/instrumenter.rb:20:in `instrument' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/connection_adapters/abstract_adapter.rb:275:in `log' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/connection_adapters/postgresql_adapter.rb:670:in `exec_delete' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/connection_adapters/abstract/database_statements.rb:96:in `update' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/connection_adapters/abstract/query_cache.rb:14:in `update' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/persistence.rb:359:in `update' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/locking/optimistic.rb:68:in `update' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/attribute_methods/dirty.rb:74:in `update' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/timestamp.rb:71:in `update' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/callbacks.rb:272:in `block in update' /home/peter/.rvm/gems/ruby-2.1.1/gems/activesupport-3.2.17/lib/active_support/callbacks.rb:403:in `_run__2571989721655299207__update__3332503642953779417__callbacks' /home/peter/.rvm/gems/ruby-2.1.1/gems/activesupport-3.2.17/lib/active_support/callbacks.rb:405:in `__run_callback' /home/peter/.rvm/gems/ruby-2.1.1/gems/activesupport-3.2.17/lib/active_support/callbacks.rb:385:in `_run_update_callbacks' /home/peter/.rvm/gems/ruby-2.1.1/gems/activesupport-3.2.17/lib/active_support/callbacks.rb:81:in `run_callbacks' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/callbacks.rb:272:in `update' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/persistence.rb:348:in `create_or_update' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/callbacks.rb:264:in `block in create_or_update' /home/peter/.rvm/gems/ruby-2.1.1/gems/activesupport-3.2.17/lib/active_support/callbacks.rb:403:in `_run__2571989721655299207__save__3332503642953779417__callbacks' /home/peter/.rvm/gems/ruby-2.1.1/gems/activesupport-3.2.17/lib/active_support/callbacks.rb:405:in `__run_callback' /home/peter/.rvm/gems/ruby-2.1.1/gems/activesupport-3.2.17/lib/active_support/callbacks.rb:385:in `_run_save_callbacks' /home/peter/.rvm/gems/ruby-2.1.1/gems/activesupport-3.2.17/lib/active_support/callbacks.rb:81:in `run_callbacks' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/callbacks.rb:264:in `create_or_update' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/persistence.rb:104:in `save!' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/validations.rb:56:in `save!' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/attribute_methods/dirty.rb:33:in `save!' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/transactions.rb:264:in `block in save!' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/transactions.rb:313:in `block in with_transaction_returning_status' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/connection_adapters/abstract/database_statements.rb:192:in `transaction' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/transactions.rb:208:in `transaction' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/transactions.rb:311:in `with_transaction_returning_status' /home/peter/.rvm/gems/ruby-2.1.1/gems/activerecord-3.2.17/lib/active_record/transactions.rb:264:in `save!' /home/peter/work/arvados/services/api/db/migrate/20150317132720_add_username_to_users.rb:115:in `block (2 levels) in up' /home/peter/work/arvados/services/api/db/migrate/20150317132720_add_username_to_users.rb:95:in `block in each_wanted_username' /home/peter/work/arvados/services/api/db/migrate/20150317132720_add_username_to_users.rb:95:in `loop' /home/peter/work/arvados/services/api/db/migrate/20150317132720_add_username_to_users.rb:95:in `each_wanted_username' /home/peter/work/arvados/services/api/db/migrate/20150317132720_add_username_to_users.rb:112:in `block in up'
Updated by Brett Smith almost 10 years ago
Peter Amstutz wrote:
I'm getting an error running the migration. Maybe what's happening is that after ActiveRecord::RecordNotUnique is raised and rescued, the transaction is no longer valid?
Good catch, thanks. Moved the uniqueness check up to the Rails level. Give fa2abe9 a try.
Updated by Peter Amstutz almost 10 years ago
fa2abe9 of 4253-user-usernames-wip LGTM
Updated by Ward Vandewege almost 10 years ago
reviewing 4253-gitolite3-wip:
- is it necessary to call the 'gitolite setup' exec on every run?
otherwise, LGTM
Updated by Brett Smith almost 10 years ago
Ward Vandewege wrote:
reviewing 4253-gitolite3-wip:
- is it necessary to call the 'gitolite setup' exec on every run?
Yeah, I want to avoid this too. Per our discussion, the unless => '/usr/bin/cmp …'
line should ensure we only run the command when we install or replace the admin key. Thanks.
Updated by Ward Vandewege almost 10 years ago
reviewing 4253-user-git-repos-wip:
- arv = Arvados.new( { :suppress_ssl_warnings => false } )
Since you added code to respect ARVADOS_API_HOST_INSECURE, the 'suppress_ssl_warnings' flag should probably be removed?
- in class Repository, is CLASS_ADD_TO_GIT needed? Why not just use ADD_TO_GIT which is already available because TrackCommitState is included? Also CLASS_ADD_TO_GIT doesn't set @need_commit.
Other than that, LGTM
Updated by Brett Smith almost 10 years ago
Ward Vandewege wrote:
- arv = Arvados.new( { :suppress_ssl_warnings => false } )
Since you added code to respect ARVADOS_API_HOST_INSECURE, the 'suppress_ssl_warnings' flag should probably be removed?
The Arvados class always suppresses SSL warnings when ARVADOS_API_HOST_INSECURE is set.
- in class Repository, is CLASS_ADD_TO_GIT needed? Why not just use ADD_TO_GIT which is already available because TrackCommitState is included? Also CLASS_ADD_TO_GIT doesn't set @need_commit.
It wasn't that simple, because need_commit is an instance variable, and we had class methods that were adding stuff to git and didn't have access to that information. I have simplified this by promoting
@need_commit to a class variable and adding the logic to TrackCommitState to handle it. Now at c3cecdc.
Updated by Ward Vandewege almost 10 years ago
Brett Smith wrote:
Ward Vandewege wrote:
- arv = Arvados.new( { :suppress_ssl_warnings => false } )
Since you added code to respect ARVADOS_API_HOST_INSECURE, the 'suppress_ssl_warnings' flag should probably be removed?
The Arvados class always suppresses SSL warnings when ARVADOS_API_HOST_INSECURE is set.
Yes - the "{ :suppress_ssl_warnings => false }" is just leftover from the days before we had ARVADOS_API_HOST_INSECURE; I just had it in there so I could easily change it to 'true' when there was a need. It can go now, I think.
It wasn't that simple, because
need_commit is an instance variable, and we had class methods that were adding stuff to git and didn't have access to that information. I have simplified this by promoting
@need_commit to a class variable and adding the logic to TrackCommitState to handle it. Now at c3cecdc.
Great, thanks! A little sad to see the cute code with the anonymous proc go, but this looks good to merge.
Thanks,
Ward.
Updated by Brett Smith almost 10 years ago
4253-user-repos-wip is up for review. Following our Gitolite changes, this implements the necessary changes on the API server end. It does not include the Workbench management UI; that will come later. Users will at least have the option to create their own repositories with arv create repository
, although I'm hoping the Workbench UI will follow soon.
You DO NOT need to review the changes to .gitolite.rc and update-gitolite.rb in Docker. Those have already been reviewed in deployment (see above).
Updated by Brett Smith almost 10 years ago
4253-user-repos-wip now at 7ca7892 with the Workbench interface added.
Updated by Tom Clegg almost 10 years ago
- API: I had trouble making sense of the different code paths in "setup" that do different things with the given repo_name depending on whether the user already existed or got created during this API transaction. Rephrased into something that (I think) is clearer and does what we want.
- API: Tweak error messages for allowable repo names.
- Workbench: Add static suffix ".git" using styling similar to the static prefix "username/" just so the user doesn't have to guess whether to type that part.
- Workbench: Clear input & error state if "add repo" modal is closed and reopened.
- "non-admin can rename {username}/{reponame} repo" passes (I didn't notice a test for this)
- "non-admin can touch toplevel repo" fails and is needed by #5416
- "non-admin cannot rename toplevel repo" is worth protecting -- otherwise users with legacy repos can stomp on the toplevel namespace -- it passes now, but will need attention if we do what I think is the general bugfix here: permit changes to an object that has a non-writable owner_uuid when current_user has a can_manage permission link directly to the object, as long as the owner_uuid attribute itself hasn't changed.
(#5416 comes with its own repository permission test cases to protect the "non-admin can touch own repo" behavior, but I don't think it encounters this problem because the user doing the "touch" happens to be the owner.)
Does all this sound plausible?
Updated by Brett Smith almost 10 years ago
Thanks for the improvements. I have rebased on top of master to account for #5416 and squash some of your commits. Please reset hard and take another look.
Tom Clegg wrote:
- API: I had trouble making sense of the different code paths in "setup" that do different things with the given repo_name depending on whether the user already existed or got created during this API transaction. Rephrased into something that (I think) is clearer and does what we want.
Moving the username check is great. My thinking about splitting the / check is that, if we just created the user, there is no way the requester could've known the username ahead of time, so we should always add the username prefix to the repository—but if the user already existed, then maybe the requester specified the prefix in the repository name in the request, and we account for that.
I have restored this strictness by taking your version and tightening the condition when we accept the parameter verbatim: we only do so when we found the user and the repository name starts with the full prefix in c6bc1e96.
- Workbench: Add static suffix ".git" using styling similar to the static prefix "username/" just so the user doesn't have to guess whether to type that part.
I considered this and wavered on it. I agree with you about the upside. The downside is that .git
isn't actually stored in the name field as such, and I'm worried this might confuse people who try to do API searches on it later. But it's a trade-off, and if you like this it's cool by me.
Does all this sound plausible?
Yes, I have made 8c7fb5f to make these tests pass.
How's this version?
Updated by Tom Clegg almost 10 years ago
Brett Smith wrote:
Moving the username check is great. My thinking about splitting the / check is that, if we just created the user, there is no way the requester could've known the username ahead of time, so we should always add the username prefix to the repository—but if the user already existed, then maybe the requester specified the prefix in the repository name in the request, and we account for that.
I have restored this strictness by taking your version and tightening the condition when we accept the parameter verbatim: we only do so when we found the user and the repository name starts with the full prefix in c6bc1e96.
Indeed. I had missed that an admin could ask to create "f.oo" with "f.oo/bar" but end up creating "foo1" with "f.oo/bar", which would be bad. Thanks for fixing.
I considered this and wavered on it. I agree with you about the upside. The downside is that
.git
isn't actually stored in the name field as such, and I'm worried this might confuse people who try to do API searches on it later. But it's a trade-off, and if you like this it's cool by me.
Maybe greying it out with a "disabled" state would help, but at this point I say you should merge whichever version you feel like, and we'll get some feedback from users.
How's this version?
All LGTM, thanks!
Updated by Brett Smith almost 10 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|commit:17014a715c21dd85a02c34b807b8c362c8706cf1.