Story #19676


Turn data organization patterns deck into documentation page and/or blog post

Added by Peter Amstutz about 1 month ago. Updated 9 days ago.

In Progress
Assigned To:
Target version:
Start date:
Due date:
% Done:


Estimated time:
(Total: 0.00 h)
Story points:

Subtasks 1 (1 open, 0 closed)

Task #19679: Review "data-mgtmt" branch in arvados-www | In Progress | Peter Amstutz | 10/28/2022

Actions #1

Updated by Peter Amstutz about 1 month ago

  • Assigned To set to Peter Amstutz
Actions #2

Updated by Peter Amstutz about 1 month ago

  • Description updated (diff)
Actions #3

Updated by Peter Amstutz about 1 month ago

  • Status changed from New to In Progress
Actions #4

Updated by Sarah Zaranek 29 days ago

  • Global Comments:
    • Looks like the template is off. The other blog posts have green headers and yours has black.
    • There is no conclusion; it just kinda ends.
    • The headings aren't capitalized as titles should be
    • Would be nice to include hyperlinks -- will try to point out specific points for these as I move along the sections
    • You capitalize Collections but not Projects but I am not sure why - I think it is consistent throughout the blog though.
  • Introduction
    • The Arvados User/Group/Project system --> it might be nice to call this something different. I will try to think of something but just noting here
    • I think it might be nice to introduce what Users, Projects and Collections are here. You try to define them below, but it is kinda nebulous about what they are or what they do. Here is a shot.
    • Arvados data management centers around Users, Projects and Collections. A User, simply put, is an individual with access to the Arvados cluster. Projects in Arvados help you organize and track your work – and can contain data, workflow code, details about workflow runs, and results. Collections in Arvados help organize and manage your data. In a simple hierarchy - Projects can hold Collections, Collections hold Data Files.
  • Terminology
    • This isn't really terminology because you are just explaining how permissions work for different things not really explaining what they are for most terms. Perhaps we call this section: "The Arvados Permission Model"
    • There isn't much of an introduction here. That might be because it was meant as a Terminology section but I would suggest having one to put things in context
    • Maybe you can start this section off with something like -- "In this section, I will discuss the basics of permissions and access modes in Arvados. Objects in Arvados such as Projects and Collections can have the following attributes:"
    • Users are granted permissions on projects and groups. Users are represented by a face. --> Users are granted access permissions to data through permissions set on Projects and Groups. In the diagram below, Users are represented by faces.
    • Projects are a unit of both data organization and permission grants. ---> I find this statement a little confusing. I think it is just because I don't think about permissioning as a unit of anything. Perhaps re-word this for newbies? Projects can be used both to organize data and to grant access permissions. Projects manage access permissions for objects they "own". Access to a project's contents can be granted to Users or Groups.
    • Also need to add the part for what represents Projects in diagrams - so "Projects are represented in the following diagrams by ovals."
    • Groups are used for permission organization. Granting a permission to a Group grants the permission to everyone in the group. Groups in the diagrams below are represented by rectangles.
    • In the API, there is a single “group” endpoint which represents several “classes” of groups: --> In the Arvados API, there are different classes of Groups:
Actions #5

Updated by Sarah Zaranek 29 days ago

  • Transitive permissions
    • What does "greatest" or "least" mean in terms of permissions - most permissive, least permissive?
    • The effective permission a user has on some object is the greatest permission among all paths between the user and the object. The permission for a given path is the least permission on the path between the user and the target. --> This took me a couple of reads to figure out what you were saying. Do you have to mention all paths yet, since you are only dealing with a single path? Could you just say: The permission for a given path is the least permission on the path between the user and the target.
    • It might be worth adding a multi-path example to show it is the greatest permission amongst all paths - because that seems a bit of a throwaway point.
    • "Quality metadata ensures that data can be found by those who are looking for it", and then you have a list of metadata. Perhaps you add a phrase so this flows better like "Some types of metadata include:" or something like that?
    • I assume native metadata is data that Arvados automatically creates for you but might be nice to spell that out?
    • Version history of collections --> Maybe explain this better or phrase it differently - it seems different than the other two. Not sure how to fix it, it just reads off to me. So, take that as you will :)
    • This seems to pop up out of the blue, especially because it comes after metadata: it seems like the demos are done because you switched topics to talk about metadata -- Perhaps put metadata last unless you need to mention it first.
    • Would be good just to have an introductory sentence or flesh out the bullet points. The previous example just looked at permissions for a single User. Let's look at a more involved case where we are deriving permissions for multiple users for multiple projects.
    • I will admit I am kinda confused about what I am getting out of this section. It seems like it is one of the options you discussed above, so it would be nice to actually explain why this is here and how it relates to the rest of the blog. "In this example, we are organizing by User and Project - which is the case that ----"
    • I have the same concerns for the rest of the diagrams below. We are mixing showing me how things work and different structures of data management which each can be used and have pluses and minuses, but there is no context. At least a sentence to introduce sections to Process automatic would be good.
    • I am going to stop here because it really stops being readable to me at this point. I have no context and it is just the slides. I would recommend adding descriptions and context, and then maybe I can finish reading?
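
One possible way to make the "greatest" / "least" question under Transitive permissions concrete is a small multi-path sketch. It assumes the levels are ordered can_read < can_write < can_manage; the graph, the object names (alice, group_a, project_x, collection_1) and the function name are hypothetical and only illustrate the rule as quoted from the draft, not how Arvados actually computes permissions.

```python
from typing import Dict, List, Optional, Tuple

# Assumed ordering: a higher number means a more permissive level.
LEVELS = {"can_read": 1, "can_write": 2, "can_manage": 3}

# Hypothetical permission graph: node -> list of (neighbor, permission on that link).
Edges = Dict[str, List[Tuple[str, str]]]


def effective_permission(edges: Edges, user: str, target: str) -> Optional[str]:
    """Return the greatest, over all paths, of the least permission on each path."""
    best: Optional[str] = None

    def walk(node: str, weakest: Optional[str], seen: frozenset) -> None:
        nonlocal best
        if node == target:
            # A complete path: keep it if it beats the best path found so far.
            if weakest is not None and (best is None or LEVELS[weakest] > LEVELS[best]):
                best = weakest
            return
        for nxt, perm in edges.get(node, []):
            if nxt in seen:
                continue  # avoid walking in circles
            # The path is only as permissive as its least permissive link.
            if weakest is None or LEVELS[perm] < LEVELS[weakest]:
                path_perm = perm
            else:
                path_perm = weakest
            walk(nxt, path_perm, seen | {nxt})

    walk(user, None, frozenset({user}))
    return best


# Two paths from alice to collection_1:
#   alice -(can_read)-> group_a -(can_manage)-> project_x -(can_manage)-> collection_1
#   alice -(can_write)-> project_x -(can_manage)-> collection_1
EXAMPLE_EDGES: Edges = {
    "alice": [("group_a", "can_read"), ("project_x", "can_write")],
    "group_a": [("project_x", "can_manage")],
    "project_x": [("collection_1", "can_manage")],
}

print(effective_permission(EXAMPLE_EDGES, "alice", "collection_1"))
```

In this sketch the first path is limited to can_read by its weakest link and the second to can_write, so the most permissive of the two paths determines the result.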
Actions #6

Updated by Peter Amstutz 23 days ago

  • Target version changed from 2022-11-09 sprint to 2022-12-07 Sprint
Actions #7

Updated by Peter Amstutz 14 days ago

  • Target version changed from 2022-12-07 Sprint to 2022-12-21 Sprint
Actions #8

Updated by Peter Amstutz 9 days ago

  • Target version changed from 2022-12-21 Sprint to 2023-01-18 sprint
Actions #9

Updated by Peter Amstutz 9 days ago

  • Target version changed from 2023-01-18 sprint to 2023-02-01 sprint
Actions #10

Updated by Peter Amstutz 9 days ago

  • Category set to Documentation
