Project

General

Profile

Actions

Bug #5365

closed

[Workbench] Do not link to objects that are not readable by current user

Added by Nancy Ouyang over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Radhika Chippada
Category:
Workbench
Target version:
Story points:
1.0

Description

Specific examples to fix

(copied from note-3)

In the specific case of the Log tab, when the log data is not readable by the current user (or anonymous user, in the anonymous case):
  • The Log tab should be greyed out / disabled / un-clickable
  • Hovering the Log tab should show a tooltip, something like "Log data is not available"
We should also avoid dead-linking other objects from pipeline/job pages that can be unreadable due to permissions (or deletions):
  • Job output collections
  • Job input collections (including docker image)
  • Template used to run this pipeline instance

Bug as reported

Fiddlesticks when viewing log on anonymous project pathomap

Also, clicking "report" on that fiddlesticks gives another fiddlesticks.

https://workbench.qr1hi.arvadosapi.com/jobs/qr1hi-8i9sb-34f5yfeggzdhn1n#Log

Oh... fiddlesticks.

An error occurred when Workbench sent a request to the Arvados API server. Try reloading this page. If the problem is temporary, your request might go through next time. If that doesn't work, the information below can help system administrators track down the problem.

API request URL
https://qr1hi.arvadosapi.com/arvados/v1/collections/b6b506fa082b773c61f8589b3261b3b0+87
API response {
":errors":[
"Path not found"
],
":error_token":"1425415698+1e617bf4"
}

If you suspect this is a bug, you can help us fix it by sending us a problem report:

Send a problem report right here. Report problem

If you prefer, send email to:

Missing template actions/report_issue_popup, application/report_issue_popup with {:locale=>[:en], :formats=>[:html], :variants=>[], :handlers=>[:erb, :builder, :raw, :ruby, :coffee]}. Searched in: * "/var/www/workbench.qr1hi.arvadosapi.com/releases/20150302143906/themes/curoverse/views" * "/var/www/workbench.qr1hi.arvadosapi.com/releases/20150302143906/app/views" * "/var/www/workbench.qr1hi.arvadosapi.com/shared/vendor_bundle/ruby/2.1.0/gems/wiselinks-1.2.1/app/views"


Subtasks 1 (0 open1 closed)

Task #5671: Review branch: 5365-not-link-unreadablesResolvedRadhika Chippada04/06/2015Actions
Actions #1

Updated by Nancy Ouyang over 9 years ago

curover.se/pathomap > SNP-Annotation-justmap > uuid > log

Actions #2

Updated by Radhika Chippada over 9 years ago

  • Subject changed from Fiddlesticks when viewing log on anonymous project pathomap to [Workbench] Fiddlesticks when viewing log on anonymous project pathomap
  • Category set to Workbench
  • Target version set to Bug Triage
Actions #3

Updated by Tom Clegg over 9 years ago

  • Story points set to 1.0

Generally, Workbench should not link to objects that are not readable.

In the specific case of the Log tab, when the log data is not readable by the current user (or anonymous user, in the anonymous case):
  • The Log tab should be greyed out / disabled / un-clickable
  • Hovering the Log tab should show a popup, something like "Log data is not available"
We should also avoid dead-linking other objects from pipeline/job pages that can be unreadable due to permissions (or deletions):
  • Job output collections
  • Job input collections (including docker image)
  • Template used to run this pipeline instance
Actions #4

Updated by Tom Clegg over 9 years ago

  • Target version changed from Bug Triage to 2015-04-01 sprint
Actions #5

Updated by Tom Clegg over 9 years ago

  • Subject changed from [Workbench] Fiddlesticks when viewing log on anonymous project pathomap to [Workbench] Do not link to objects that are not readable by current user
  • Description updated (diff)
Actions #6

Updated by Radhika Chippada over 9 years ago

  • Assigned To set to Radhika Chippada
Actions #7

Updated by Radhika Chippada over 9 years ago

  • Assigned To deleted (Radhika Chippada)
Actions #8

Updated by Brett Smith over 9 years ago

  • Assigned To set to Brett Smith
Actions #9

Updated by Tom Clegg over 9 years ago

  • Target version changed from 2015-04-01 sprint to 2015-04-29 sprint
Actions #10

Updated by Brett Smith over 9 years ago

  • Assigned To deleted (Brett Smith)
Actions #11

Updated by Radhika Chippada over 9 years ago

  • Assigned To set to Radhika Chippada
Actions #12

Updated by Radhika Chippada over 9 years ago

  • Status changed from New to In Progress
Actions #13

Updated by Radhika Chippada over 9 years ago

Notes about implementation:

- No longer dead link job and output when they are not readable by current user (or no user)

- Disable Log tab if not readable by anonymous user. (I felt disabling Log tab when current_user is present might be troublesome. The specific case I was concerned about was: a pipeline or job has started running just now. Log is not yet available. Hence disabling in this case seemed troublesome and hence I only disabled it for anonymous user case.)

- Template link was (already) hidden in pipeline instance page if it is not readable by user.

- Added integration tests and corresponding required test fixtures.

Actions #14

Updated by Brett Smith over 9 years ago

Reviewing 8ad6b6e

  • When a logged in user views a pipeline instance page, job UUIDs are not rendered.
  • Some pipelines can get quite long. The current implementation makes a separate API call to check the readability of every component job. I'm concerned that this could perform poorly for those large pipelines. Could it be possible to use preload_objects_for_dataclass, or a strategy similar to that method's implementation, to get the component jobs in a single API query?
  • Is it possible to write a helper method that does the six-line work of trying to find the object with a given UUID, yielding it to a block for rendering if found, and rendering a static string if not? I'm concerned that we will likely want to repeat this pattern as we continue development on anonymous browsing, and avoiding the repetition now would help.
  • The new test includes a condition if objects_readable or (!objects_readable and user). I think this would be simpler to read as if objects_readable or user (as you have it later in the test).
  • In the pipeline instance fixtures, I think it could save us potential future headache if the job hashes included at least all the attributes that are required on jobs: specifically script, script_version, script_parameters (with input set), and repository. We've run into issues in the past where our testing was less helpful than it should've been because fixtures did not look realistic. Spending a little time making them more realistic could help avoid headaches in the future.

Thank you.

Actions #15

Updated by Radhika Chippada over 9 years ago

Brett said:

  • When a logged in user views a pipeline instance page, job UUIDs are not rendered.

There were multiple issues at this line. Fixed.

  • Some pipelines can get quite long. The current implementation makes a separate API call to check the readability of every component job. I'm concerned that this could perform poorly for those large pipelines. Could it be possible to use preload_objects_for_dataclass, or a strategy similar to that method's implementation, to get the component jobs in a single API query?

Updated code to preload job uuids and output collection uuids. The existing preload methods do not work when preloading collections using portable data hashes. After spending some time to correct this (to add a new method that uses portable_data_hash instead of uuid in where clause), I now dropped this idea and doing a find on each of those. It appears that our api currently does not support group_by and without it all rows matching the pdh are being returned and hence I felt this might not be desirable. It also appears that most of newer pipelines will use uuids for output collections anyways (which are preloaded).

  • Is it possible to write a helper method that does the six-line work of trying to find the object with a given UUID, yielding it to a block for rendering if found, and rendering a static string if not? I'm concerned that we will likely want to repeat this pattern as we continue development on anonymous browsing, and avoiding the repetition now would help.

Updated this and used it for jobs and outputs.

  • The new test includes a condition if objects_readable or (!objects_readable and user). I think this would be simpler to read as if objects_readable or user (as you have it later in the test).

Thanks for catching this; corrected.

  • In the pipeline instance fixtures, I think it could save us potential future headache if the job hashes included at least all the attributes that are required on jobs: specifically script, script_version, script_parameters (with input set), and repository. We've run into issues in the past where our testing was less helpful than it should've been because fixtures did not look realistic. Spending a little time making them more realistic could help avoid headaches in the future.

Added these.

Radhika said:

  • Disable Log tab if not readable by anonymous user. (I felt disabling Log tab when current_user is present might be troublesome. The specific case I was concerned about was: a pipeline or job has started running just now. Log is not yet available. Hence disabling in this case seemed troublesome and hence I only disabled it for anonymous user case.)

I updated to disable Log tab when current user cannot write to the object (which would cover owner situation). With this, we will not be disabling the Log tab for a brand new pipeline with no log data as yet. And this also addresses the scenario where a logged in user is viewing a public pipeline with unreadable logs.

  • Just an observation about preloading data: When the user cannot read the objects, the preloading does not result in fetching them. Hence, we will be making one extra api call for each unreadable uuid unfortunately.
Actions #16

Updated by Brett Smith over 9 years ago

Radhika Chippada wrote:

  • Just an observation about preloading data: When the user cannot read the objects, the preloading does not result in fetching them. Hence, we will be making one extra api call for each unreadable uuid unfortunately.

I'm worried that this means the we might still witness the poor performance scenarios I mentioned in my last comment. While it's difficult to say right now what the "common" use case will be here, I think it will definitely happen where users sometimes make only minimal inputs and outputs available publicly, while keeping all other details like logs private. If they do that, this implementation can still end up making many API calls.

Would it make sense to extend preload_objects_for_dataclass to avoid these redundant API calls? For example, maybe it could expressly set @objects_for[uuid] = nil for any UUID that it doesn't find. That would prevent it from repeating the search in future calls. That has the potential to get expensive quickly.

If the solution isn't that simple for some reason, I still think we need some solution. Maybe it's a more involved rework of preload_objects_for_dataclass, or maybe it's a separate method. But as much as possible, I think it's worthwhile to avoid making a situation where we make an API call to check the readibility of an individual object.

  • Some pipelines can get quite long. The current implementation makes a separate API call to check the readability of every component job. I'm concerned that this could perform poorly for those large pipelines. Could it be possible to use preload_objects_for_dataclass, or a strategy similar to that method's implementation, to get the component jobs in a single API query?

Updated code to preload job uuids and output collection uuids. The existing preload methods do not work when preloading collections using portable data hashes. After spending some time to correct this (to add a new method that uses portable_data_hash instead of uuid in where clause), I now dropped this idea and doing a find on each of those. It appears that our api currently does not support group_by and without it all rows matching the pdh are being returned and hence I felt this might not be desirable.

It's possible to check the readability of portable data hashes in a list by selecting only the portable_data_hash column, and setting distinct to ask for unique values. Here's a demonstration using the Python SDK:

>>> cl = api.collections().list(
...     filters=[['portable_data_hash', 'in', ['d41d8cd98f00b204e9800998ecf8427e+0']]],
...     select=['portable_data_hash'],
...     distinct=True
... ).execute()
>>> pp(cl)
{u'etag': u'',
 u'items': [{u'kind': u'arvados#collection',
             u'portable_data_hash': u'd41d8cd98f00b204e9800998ecf8427e+0'}],
 u'items_available': 1507,
 u'kind': u'arvados#collectionList',
 u'limit': 100,
 u'offset': 0,
 u'self_link': u''}

Even though the limit is 100 and many more items are available, there's only one item in the response list. You can make this query on all the portable data hashes you're interested in. Any that are included in the response are readable.

  • When a logged in user views a pipeline instance page, job UUIDs are not rendered.

There were multiple issues at this line. Fixed.

  • Is it possible to write a helper method that does the six-line work of trying to find the object with a given UUID, yielding it to a block for rendering if found, and rendering a static string if not? I'm concerned that we will likely want to repeat this pattern as we continue development on anonymous browsing, and avoiding the repetition now would help.

Updated this and used it for jobs and outputs.

Throughout _running_component, could UUID links always be generated using link_to_arvados_object_if_readable? That method already checks the resource class just like we're doing for job UUIDs, so I wonder if we could reduce duplication by using the single method call for that. I think it could also be used in other places where we're using link_to_if_arvados_object, like owner and Docker image. (The Docker image link is specifically named in the description as something that must not be linked.)

The helper method includes a line if resource_class.andand.to_s == 'Collection'. Is the andand necessary here? Earlier code returns from the method if !resource_class, so I don't think resource_class will ever be nil at this point.

  • In the pipeline instance fixtures, I think it could save us potential future headache if the job hashes included at least all the attributes that are required on jobs: specifically script, script_version, script_parameters (with input set), and repository. We've run into issues in the past where our testing was less helpful than it should've been because fixtures did not look realistic. Spending a little time making them more realistic could help avoid headaches in the future.

Added these.

Thanks. Under these job hashes, the input parameter hashes should have value keys that point to the input collections used. crunch-dispatch normally makes that transformation. You can remove the title key if you like; that's not normally carried over.

Thank you.

Actions #17

Updated by Radhika Chippada over 9 years ago

Addressed the comments above. A few notes:

  • Workbench did not offer @distinct before, which made it not possible to get one collection each for a list of portable_data_hashes. I added this now and was able to add a new preload_for_pdhs method.
  • Updated pipeline_instances/_show_components_editable.html.erb to check if readable before linking input collections.
  • In the running components case, preloading job uuids, outputs, and docker image locators. Any others we want to preload?

Thanks.

Actions #18

Updated by Brett Smith over 9 years ago

Reviewing 02840ce9

The story says, "The Log tab should be greyed out / disabled / un-clickable" when the log is unviewable. Right now that's implemented by disabling the link's data-toggle attribute. However, this means the tab still appears clickable, and even gets focus if you click it; it just doesn't do anything. I think it might be a little nicer to put class="disabled" on the li element in this case. This actually grays out the tab, and changes the mouse cursor to show that it's not clickable. I tested this in Firebug and it seemed to work out all right.

Right now, link_to_arvados_object_if_readable takes a boolean friendly_name attribute. I wonder if it might be a little nicer and future-proof to have it take an opts={} argument, just like link_to_if_arvados_object does? Since the method sometimes falls back to calling link_to_if_arvados_object, I think it might be nice to give callers more control over the arguments that are passed in when it does—it seems pretty plausible that we'll want that in the future. It also makes the current implementation a little simpler, I think.

In _show_components_editable, tv[:dataclass].andand.eql?('Collection') can just say tv[:dataclass] == "Collection". It's safe to compare against nil with == and !=.

In the preload_for_pdhs test, I'm not sure what the assertions after # invoke again for this same input. this time, the preloaded data will be returned do. As far as I can tell, nothing in them checks that the data was actually returned from a cache (e.g., by using #equal?); it just makes the same basic "found the right result" assertions made after the first call. Am I missing something? If not, I think it would be good to update the assertions to better match the comment, or else remove them.

Thanks.

Actions #19

Updated by Radhika Chippada over 9 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:6564944612790934531a3c30e6ab2cab6f329461.

Actions #20

Updated by Radhika Chippada over 9 years ago

Thanks Brett. I incorporated all these suggestions and merged into master. I think the comment "link_to_arvados_object_if_readable takes args" was quite helpful. The method signature is now much better and is similar to other usages. It was a bit involved update to change all the usages, but is worth it. Thanks.

Actions

Also available in: Atom PDF