Project

General

Profile

Actions

Bug #4031

closed

[Workbench] Nodes should be connected via input=output in provenance/used by graphs

Added by Peter Amstutz over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Workbench
Target version:
Story points:
1.0

Description

For example, the "Used by" tab on qr1hi-4zz18-xc7l2dlwet30do4 should have edges going out from the collection to the other two downstream collections.


Subtasks 3 (0 open3 closed)

Task #4333: Review 4031-fix-graph-connectionsResolvedPeter Amstutz10/28/2014Actions
Task #4143: Write testsResolvedPeter Amstutz10/28/2014Actions
Task #4142: Diagnose and fixResolvedPeter Amstutz10/28/2014Actions

Related issues

Has duplicate Arvados - Bug #4045: [Workbench] Provenance graph display regressed (possibly with #3036) and does not connect jobs/components via input/output pdhRejectedActions
Actions #1

Updated by Peter Amstutz over 9 years ago

  • Subject changed from Nodes not always connected in provenance/used by graphs to [Workbench] Nodes not connected in provenance/used by graphs
  • Description updated (diff)
  • Category set to Workbench
Actions #2

Updated by Tom Clegg over 9 years ago

  • Target version changed from Bug Triage to Arvados Future Sprints
Actions #3

Updated by Tom Clegg over 9 years ago

  • Subject changed from [Workbench] Nodes not connected in provenance/used by graphs to [Workbench] Nodes should be connected via input=output in provenance/used by graphs
  • Story points set to 0.5
Actions #4

Updated by Ward Vandewege over 9 years ago

  • Target version changed from Arvados Future Sprints to 2014-10-29 sprint
Actions #5

Updated by Peter Amstutz over 9 years ago

  • Assigned To set to Peter Amstutz
Actions #6

Updated by Tom Clegg over 9 years ago

  • Story points changed from 0.5 to 1.0
Actions #7

Updated by Peter Amstutz over 9 years ago

  • Status changed from New to In Progress
Actions #8

Updated by Peter Amstutz over 9 years ago

This one got away from me a little bit. The code had bit rotted pretty badly and was broken in a number of ways, so it needed a lot of work to get it back into shape and working reliably, making it more informative by using collection and component names where available, and making it (somewhat) testable.

Actions #9

Updated by Brett Smith over 9 years ago

Reviewing 4d257ed

  • I think the regexps that search for Collection UUIDs and portable data hashes should be anchored. That would better match the previous behavior (in CollectionsHelper.match), and avoids false positive matches.
  • I don't think it's safe to assume that any script named "run-command" is the run-command. Please check that the command parameter is an array before you try to join it.
  • The line value.andand.to_s.gsub("\"", "\\\"").gsub("\n", "\\n") will crash trying nil.gsub if value is nil. I think the best solution in this case is to take the andand out; nil.to_s # => "".
  • The line c = Collection.readable_by(*@read_users).where(portable_data_hash: loc.to_s).all is potentially very expensive. I think this would be a lot nicer as two queries: one with limit 2 and a name ordering, to see answer if c.size > 1 and if named.any?. If both are true, a second count query can report the total size.
  • The Workbench find_collection method is annotated "# returns hash, uuid," but this doesn't seem to be correct; the return value seems to vary based on what's passed in. Should that say "yields?"
  • It's difficult to follow the tests because a lot of hashes are hardcoded repeatedly, and all smooshed together in the test regexps. I think it would help readability if the reused hashes were loaded from fixtures, stashed in their own variables, and reused appropriately. That should make it easier to follow how the inputs and outputs related.
    Using assert_matches over assert /…/.match would provide nicer diagnostics, too.
  • Please remove the debug code in the jobs controller, and commented code in pipeline instance controller test and Workbench provenance helper.
  • I'm getting two Workbench test failures, one of them in your new tests:
  1) Failure:
CollectionsTest#test_Collection_portable_data_hash_with_multiple_matches [/home/brett/repos/arvados/apps/workbench/test/integration/collections_test.rb:198]:
Page /collections/ea10d51bcf88862dbcc36eb292017dfd+45 should contain link 'baz_file'

  2) Failure:
PipelineInstancesTest#test_view_pipeline_with_job_and_see_graph [/home/brett/repos/arvados/apps/workbench/test/integration/pipeline_instances_test.rb:137]:
Failed assertion, no message given.

Thanks.

Actions #10

Updated by Peter Amstutz over 9 years ago

  • Target version changed from 2014-10-29 sprint to 2014-11-19 sprint
Actions #11

Updated by Peter Amstutz over 9 years ago

Brett Smith wrote:

Reviewing 4d257ed

  • I think the regexps that search for Collection UUIDs and portable data hashes should be anchored. That would better match the previous behavior (in CollectionsHelper.match), and avoids false positive matches.

So the previous behavior was inadequate, it needs to be able to match identifiers within strings like "$(file xyz+123)". Not changing it unless you were thinking of something like a "word boundary" regex class (in which case: hmmmm.)

  • I don't think it's safe to assume that any script named "run-command" is the run-command. Please check that the command parameter is an array before you try to join it.

Fixed.

  • The line value.andand.to_s.gsub("\"", "\\\"").gsub("\n", "\\n") will crash trying nil.gsub if value is nil. I think the best solution in this case is to take the andand out; nil.to_s # => "".

Fixed.

  • The line c = Collection.readable_by(*@read_users).where(portable_data_hash: loc.to_s).all is potentially very expensive. I think this would be a lot nicer as two queries: one with limit 2 and a name ordering, to see answer if c.size > 1 and if named.any?. If both are true, a second count query can report the total size.

So one database call becomes three, but this is probably a great example of trading off slightly worse average runtime to avoid dramatically bad runtime in cases with a huge number of matches.

  • The Workbench find_collection method is annotated "# returns hash, uuid," but this doesn't seem to be correct; the return value seems to vary based on what's passed in. Should that say "yields?"

You're right. Fixed.

  • It's difficult to follow the tests because a lot of hashes are hardcoded repeatedly, and all smooshed together in the test regexps. I think it would help readability if the reused hashes were loaded from fixtures, stashed in their own variables, and reused appropriately. That should make it easier to follow how the inputs and outputs related.
    Using assert_matches over assert /…/.match would provide nicer diagnostics, too.

It took the better part of the day to refactor the tests (uhg), but they are much better for it. Thanks.

  • Please remove the debug code in the jobs controller, and commented code in pipeline instance controller test and Workbench provenance helper.

Fixed.

  • I'm getting two Workbench test failures, one of them in your new tests:

Fixed.

Actions #12

Updated by Brett Smith over 9 years ago

Reviewing b798b32

  • The new test fixtures upset other Workbench tests:
      1) Error:
    PipelineInstancesTest#test_Create_and_run_a_pipeline:
    Capybara::Ambiguous: Ambiguous match, found 2 elements matching css "tr" with text "GNU_General_Public_License" 
        test/integration/pipeline_instances_test.rb:34:in `block in <class:PipelineInstancesTest>'
    
      2) Error:
    PipelineInstancesTest#test_Create_pipeline_inside_a_project_and_run:
    Capybara::Ambiguous: Ambiguous match, found 2 elements matching css "tr" with text "GNU_General_Public_License" 
        test/integration/pipeline_instances_test.rb:111:in `block in <class:PipelineInstancesTest>'
    
  • Please define RequestDuck somewhere it can be done once and shared—maybe test_helper.

Thanks.

Actions #13

Updated by Peter Amstutz over 9 years ago

Brett Smith wrote:

Reviewing b798b32

  • The new test fixtures upset other Workbench tests:
    [...]

Fixed.

  • Please define RequestDuck somewhere it can be done once and shared—maybe test_helper.

Done.

Actions #14

Updated by Anonymous over 9 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 67 to 100

Applied in changeset arvados|commit:d65b1921a9aeffd6906b95aa927a07a48f013b32.

Actions

Also available in: Atom PDF