Bug #7001

[SDKs] Recursive pipeline instance arv-copy doesn't copy job objects, losing metadata

Added by Sarah Guthrie almost 4 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Assigned To: -
Category: SDKs
Target version: Arvados Future Sprints
Start date: 09/15/2015
Due date: -
% Done: 0%
Estimated time: (Total: 0.00 h)
Story points: -

Description

The instance in question is https://workbench.su92l.arvadosapi.com/pipeline_instances/su92l-d1hrv-38ewfna4842vrnr, which was copied from https://workbench.tb05z.arvadosapi.com/pipeline_instances/tb05z-d1hrv-ytlcag81c6or2i7

I would guess these problems are caused by the fact that the jobs are not copied with the pipeline instance.

This bug is marked with higher priority since the fact that it exists will be publicized after submission: it's in the public project associated with the Lightning paper

The problems include:
  • Provenance graphs on the pipeline instance produce Fiddlesticks (PipelineInstanceGraph.png)
  • Source information on collections produced by the pipeline instance is missing (ProvenanceGraphEmpty.png, UsedByGraphEmpty.png)
  • Provenance graph on collections produced by the pipeline instance is missing (ProvenanceGraphEmpty.png)
  • Used by graph on collections produced by the pipeline instance is missing (UsedByGraphEmpty.png)
  • (Related to #6095), the order of pipeline components in the copied pipeline instance is incorrect, and editing the instance is not enough to change this order
PipelineInstanceGraph.png (84.6 KB), Sarah Guthrie, 08/17/2015 06:46 PM
ProvenanceGraphEmpty.png (130 KB), Sarah Guthrie, 08/17/2015 06:46 PM
UsedByGraphEmpty.png (130 KB), Sarah Guthrie, 08/17/2015 06:46 PM

Subtasks

Task #7339: Update the "Using arv-copy" user guide to include pipeline instances as one of the object types that can be copied over. (New)


Related issues

Related to Arvados - Bug #7008: [Workbench] Crashes with Fiddlesticks when trying to view a pipeline instance graph copied from another cluster (New, 08/17/2015)

Associated revisions

Revision 45fcd8ae (diff)
Added by Brett Smith almost 4 years ago

6095: Stop demoting OrderedDicts to dicts in arv-copy.

History: first there was 79564b0ac7d03327cc351bbd6df544ab1f776380.
This preserved the order of copied pipeline templates, but that's in
part because it stopped recursing through those templates.
1b8caff3ad598744e4a0379b01fc95ca4838caa0 fixed the recursion, but then
started losing the order again. This retains the order by ensuring we
copy OrderedDicts as OrderedDicts.

Refs #6095, #7001.
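A minimal sketch of the failure mode and the fix (hypothetical helper and component names, not the actual arv-copy code): rebuilding nested mappings with a plain `dict` constructor discards insertion order on the Python 2 interpreters arv-copy targeted at the time, while reconstructing with the object's own type keeps OrderedDicts ordered through the recursion.

```python
from collections import OrderedDict

def copy_components(obj):
    """Recursively copy a pipeline-component structure.

    Building plain dicts here is what demoted OrderedDicts and
    scrambled component order; using type(obj) preserves the
    concrete mapping type (an OrderedDict stays an OrderedDict).
    """
    if isinstance(obj, dict):
        return type(obj)((k, copy_components(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return [copy_components(v) for v in obj]
    return obj

# Hypothetical component names, for illustration only.
components = OrderedDict([
    ("align", {"script": "run-align"}),
    ("call", {"script": "run-call"}),
    ("report", {"script": "run-report"}),
])
copied = copy_components(components)
print(list(copied.keys()))  # ['align', 'call', 'report']
```

On modern Python 3 plain dicts keep insertion order too, but reconstructing with `type(obj)` remains the safer idiom for round-tripping API records.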

History

#1 Updated by Brett Smith almost 4 years ago

  • Subject changed from [arv-copy] arv-copy recursive fails to copy pipeline instance completely to [SDKs] arv-copy recursive fails to copy pipeline instance completely
  • Category set to SDKs

Sarah Guthrie wrote:

I would guess these problems are caused by the fact that the jobs are not copied with the pipeline instance.

You're right, although there are some sub-bugs we can identify and tackle individually.

The problems include:
  • Provenance graphs on the pipeline instance produce Fiddlesticks (PipelineInstanceGraph.png)

Fixing the fiddlesticks is #7008.

  • Source information on collections produced by the pipeline instance is missing (ProvenanceGraphEmpty.png, UsedByGraphEmpty.png)
  • Provenance graph on collections produced by the pipeline instance is missing (ProvenanceGraphEmpty.png)
  • Used by graph on collections produced by the pipeline instance is missing (UsedByGraphEmpty.png)

Yeah, these all have the same basic root cause: the pipeline instance says they came from a particular job, but that refers to a job on another cluster, which Workbench doesn't know how to introspect.
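The mismatch is visible in the UUID itself: Arvados object UUIDs carry a five-character cluster prefix, so a copied instance's job UUIDs still name the source cluster. A small sketch, assuming the `<cluster>-<type>-<suffix>` layout (the suffixes below are illustrative):

```python
def uuid_cluster(uuid):
    """Return the five-character cluster prefix of an Arvados UUID."""
    return uuid.split("-", 1)[0]

def is_foreign(uuid, local_cluster):
    # Workbench can only introspect job records whose UUIDs
    # resolve on the local cluster.
    return uuid_cluster(uuid) != local_cluster

# A job UUID embedded in the copied instance still points at tb05z,
# so the su92l Workbench cannot look it up.
print(is_foreign("tb05z-8i9sb-xxxxxxxxxxxxxxx", "su92l"))  # True
```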

  • (Related to #6095), the order of pipeline components in the copied pipeline instance is incorrect, and editing the instance is not enough to change this order

Just pushed a fix.

#2 Updated by Sarah Guthrie almost 4 years ago

I would also predict that the empty source information, provenance graphs, used-by graphs, and even pipeline instance graphs will appear if one arv-copies a pipeline instance between projects with different permissions. Preventing this would be nice.

#3 Updated by Brett Smith almost 4 years ago

Sarah Guthrie wrote:

I would also predict that the empty source information, provenance graphs, used-by graphs, and even pipeline instance graphs will appear if one arv-copies a pipeline instance between projects with different permissions. Preventing this would be nice.

Yeah, I'm pretty sure you're right. I agree that it's worth fixing the arv-copy bug at its source too, and that's what I want this ticket to represent. But figuring out the mechanics of copying jobs is going to be a little more involved.

#4 Updated by Brett Smith almost 4 years ago

  • Target version set to Arvados Future Sprints

#5 Updated by Brett Smith almost 4 years ago

  • Subject changed from [SDKs] arv-copy recursive fails to copy pipeline instance completely to [SDKs] Recursive pipeline instance arv-copy doesn't copy job objects, losing metadata

#6 Updated by Tom Clegg over 3 years ago

Note that all of the relevant information is copied with the pipeline instance: you can see all the job details right there on the pipeline instance page. Looking up the PDH given in the "log" attribute there also works. I see this as a rendering problem. Although Workbench's pipeline instance page knows how to display jobs that come from the pipeline instance record, various other parts of Workbench aren't so smart.
  • The "log" links point to the log tab of the "show job" page for the job, even though the job UUID doesn't exist here. Perhaps this can be fixed by making a route to the "show job" page like "component X from pipeline instance Y" (this would also make the job page more context-aware, e.g., it could show the "component name" of the job in the relevant pipeline, and it could show the pipeline in breadcrumbs).
  • The graph tab (I'm guessing) tries to look up the latest job records, skips each job because that doesn't work, and violates some assumption that there will be at least one component to display. If we make it fall back on the pipeline instance record instead, it should have everything needed to make a graph.
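The fallback described above might look like this (a hedged sketch with a hypothetical `api` object and component layout; the real Workbench is Rails, not Python):

```python
def job_for_component(api, instance, component_name):
    """Prefer the live job record; fall back on the snapshot
    embedded in the pipeline instance when the lookup fails."""
    component = instance["components"][component_name]
    embedded = component.get("job") or {}
    job_uuid = embedded.get("uuid")
    if job_uuid:
        try:
            # Works only when the UUID resolves on this cluster.
            return api.jobs().get(uuid=job_uuid).execute()
        except Exception:
            pass  # foreign or deleted job: fall through
    # The copied instance record already carries everything
    # the graph needs to render.
    return embedded
```

With this shape, the graph tab never ends up with zero components just because every job lookup failed.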

#7 Updated by Brett Smith over 3 years ago

Tom Clegg wrote:

  • The graph tab (I'm guessing) tries to look up the latest job records, skips each job because that doesn't work, and violates some assumption that there will be at least one component to display. If we make it fall back on the pipeline instance record instead, it should have everything needed to make a graph.

The basic crash is #7008. We should consider extending the fix there like you describe.

But I'm not sure that fix would address all our graph problems? For example, I don't think that suffices to render provenance graphs for collections that come out of the pipeline instance. To do that, we need to be able to figure out which jobs the collection came from, and we can't query pipeline instance components this way.

#8 Updated by Tom Clegg over 3 years ago

Brett Smith wrote:

But I'm not sure that fix would address all our graph problems? For example, I don't think that suffices to render provenance graphs for collections that come out of the pipeline instance. To do that, we need to be able to figure out which jobs the collection came from, and we can't query pipeline instance components this way.

It's true, this wouldn't be enough to get foreign jobs to appear in the "which jobs lead to this collection?" graphs even if pipelines have been copied from the foreign cluster -- but this part sounds as much like a feature as a bug.

It also doesn't help resolve the discrepancy between job and pipeline sharing when those items are in different projects (e.g., you can see the provenance graph on the pipeline page, but if you ask "which jobs lead to {pipeline output}" you get nothing). This part doesn't sound much like a feature.

One way to look at this is that we need weaker job/output claims (weaker than "this arvados system ran this job as described here") to be visible to users, without conflating those weak claims with the strong claims in the jobs table. The distinction between a real job and an arbitrary user-created job record (from arv-copy or running crunch-job in local/dev mode) is rather hard for a user to see, might not even be taken into account when reusing jobs, etc. One possibility is to fix this (with, essentially, a "real" flag) such that we can copy jobs all over the place without worrying about them getting inadvertently re-queued, edited by hand and then erroneously used by automatic job reuse, etc.
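One way the proposed flag could work, sketched with a hypothetical `real` attribute on job records (nothing like this exists in the schema yet):

```python
def reusable_jobs(jobs):
    """Automatic job reuse should only consider strong claims:
    jobs this cluster actually ran, not records created by
    arv-copy or a local/dev crunch-job run."""
    return [j for j in jobs if j.get("real", False)]

# Illustrative records; UUID suffixes are placeholders.
jobs = [
    {"uuid": "su92l-8i9sb-aaaaaaaaaaaaaaa", "real": True},   # ran here
    {"uuid": "su92l-8i9sb-bbbbbbbbbbbbbbb", "real": False},  # copied in
]
```

Copied records would still render everywhere for provenance display; they just could never be re-queued or matched by reuse.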

#9 Updated by Brett Smith over 3 years ago

Tom Clegg wrote:

Brett Smith wrote:

But I'm not sure that fix would address all our graph problems? For example, I don't think that suffices to render provenance graphs for collections that come out of the pipeline instance. To do that, we need to be able to figure out which jobs the collection came from, and we can't query pipeline instance components this way.

It's true, this wouldn't be enough to get foreign jobs to appear in the "which jobs lead to this collection?" graphs even if pipelines have been copied from the foreign cluster -- but this part sounds as much like a feature as a bug.

I get what you're saying that the cluster doesn't have much assurance that the provenance actually is what's recorded in the pipeline instance, but having disagreeing provenance graphs doesn't seem like the most helpful way to communicate that. I think ideally the graph would be rendered differently, and maybe include a footnote explaining that the results may not be reflected across the cluster.

It also doesn't help resolve the discrepancy between job and pipeline sharing when those items are in different projects (e.g., you can see the provenance graph on the pipeline page, but if you ask "which jobs lead to {pipeline output}" you get nothing). This part doesn't sound much like a feature.

FWIW, I think that's an equally important application, and maybe more pressing to fix since it's easier. It's become a common pipeline development idiom to work on a pipeline in a project, then after you get a clean, successful run, copy that instance to a new project to share and show off. In order for that to be effective, all the associated metadata needs to come along to the new project.
