Support #16926

CWL viewer

Added by Peter Amstutz about 1 year ago. Updated 9 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 10/01/2020
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -

Description

  • Set up account owned by SFC
  • Schedule time with Stian
  • Set up VM
  • Migrate DNS
  • Set up actual service
  • Migrate database

https://github.com/common-workflow-language/cwlviewer

Meeting notes: https://docs.google.com/document/d/1jnUJM5z-we_CNMogUkcJesMEhoFkUrf4tqs4FK8tqrI/edit#heading=h.f2av3ipqwwrp


Subtasks

Task #16934: Cloud account owned by CWL governing board / SFC (Resolved, Ward Vandewege)

Task #16935: Set up VM in new account (Resolved)

Task #16936: Set up viewer service (Resolved)


Related issues

Related to Arvados Epics - Story #16011: CWL support, docs, training, website (In Progress, 07/01/2020 - 12/31/2021)

Related to Arvados - Feature #17505: cwlviewer update (Resolved)

History

#1 Updated by Peter Amstutz about 1 year ago

  • Related to Story #16011: CWL support, docs, training, website added

#2 Updated by Peter Amstutz about 1 year ago

  • Description updated (diff)

#3 Updated by Peter Amstutz about 1 year ago

  • Target version set to 2020-10-07 Sprint

#4 Updated by Peter Amstutz about 1 year ago

  • Target version changed from 2020-10-07 Sprint to 2020-10-21 Sprint

#5 Updated by Peter Amstutz about 1 year ago

  • Tracker changed from Story to Support

#6 Updated by Peter Amstutz about 1 year ago

  • Assigned To set to Ward Vandewege

#7 Updated by Peter Amstutz 12 months ago

  • Target version changed from 2020-10-21 Sprint to 2020-11-04 Sprint

#8 Updated by Peter Amstutz 12 months ago

  • Target version changed from 2020-11-04 Sprint to 2020-11-18

#9 Updated by Peter Amstutz 11 months ago

  • Target version changed from 2020-11-18 to 2020-12-02 Sprint

#10 Updated by Peter Amstutz 11 months ago

  • Target version changed from 2020-12-02 Sprint to 2020-12-16 Sprint

#11 Updated by Ward Vandewege 10 months ago

  • Status changed from New to In Progress

I created a new AWS account in our organization. At any time we can detach the account from our org, should that become necessary in the future.

#12 Updated by Peter Amstutz 10 months ago

  • Description updated (diff)

#13 Updated by Ward Vandewege 10 months ago

I've rolled out the VM with Terraform and used Salt to configure it.

The service is reachable at https://cwlviewer.arvados.org

Next step: import the data from view.commonwl.org.

Then: get the DNS record updated, and then get an SSL cert for view.commonwl.org via Salt.

#14 Updated by Peter Amstutz 10 months ago

  • Target version changed from 2020-12-16 Sprint to 2021-01-06 Sprint

#15 Updated by Ward Vandewege 9 months ago

The import command:

ward@cwlviewer:/usr/src/cwlviewer$ time ./load.py /var/backups/cwl/2020-12-15T161841+0000.json.gz https://cwlviewer.arvados.org/

I had to restart it a few times: there is a disk space leak that fills up the disk during import (possibly a file descriptor that is not released properly). Restarting docker frees the excess space.

The importer doesn't seem to resume cleanly. Six queue entries are stuck, which effectively halted the import.

The restore processed 1152 workflows, but there are many more on view.commonwl.org. It is currently stuck on 6 workflows in the queue that are supposedly "running". I suspect these are the ones that were in progress when the disk filled up (due to that file descriptor/deletion bug; restarting composer cleared up the disk space), since the queue IDs are at least somewhat sequential:

Still running https://cwlviewer.arvados.org/queue/5fd8edda08813b0001ca6bc3
Still running https://cwlviewer.arvados.org/queue/5fd94ac82ab79c00011833fc
Still running https://cwlviewer.arvados.org/queue/5fd94b342ab79c0001183400
Still running https://cwlviewer.arvados.org/queue/5fd94b342ab79c0001183401
Still running https://cwlviewer.arvados.org/queue/5fd94b352ab79c0001183402
Still running https://cwlviewer.arvados.org/queue/5fd94b352ab79c0001183403

Those correspond to these URLs:

https://github.com/mskcc/roslin-variant.git
https://github.com/genome/analysis-workflows.git
https://github.com/genome/analysis-workflows.git
https://github.com/ncbi/pgap.git
https://github.com/Duke-GCB/bespin-cwl.git
https://github.com/rosafilgueira/cyclon_usecase.git

which don't seem out of the ordinary.
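One way to check the "somewhat sequential" observation: the queue IDs look like MongoDB ObjectIds, and the first 4 bytes of an ObjectId are a creation timestamp in seconds since the Unix epoch. The sketch below decodes that timestamp for the six stuck entries. The helper name `object_id_time` is made up for illustration and is not part of cwlviewer or load.py.

```python
from datetime import datetime, timezone


def object_id_time(queue_id: str) -> datetime:
    """Return the creation time embedded in a 24-hex-digit ObjectId string.

    Hypothetical helper: assumes queue_id is a MongoDB ObjectId, whose
    leading 8 hex digits are a big-endian Unix timestamp in seconds.
    """
    if len(queue_id) != 24:
        raise ValueError("expected a 24-character hex ObjectId")
    seconds = int(queue_id[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# The six queue IDs reported as "Still running" above.
stuck_ids = [
    "5fd8edda08813b0001ca6bc3",
    "5fd94ac82ab79c00011833fc",
    "5fd94b342ab79c0001183400",
    "5fd94b342ab79c0001183401",
    "5fd94b352ab79c0001183402",
    "5fd94b352ab79c0001183403",
]

for qid in stuck_ids:
    print(qid, object_id_time(qid).isoformat())
```

If the decoded times cluster around the moments the disk filled up, that would support the suspicion that these entries were interrupted mid-run.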

Is there a way to cancel a queue entry? I didn't see any mention of that on the API page.

Eventually, it gave up:

Trimmed queue from 6 to 6
Still running https://cwlviewer.arvados.org/queue/5fd8edda08813b0001ca6bc3
Still running https://cwlviewer.arvados.org/queue/5fd94ac82ab79c00011833fc
Still running https://cwlviewer.arvados.org/queue/5fd94b342ab79c0001183400
Still running https://cwlviewer.arvados.org/queue/5fd94b342ab79c0001183401
Still running https://cwlviewer.arvados.org/queue/5fd94b352ab79c0001183402
Still running https://cwlviewer.arvados.org/queue/5fd94b352ab79c0001183403
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 159, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 80, in create_connection
    raise err
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 70, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 343, in _make_request
    self._validate_conn(conn)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 841, in _validate_conn
    conn.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 301, in connect
    conn = self._new_conn()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 168, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f5d81bea978>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='cwlviewer.arvados.org', port=443): Max retries exceeded with url: /queue/5fd94b352ab79c0001183402 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f5d81bea978>: Failed to establish a new connection: [Errno 110] Connection timed out'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./load.py", line 117, in <module>
    main(*sys.argv[1:])
  File "./load.py", line 107, in main
    queued = trim_queue(queued)
  File "./load.py", line 82, in trim_queue
    if is_running(q):
  File "./load.py", line 63, in is_running
    queued = requests.get(location, allow_redirects=False, headers=HEADERS)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='cwlviewer.arvados.org', port=443): Max retries exceeded with url: /queue/5fd94b352ab79c0001183402 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f5d81bea978>: Failed to establish a new connection: [Errno 110] Connection timed out'))

real    7855m45.990s
user    977m47.651s
sys     36m38.701s
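The traceback shows the whole import aborting on a single connection timeout inside `is_running`. A possible mitigation, sketched here under assumptions, is to wrap the status request in a small retry loop with backoff. `call_with_retry` is a hypothetical helper, not something load.py provides, and the internals of `is_running` beyond the line shown in the traceback are unknown. `requests` exceptions derive from `OSError`, so the default `retry_on` would cover the `ConnectionError` seen above.

```python
import time


def call_with_retry(fetch, attempts=5, delay=1.0, retry_on=(OSError,), sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts are used up.

    Hypothetical helper: waits delay, 2*delay, 4*delay, ... seconds
    between failed attempts and re-raises the last error at the end.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on as err:
            last_error = err
            # Sleep only if another attempt is still to come.
            if attempt + 1 < attempts:
                sleep(delay * 2 ** attempt)
    raise last_error


# Assumed usage inside load.py (names taken from the traceback):
#   queued = call_with_retry(
#       lambda: requests.get(location, allow_redirects=False,
#                            headers=HEADERS, timeout=30))
```

Passing an explicit `timeout` to `requests.get` also keeps a single hung connection from blocking the loop indefinitely.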

#16 Updated by Ward Vandewege 9 months ago

I connected to the MongoDB database:

> show dbs
admin  0.000GB
local  0.000GB
test   0.006GB
> use test
switched to db test
> show collections;
queuedWorkflow
workflow
> db.queuedWorkflow.find( { cwltoolStatus: "RUNNING" } ).count()
6
> db.queuedWorkflow.find( { cwltoolStatus: "RUNNING" } )
{ "_id" : ObjectId("5fd8edda08813b0001ca6bc3"), "_class" : "org.commonwl.view.workflow.QueuedWorkflow", "tempRepresentation" : { "_id" : null, "retrievedFrom" : { "repoUrl" : "https://github.com/mskcc/roslin-variant.git", "branch" : "2.4.x", "path" : "setup/cwl/module-5.cwl" }, "retrievedOn" : ISODate("2020-12-15T17:09:46.260Z"), "lastCommit" : "425970cb18efb69b42edf26f92c07b7683a62576", "label" : "module-5", "inputs" : { "md_metrics_files" : { "type" : "[]", "sourceID" : [ ] }, "fp_intervals" : { "type" : "File", "sourceID" : [ ] }, "bams" : { "type" : "File[]", "sourceID" : [ ] }, "request_file" : { "type" : "File", "sourceID" : [ ] }, "target_intervals" : { "type" : "File", "sourceID" : [ ] }, "fp_genotypes" : { "type" : "File", "sourceID" : [ ] }, "grouping_file" : { "type" : "File", "sourceID" : [ ] }, "project_prefix" : { "type" : "string", "sourceID" : [ ] }, "genome" : { "type" : "string", "sourceID" : [ ] }, "bait_intervals" : { "type" : "File", "sourceID" : [ ] }, "clstats1" : { "type" : "[]", "sourceID" : [ ] }, "pairing_file" : { "type" : "File", "sourceID" : [ ] }, "clstats2" : { "type" : "[]", "sourceID" : [ ] } }, "outputs" : { "qual_pdf" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] }, "per_target_coverage" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] }, "doc_basecounts" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] }, "gcbias_summary" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] }, "qc_files" : { "type" : "File[]", "sourceID" : [ "generate_pdf" ] }, "hs_metrics" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] }, "gcbias_pdf" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] }, "insert_metrics" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] }, "insert_pdf" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] }, "qual_metrics" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] }, "gcbias_metrics" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] }, 
"as_metrics" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] } }, "steps" : { "scatter_metrics" : { "run" : "", "sources" : { "genome" : { "sourceID" : [ "genome" ] }, "bait_intervals" : { "sourceID" : [ "bait_intervals" ] }, "fp_intervals" : { "sourceID" : [ "fp_intervals" ] }, "target_intervals" : { "sourceID" : [ "target_intervals" ] }, "bam" : { "sourceID" : [ "bams" ] } } }, "generate_pdf" : { "run" : "cmo-qcpdf/0.5.11/cmo-qcpdf.cwl", "sources" : { "qualmetrics_files" : { "sourceID" : [ ] }, "md_metrics_files" : { "sourceID" : [ "md_metrics_files" ] }, "mdmetrics_files" : { "sourceID" : [ ] }, "hsmetrics_files" : { "sourceID" : [ ] }, "request_file" : { "sourceID" : [ "request_file" ] }, "fp_genotypes" : { "sourceID" : [ "fp_genotypes" ] }, "grouping_file" : { "sourceID" : [ "grouping_file" ] }, "file_prefix" : { "sourceID" : [ "project_prefix" ] }, "gcbias_files" : { "sourceID" : [ ] }, "trimgalore_files" : { "sourceID" : [ ] }, "clstats1" : { "sourceID" : [ "clstats1" ] }, "insertsize_files" : { "sourceID" : [ ] }, "files" : { "sourceID" : [ "scatter_metrics" ] }, "fingerprint_files" : { "sourceID" : [ ] }, "pairing_file" : { "sourceID" : [ "pairing_file" ] }, "clstats2" : { "sourceID" : [ "clstats2" ] } } } }, "cwltoolVersion" : "1.0.20180525185854", "visualisationDot" : "digraph workflow {\n  graph [\n    bgcolor = \"#eeeeee\"\n    color = \"black\"\n    fontsize = \"10\"\n    labeljust = \"left\"\n    clusterrank = \"local\"\n    ranksep = \"0.22\"\n    nodesep = \"0.05\"\n  ]\n  node [\n    fontname = \"Helvetica\"\n    fontsize = \"10\"\n    fontcolor = \"black\"\n    shape = \"record\"\n    height = \"0\"\n    width = \"0\"\n    color = \"black\"\n    fillcolor = \"lightgoldenrodyellow\"\n    style = \"filled\"\n  ];\n  edge [\n    fontname=\"Helvetica\"\n    fontsize=\"8\"\n    fontcolor=\"black\"\n    color=\"black\"\n    arrowsize=\"0.7\"\n  ];\n  subgraph cluster_inputs {\n    rank = \"same\";\n    style = \"dashed\";\n    label = 
\"Workflow Inputs\";\n    \"md_metrics_files\" [fillcolor=\"#94DDF4\"];\n    \"fp_intervals\" [fillcolor=\"#94DDF4\"];\n    \"bams\" [fillcolor=\"#94DDF4\"];\n    \"request_file\" [fillcolor=\"#94DDF4\"];\n    \"target_intervals\" [fillcolor=\"#94DDF4\"];\n    \"fp_genotypes\" [fillcolor=\"#94DDF4\"];\n    \"grouping_file\" [fillcolor=\"#94DDF4\"];\n    \"project_prefix\" [fillcolor=\"#94DDF4\"];\n    \"genome\" [fillcolor=\"#94DDF4\"];\n    \"bait_intervals\" [fillcolor=\"#94DDF4\"];\n    \"clstats1\" [fillcolor=\"#94DDF4\"];\n    \"pairing_file\" [fillcolor=\"#94DDF4\"];\n    \"clstats2\" [fillcolor=\"#94DDF4\"];\n  }\n  subgraph cluster_outputs {\n    rank = \"same\";\n    style = \"dashed\";\n    label = \"Workflow Outputs\";\n    \"qual_pdf\" [fillcolor=\"#94DDF4\"];\n    \"per_target_coverage\" [fillcolor=\"#94DDF4\"];\n    \"doc_basecounts\" [fillcolor=\"#94DDF4\"];\n    \"gcbias_summary\" [fillcolor=\"#94DDF4\"];\n    \"qc_files\" [fillcolor=\"#94DDF4\"];\n    \"hs_metrics\" [fillcolor=\"#94DDF4\"];\n    \"gcbias_pdf\" [fillcolor=\"#94DDF4\"];\n    \"insert_metrics\" [fillcolor=\"#94DDF4\"];\n    \"insert_pdf\" [fillcolor=\"#94DDF4\"];\n    \"qual_metrics\" [fillcolor=\"#94DDF4\"];\n    \"gcbias_metrics\" [fillcolor=\"#94DDF4\"];\n    \"as_metrics\" [fillcolor=\"#94DDF4\"];\n  }\n  \"scatter_metrics\";\n  \"generate_pdf\";\n  \"scatter_metrics\" -> \"qual_pdf\";\n  \"scatter_metrics\" -> \"per_target_coverage\";\n  \"scatter_metrics\" -> \"doc_basecounts\";\n  \"scatter_metrics\" -> \"gcbias_summary\";\n  \"generate_pdf\" -> \"qc_files\";\n  \"scatter_metrics\" -> \"hs_metrics\";\n  \"scatter_metrics\" -> \"gcbias_pdf\";\n  \"scatter_metrics\" -> \"insert_metrics\";\n  \"scatter_metrics\" -> \"insert_pdf\";\n  \"scatter_metrics\" -> \"qual_metrics\";\n  \"scatter_metrics\" -> \"gcbias_metrics\";\n  \"scatter_metrics\" -> \"as_metrics\";\n  \"genome\" -> \"scatter_metrics\";\n  \"bait_intervals\" -> \"scatter_metrics\";\n  \"fp_intervals\" -> 
\"scatter_metrics\";\n  \"target_intervals\" -> \"scatter_metrics\";\n  \"bams\" -> \"scatter_metrics\";\n  \"md_metrics_files\" -> \"generate_pdf\";\n  \"request_file\" -> \"generate_pdf\";\n  \"fp_genotypes\" -> \"generate_pdf\";\n  \"grouping_file\" -> \"generate_pdf\";\n  \"project_prefix\" -> \"generate_pdf\";\n  \"clstats1\" -> \"generate_pdf\";\n  \"scatter_metrics\" -> \"generate_pdf\";\n  \"pairing_file\" -> \"generate_pdf\";\n  \"clstats2\" -> \"generate_pdf\";\n\n  // Invisible links to force outputs to be at lowest rank\n  \"scatter_metrics\" -> \"qual_pdf\" [style=invis];\n  \"generate_pdf\" -> \"qual_pdf\" [style=invis];\n}\n", "permaLinkBase" : "https://w3id.org/cwl/view" }, "cwltoolStatus" : "RUNNING", "cwltoolVersion" : "" }
{ "_id" : ObjectId("5fd94ac82ab79c00011833fc"), "_class" : "org.commonwl.view.workflow.QueuedWorkflow", "tempRepresentation" : { "_id" : null, "retrievedFrom" : { "repoUrl" : "https://github.com/genome/analysis-workflows.git", "branch" : "509938802c5e42bb8084c6a5a26ab6425c60e69a", "path" : "definitions/subworkflows/bam_to_trimmed_fastq.cwl" }, "retrievedOn" : ISODate("2020-12-15T23:46:16.018Z"), "lastCommit" : "509938802c5e42bb8084c6a5a26ab6425c60e69a", "label" : "bam to trimmed fastqs", "inputs" : { "adapters" : { "type" : "File", "sourceID" : [ ] }, "min_readlength" : { "type" : "int", "sourceID" : [ ] }, "adapter_trim_end" : { "type" : "string", "sourceID" : [ ] }, "adapter_min_overlap" : { "type" : "int", "sourceID" : [ ] }, "max_uncalled" : { "type" : "int", "sourceID" : [ ] }, "bam" : { "type" : "File", "sourceID" : [ ] } }, "outputs" : { "fastqs" : { "type" : "File[]", "sourceID" : [ "trim_fastq" ] }, "fastq2" : { "type" : "File", "sourceID" : [ "trim_fastq" ] }, "fastq1" : { "type" : "File", "sourceID" : [ "trim_fastq" ] } }, "steps" : { "trim_fastq" : { "run" : "../tools/trim_fastq.cwl", "sources" : { "reads2" : { "sourceID" : [ "bam_to_fastq" ] }, "reads1" : { "sourceID" : [ "bam_to_fastq" ] }, "adapters" : { "sourceID" : [ "adapters" ] }, "min_readlength" : { "sourceID" : [ "min_readlength" ] }, "adapter_trim_end" : { "sourceID" : [ "adapter_trim_end" ] }, "adapter_min_overlap" : { "sourceID" : [ "adapter_min_overlap" ] }, "max_uncalled" : { "sourceID" : [ "max_uncalled" ] } } }, "bam_to_fastq" : { "run" : "../tools/bam_to_fastq.cwl", "sources" : { "bam" : { "sourceID" : [ "bam" ] } } } }, "cwltoolVersion" : "1.0.20180525185854", "visualisationDot" : "digraph workflow {\n  graph [\n    bgcolor = \"#eeeeee\"\n    color = \"black\"\n    fontsize = \"10\"\n    labeljust = \"left\"\n    clusterrank = \"local\"\n    ranksep = \"0.22\"\n    nodesep = \"0.05\"\n  ]\n  node [\n    fontname = \"Helvetica\"\n    fontsize = \"10\"\n    fontcolor = \"black\"\n    
shape = \"record\"\n    height = \"0\"\n    width = \"0\"\n    color = \"black\"\n    fillcolor = \"lightgoldenrodyellow\"\n    style = \"filled\"\n  ];\n  edge [\n    fontname=\"Helvetica\"\n    fontsize=\"8\"\n    fontcolor=\"black\"\n    color=\"black\"\n    arrowsize=\"0.7\"\n  ];\n  subgraph cluster_inputs {\n    rank = \"same\";\n    style = \"dashed\";\n    label = \"Workflow Inputs\";\n    \"adapters\" [fillcolor=\"#94DDF4\"];\n    \"min_readlength\" [fillcolor=\"#94DDF4\"];\n    \"adapter_trim_end\" [fillcolor=\"#94DDF4\"];\n    \"adapter_min_overlap\" [fillcolor=\"#94DDF4\"];\n    \"max_uncalled\" [fillcolor=\"#94DDF4\"];\n    \"bam\" [fillcolor=\"#94DDF4\"];\n  }\n  subgraph cluster_outputs {\n    rank = \"same\";\n    style = \"dashed\";\n    label = \"Workflow Outputs\";\n    \"fastqs\" [fillcolor=\"#94DDF4\"];\n    \"fastq2\" [fillcolor=\"#94DDF4\"];\n    \"fastq1\" [fillcolor=\"#94DDF4\"];\n  }\n  \"trim_fastq\";\n  \"bam_to_fastq\";\n  \"trim_fastq\" -> \"fastqs\";\n  \"trim_fastq\" -> \"fastq2\";\n  \"trim_fastq\" -> \"fastq1\";\n  \"bam_to_fastq\" -> \"trim_fastq\";\n  \"bam_to_fastq\" -> \"trim_fastq\";\n  \"adapters\" -> \"trim_fastq\";\n  \"min_readlength\" -> \"trim_fastq\";\n  \"adapter_trim_end\" -> \"trim_fastq\";\n  \"adapter_min_overlap\" -> \"trim_fastq\";\n  \"max_uncalled\" -> \"trim_fastq\";\n  \"bam\" -> \"bam_to_fastq\";\n\n  // Invisible links to force outputs to be at lowest rank\n  \"trim_fastq\" -> \"fastqs\" [style=invis];\n  \"bam_to_fastq\" -> \"fastqs\" [style=invis];\n}\n", "permaLinkBase" : "https://w3id.org/cwl/view" }, "cwltoolStatus" : "RUNNING", "cwltoolVersion" : "" }
{ "_id" : ObjectId("5fd94b342ab79c0001183400"), "_class" : "org.commonwl.view.workflow.QueuedWorkflow", "tempRepresentation" : { "_id" : null, "retrievedFrom" : { "repoUrl" : "https://github.com/genome/analysis-workflows.git", "branch" : "b465f0da2806ddb6df481409541d13288ccb40ec", "path" : "definitions/subworkflows/sequence_to_bqsr.cwl" }, "retrievedOn" : ISODate("2020-12-15T23:48:04.255Z"), "lastCommit" : "b465f0da2806ddb6df481409541d13288ccb40ec", "label" : "Raw sequence data to BQSR", "inputs" : { "reference" : { "type" : "string, File", "sourceID" : [ ] }, "bqsr_intervals" : { "type" : "string[]?", "sourceID" : [ ] }, "trimming" : { "type" : "../types/trimming_options.yml#trimming_options?", "sourceID" : [ ] }, "bqsr_known_sites" : { "doc" : "One or more databases of known polymorphic sites used to exclude regions around known polymorphisms from analysis.", "type" : "File[]", "sourceID" : [ ] }, "unaligned" : { "type" : "../types/sequence_data.yml#sequence_data[]", "sourceID" : [ ] }, "final_name" : { "type" : "string", "sourceID" : [ ], "defaultVal" : "\\\"final\\\"" } }, "outputs" : { "final_bam" : { "type" : "File", "sourceID" : [ "index_bam" ] }, "mark_duplicates_metrics_file" : { "type" : "File", "sourceID" : [ "mark_duplicates_and_sort" ] } }, "steps" : { "bqsr" : { "run" : "../tools/bqsr.cwl", "sources" : { "reference" : { "sourceID" : [ "reference" ] }, "intervals" : { "sourceID" : [ "bqsr_intervals" ] }, "known_sites" : { "sourceID" : [ "bqsr_known_sites" ] }, "bam" : { "sourceID" : [ "mark_duplicates_and_sort" ] } } }, "index_bam" : { "run" : "../tools/index_bam.cwl", "sources" : { "bam" : { "sourceID" : [ "apply_bqsr" ] } } }, "apply_bqsr" : { "run" : "../tools/apply_bqsr.cwl", "sources" : { "reference" : { "sourceID" : [ "reference" ] }, "output_name" : { "sourceID" : [ "final_name" ] }, "bqsr_table" : { "sourceID" : [ "bqsr" ] }, "bam" : { "sourceID" : [ "mark_duplicates_and_sort" ] } } }, "mark_duplicates_and_sort" : { "run" : 
"../tools/mark_duplicates_and_sort.cwl", "sources" : { "bam" : { "sourceID" : [ "name_sort" ] } } }, "merge" : { "run" : "../tools/merge_bams_samtools.cwl", "sources" : { "name" : { "sourceID" : [ "final_name" ] }, "bams" : { "sourceID" : [ "align" ] } } }, "name_sort" : { "run" : "../tools/name_sort.cwl", "sources" : { "bam" : { "sourceID" : [ "merge" ] } } }, "align" : { "run" : "sequence_align_and_tag_adapter.cwl", "sources" : { "reference" : { "sourceID" : [ "reference" ] }, "trimming" : { "sourceID" : [ "trimming" ] }, "unaligned" : { "sourceID" : [ "unaligned" ] } } } }, "cwltoolVersion" : "1.0.20180525185854", "visualisationDot" : "digraph workflow {\n  graph [\n    bgcolor = \"#eeeeee\"\n    color = \"black\"\n    fontsize = \"10\"\n    labeljust = \"left\"\n    clusterrank = \"local\"\n    ranksep = \"0.22\"\n    nodesep = \"0.05\"\n  ]\n  node [\n    fontname = \"Helvetica\"\n    fontsize = \"10\"\n    fontcolor = \"black\"\n    shape = \"record\"\n    height = \"0\"\n    width = \"0\"\n    color = \"black\"\n    fillcolor = \"lightgoldenrodyellow\"\n    style = \"filled\"\n  ];\n  edge [\n    fontname=\"Helvetica\"\n    fontsize=\"8\"\n    fontcolor=\"black\"\n    color=\"black\"\n    arrowsize=\"0.7\"\n  ];\n  subgraph cluster_inputs {\n    rank = \"same\";\n    style = \"dashed\";\n    label = \"Workflow Inputs\";\n    \"reference\" [fillcolor=\"#94DDF4\"];\n    \"bqsr_intervals\" [fillcolor=\"#94DDF4\"];\n    \"trimming\" [fillcolor=\"#94DDF4\"];\n    \"bqsr_known_sites\" [fillcolor=\"#94DDF4\"];\n    \"unaligned\" [fillcolor=\"#94DDF4\"];\n    \"final_name\" [fillcolor=\"#94DDF4\"];\n  }\n  subgraph cluster_outputs {\n    rank = \"same\";\n    style = \"dashed\";\n    label = \"Workflow Outputs\";\n    \"final_bam\" [fillcolor=\"#94DDF4\"];\n    \"mark_duplicates_metrics_file\" [fillcolor=\"#94DDF4\"];\n  }\n  \"bqsr\";\n  \"index_bam\";\n  \"apply_bqsr\";\n  \"mark_duplicates_and_sort\";\n  \"merge\";\n  \"name_sort\";\n  \"align\";\n  \"index_bam\" 
-> \"final_bam\";\n  \"mark_duplicates_and_sort\" -> \"mark_duplicates_metrics_file\";\n  \"reference\" -> \"bqsr\";\n  \"bqsr_intervals\" -> \"bqsr\";\n  \"bqsr_known_sites\" -> \"bqsr\";\n  \"mark_duplicates_and_sort\" -> \"bqsr\";\n  \"apply_bqsr\" -> \"index_bam\";\n  \"reference\" -> \"apply_bqsr\";\n  \"final_name\" -> \"apply_bqsr\";\n  \"bqsr\" -> \"apply_bqsr\";\n  \"mark_duplicates_and_sort\" -> \"apply_bqsr\";\n  \"name_sort\" -> \"mark_duplicates_and_sort\";\n  \"final_name\" -> \"merge\";\n  \"align\" -> \"merge\";\n  \"merge\" -> \"name_sort\";\n  \"reference\" -> \"align\";\n  \"trimming\" -> \"align\";\n  \"unaligned\" -> \"align\";\n\n  // Invisible links to force outputs to be at lowest rank\n  \"bqsr\" -> \"final_bam\" [style=invis];\n  \"index_bam\" -> \"final_bam\" [style=invis];\n  \"apply_bqsr\" -> \"final_bam\" [style=invis];\n  \"mark_duplicates_and_sort\" -> \"final_bam\" [style=invis];\n  \"merge\" -> \"final_bam\" [style=invis];\n  \"name_sort\" -> \"final_bam\" [style=invis];\n  \"align\" -> \"final_bam\" [style=invis];\n}\n", "permaLinkBase" : "https://w3id.org/cwl/view" }, "cwltoolStatus" : "RUNNING", "cwltoolVersion" : "" }
{ "_id" : ObjectId("5fd94b342ab79c0001183401"), "_class" : "org.commonwl.view.workflow.QueuedWorkflow", "tempRepresentation" : { "_id" : null, "retrievedFrom" : { "repoUrl" : "https://github.com/ncbi/pgap.git", "branch" : "master", "path" : "bacterial_mobile_elem/wf_bacterial_mobile_elem.cwl" }, "retrievedOn" : ISODate("2020-12-15T23:48:04.835Z"), "lastCommit" : "9ff3e17888a15f4691ba82380472317214e20a1c", "label" : "Execute CRISPR", "inputs" : { "asn_cache" : { "type" : "Directory", "sourceID" : [ ] }, "go" : { "type" : "boolean[]", "sourceID" : [ ] }, "seqids" : { "type" : "File", "sourceID" : [ ] } }, "outputs" : { "annots" : { "type" : "File", "sourceID" : [ "Execute_CRISPR_dump" ] } }, "steps" : { "Execute_CRISPR_wnode" : { "run" : "ncbi_crisper_wnode.cwl", "sources" : { "asn_cache" : { "sourceID" : [ "asn_cache" ] }, "input_jobs" : { "sourceID" : [ "Execute_CRISPR_submit" ] } } }, "Execute_CRISPR_submit" : { "run" : "gpx_qsubmit.cwl", "sources" : { "asn_cache" : { "sourceID" : [ "asn_cache" ] }, "seqids" : { "sourceID" : [ "seqids" ] } } }, "Execute_CRISPR_dump" : { "run" : "gpx_qdump.cwl", "sources" : { "input_path" : { "sourceID" : [ "Execute_CRISPR_wnode" ] } } } }, "cwltoolVersion" : "1.0.20180525185854", "visualisationDot" : "digraph workflow {\n  graph [\n    bgcolor = \"#eeeeee\"\n    color = \"black\"\n    fontsize = \"10\"\n    labeljust = \"left\"\n    clusterrank = \"local\"\n    ranksep = \"0.22\"\n    nodesep = \"0.05\"\n  ]\n  node [\n    fontname = \"Helvetica\"\n    fontsize = \"10\"\n    fontcolor = \"black\"\n    shape = \"record\"\n    height = \"0\"\n    width = \"0\"\n    color = \"black\"\n    fillcolor = \"lightgoldenrodyellow\"\n    style = \"filled\"\n  ];\n  edge [\n    fontname=\"Helvetica\"\n    fontsize=\"8\"\n    fontcolor=\"black\"\n    color=\"black\"\n    arrowsize=\"0.7\"\n  ];\n  subgraph cluster_inputs {\n    rank = \"same\";\n    style = \"dashed\";\n    label = \"Workflow Inputs\";\n    \"asn_cache\" 
[fillcolor=\"#94DDF4\"];\n    \"go\" [fillcolor=\"#94DDF4\"];\n    \"seqids\" [fillcolor=\"#94DDF4\"];\n  }\n  subgraph cluster_outputs {\n    rank = \"same\";\n    style = \"dashed\";\n    label = \"Workflow Outputs\";\n    \"annots\" [fillcolor=\"#94DDF4\"];\n  }\n  \"Execute_CRISPR_wnode\";\n  \"Execute_CRISPR_submit\";\n  \"Execute_CRISPR_dump\";\n  \"Execute_CRISPR_dump\" -> \"annots\";\n  \"asn_cache\" -> \"Execute_CRISPR_wnode\";\n  \"Execute_CRISPR_submit\" -> \"Execute_CRISPR_wnode\";\n  \"asn_cache\" -> \"Execute_CRISPR_submit\";\n  \"seqids\" -> \"Execute_CRISPR_submit\";\n  \"Execute_CRISPR_wnode\" -> \"Execute_CRISPR_dump\";\n\n  // Invisible links to force outputs to be at lowest rank\n  \"Execute_CRISPR_wnode\" -> \"annots\" [style=invis];\n  \"Execute_CRISPR_submit\" -> \"annots\" [style=invis];\n  \"Execute_CRISPR_dump\" -> \"annots\" [style=invis];\n}\n", "permaLinkBase" : "https://w3id.org/cwl/view" }, "cwltoolStatus" : "RUNNING", "cwltoolVersion" : "" }
{ "_id" : ObjectId("5fd94b352ab79c0001183402"), "_class" : "org.commonwl.view.workflow.QueuedWorkflow", "tempRepresentation" : { "_id" : null, "retrievedFrom" : { "repoUrl" : "https://github.com/Duke-GCB/bespin-cwl.git", "branch" : "master", "path" : "subworkflows/exomeseq-gatk4-01-preprocessing.cwl" }, "retrievedOn" : ISODate("2020-12-15T23:48:05.835Z"), "lastCommit" : "bbe24d8d7fde2e918583b96805909a2867b749d6", "label" : "exomeseq-gatk4-01-preprocessing.cwl", "inputs" : { "intervals" : { "type" : "File[]?", "sourceID" : [ ] }, "interval_padding" : { "type" : "int?", "sourceID" : [ ] }, "read_pair" : { "type" : "../types/FASTQReadPairType.yml#FASTQReadPairType", "sourceID" : [ ] }, "library" : { "type" : "string", "sourceID" : [ ] }, "reference_genome" : { "type" : "File", "sourceID" : [ ] }, "bait_interval_list" : { "type" : "File", "sourceID" : [ ] }, "threads" : { "type" : "int", "sourceID" : [ ] }, "target_interval_list" : { "type" : "File", "sourceID" : [ ] }, "known_sites" : { "type" : "File[]", "sourceID" : [ ] }, "resource_dbsnp" : { "type" : "File", "sourceID" : [ ] }, "platform" : { "type" : "string", "sourceID" : [ ] } }, "outputs" : { "recalibrated_reads" : { "type" : "File", "sourceID" : [ "recalibrate_02_apply_bqsr" ] }, "raw_variants" : { "doc" : "VCF file from per sample variant calling", "type" : "File", "sourceID" : [ "variant_calling" ] }, "trim_reports" : { "type" : "File[]", "sourceID" : [ "trim" ] }, "markduplicates_bam" : { "type" : "File", "sourceID" : [ "mark_duplicates" ] }, "recalibration_table" : { "type" : "File", "sourceID" : [ "recalibrate_01_analyze" ] }, "haplotypes_bam" : { "doc" : "BAM file containing assembled haplotypes and locally realigned reads", "type" : "File", "sourceID" : [ "variant_calling" ] }, "fastqc_reports" : { "type" : "File[]", "sourceID" : [ "qc" ] } }, "steps" : { "qc" : { "run" : "../tools/fastqc.cwl", "sources" : { "input_fastq_file" : { "sourceID" : [ "combine_reads" ] }, "threads" : { "sourceID" : [ ], 
"defaultVal" : "\\\"4\\\"" } } }, "file_pair_details" : { "run" : "../tools/extract-named-file-pair-details.cwl", "sources" : { "read_pair" : { "sourceID" : [ "read_pair" ] }, "library" : { "sourceID" : [ "library" ] }, "platform" : { "sourceID" : [ "platform" ] } } }, "trim" : { "run" : "../tools/trim_galore.cwl", "sources" : { "reads" : { "sourceID" : [ "combine_reads" ] }, "paired" : { "sourceID" : [ ], "defaultVal" : "\\\"true\\\"" } } }, "recalibrate_02_apply_bqsr" : { "run" : "../tools/GATK4-ApplyBQSR.cwl", "sources" : { "reference" : { "sourceID" : [ "reference_genome" ] }, "intervals" : { "sourceID" : [ "intervals" ] }, "interval_padding" : { "sourceID" : [ "interval_padding" ] }, "bqsr_report" : { "sourceID" : [ "recalibrate_01_analyze" ] }, "output_recalibrated_bam_filename" : { "sourceID" : [ "generate_sample_filenames" ] }, "add_output_sam_program_record" : { "sourceID" : [ ], "defaultVal" : "\\\"true\\\"" }, "input_bam" : { "sourceID" : [ "mark_duplicates" ] }, "java_opt" : { "sourceID" : [ ], "defaultVal" : "\\\"-Xms3g\\\"" } } }, "combine_reads" : { "run" : "../tools/concat-gz-files.cwl", "sources" : { "output_filename" : { "sourceID" : [ "generate_sample_filenames" ] }, "files" : { "sourceID" : [ "file_pair_details" ] } } }, "variant_calling" : { "run" : "../tools/GATK4-HaplotypeCaller.cwl", "sources" : { "reference" : { "sourceID" : [ "reference_genome" ] }, "output_variants_filename" : { "sourceID" : [ "generate_sample_filenames" ] }, "output_bam_filename" : { "sourceID" : [ "generate_sample_filenames" ] }, "intervals" : { "sourceID" : [ "intervals" ] }, "interval_padding" : { "sourceID" : [ "interval_padding" ] }, "annotation_groups" : { "sourceID" : [ ], "defaultVal" : "\\\"\\\"" }, "input_bam" : { "sourceID" : [ "recalibrate_02_apply_bqsr" ] }, "emit_ref_confidence" : { "sourceID" : [ ], "defaultVal" : "\\\"GVCF\\\"" }, "java_opt" : { "sourceID" : [ ], "defaultVal" : "\\\"-Xms7g\\\"" } } }, "generate_sample_filenames" : { "run" : 
"../tools/generate-sample-filenames.cwl", "sources" : { "sample_name" : { "sourceID" : [ "file_pair_details" ] } } }, "sort" : { "run" : "../tools/GATK4-SortSam.cwl", "sources" : { "input_file" : { "sourceID" : [ "map" ] }, "output_sorted_bam_filename" : { "sourceID" : [ "generate_sample_filenames" ] }, "sort_order" : { "sourceID" : [ ], "defaultVal" : "\\\"coordinate\\\"" }, "java_opt" : { "sourceID" : [ ], "defaultVal" : "\\\"-Xms4g\\\"" } } }, "mark_duplicates" : { "run" : "../tools/GATK4-MarkDuplicates.cwl", "sources" : { "assume_sort_order" : { "sourceID" : [ ], "defaultVal" : "\\\"coordinate\\\"" }, "input_file" : { "sourceID" : [ "sort" ] }, "validation_stringency" : { "sourceID" : [ ], "defaultVal" : "\\\"SILENT\\\"" }, "output_filename" : { "sourceID" : [ "generate_sample_filenames" ] }, "create_index" : { "sourceID" : [ ], "defaultVal" : "\\\"true\\\"" }, "metrics_filename" : { "sourceID" : [ "generate_sample_filenames" ] }, "optical_duplicate_pixel_distance" : { "sourceID" : [ ], "defaultVal" : "\\\"2500\\\"" }, "java_opt" : { "sourceID" : [ ], "defaultVal" : "\\\"-Xms4g\\\"" } } }, "recalibrate_01_analyze" : { "run" : "../tools/GATK4-BaseRecalibrator.cwl", "sources" : { "reference" : { "sourceID" : [ "reference_genome" ] }, "intervals" : { "sourceID" : [ "intervals" ] }, "interval_padding" : { "sourceID" : [ "interval_padding" ] }, "output_recalibration_report_filename" : { "sourceID" : [ "generate_sample_filenames" ] }, "input_bam" : { "sourceID" : [ "mark_duplicates" ] }, "known_sites" : { "sourceID" : [ "known_sites" ] }, "java_opt" : { "sourceID" : [ ], "defaultVal" : "\\\"-Xms4g\\\"" } } }, "map" : { "run" : "../tools/gitc-bwa-mem-samtools.cwl", "sources" : { "reference" : { "sourceID" : [ "reference_genome" ] }, "reads" : { "sourceID" : [ "trim" ] }, "output_filename" : { "sourceID" : [ "generate_sample_filenames" ] }, "read_group_header" : { "sourceID" : [ "file_pair_details" ] }, "threads" : { "sourceID" : [ "threads" ] } } } }, "cwltoolVersion" 
: "1.0.20180525185854", "visualisationDot" : "digraph workflow {\n  graph [\n    bgcolor = \"#eeeeee\"\n    color = \"black\"\n    fontsize = \"10\"\n    labeljust = \"left\"\n    clusterrank = \"local\"\n    ranksep = \"0.22\"\n    nodesep = \"0.05\"\n  ]\n  node [\n    fontname = \"Helvetica\"\n    fontsize = \"10\"\n    fontcolor = \"black\"\n    shape = \"record\"\n    height = \"0\"\n    width = \"0\"\n    color = \"black\"\n    fillcolor = \"lightgoldenrodyellow\"\n    style = \"filled\"\n  ];\n  edge [\n    fontname=\"Helvetica\"\n    fontsize=\"8\"\n    fontcolor=\"black\"\n    color=\"black\"\n    arrowsize=\"0.7\"\n  ];\n  subgraph cluster_inputs {\n    rank = \"same\";\n    style = \"dashed\";\n    label = \"Workflow Inputs\";\n    \"intervals\" [fillcolor=\"#94DDF4\"];\n    \"interval_padding\" [fillcolor=\"#94DDF4\"];\n    \"read_pair\" [fillcolor=\"#94DDF4\"];\n    \"library\" [fillcolor=\"#94DDF4\"];\n    \"reference_genome\" [fillcolor=\"#94DDF4\"];\n    \"bait_interval_list\" [fillcolor=\"#94DDF4\"];\n    \"threads\" [fillcolor=\"#94DDF4\"];\n    \"target_interval_list\" [fillcolor=\"#94DDF4\"];\n    \"known_sites\" [fillcolor=\"#94DDF4\"];\n    \"resource_dbsnp\" [fillcolor=\"#94DDF4\"];\n    \"platform\" [fillcolor=\"#94DDF4\"];\n  }\n  subgraph cluster_outputs {\n    rank = \"same\";\n    style = \"dashed\";\n    label = \"Workflow Outputs\";\n    \"recalibrated_reads\" [fillcolor=\"#94DDF4\"];\n    \"raw_variants\" [fillcolor=\"#94DDF4\"];\n    \"trim_reports\" [fillcolor=\"#94DDF4\"];\n    \"markduplicates_bam\" [fillcolor=\"#94DDF4\"];\n    \"recalibration_table\" [fillcolor=\"#94DDF4\"];\n    \"haplotypes_bam\" [fillcolor=\"#94DDF4\"];\n    \"fastqc_reports\" [fillcolor=\"#94DDF4\"];\n  }\n  \"qc\";\n  \"file_pair_details\";\n  \"trim\";\n  \"recalibrate_02_apply_bqsr\";\n  \"combine_reads\";\n  \"variant_calling\";\n  \"generate_sample_filenames\";\n  \"sort\";\n  \"mark_duplicates\";\n  \"recalibrate_01_analyze\";\n  \"map\";\n  
\"recalibrate_02_apply_bqsr\" -> \"recalibrated_reads\";\n  \"variant_calling\" -> \"raw_variants\";\n  \"trim\" -> \"trim_reports\";\n  \"mark_duplicates\" -> \"markduplicates_bam\";\n  \"recalibrate_01_analyze\" -> \"recalibration_table\";\n  \"variant_calling\" -> \"haplotypes_bam\";\n  \"qc\" -> \"fastqc_reports\";\n  \"combine_reads\" -> \"qc\";\n  \"default1\" [label=\"\\\"4\\\"\", fillcolor=\"#D5AEFC\"]\n  \"default1\" -> \"qc\";\n  \"read_pair\" -> \"file_pair_details\";\n  \"library\" -> \"file_pair_details\";\n  \"platform\" -> \"file_pair_details\";\n  \"combine_reads\" -> \"trim\";\n  \"default2\" [label=\"\\\"true\\\"\", fillcolor=\"#D5AEFC\"]\n  \"default2\" -> \"trim\";\n  \"reference_genome\" -> \"recalibrate_02_apply_bqsr\";\n  \"intervals\" -> \"recalibrate_02_apply_bqsr\";\n  \"interval_padding\" -> \"recalibrate_02_apply_bqsr\";\n  \"recalibrate_01_analyze\" -> \"recalibrate_02_apply_bqsr\";\n  \"generate_sample_filenames\" -> \"recalibrate_02_apply_bqsr\";\n  \"default3\" [label=\"\\\"true\\\"\", fillcolor=\"#D5AEFC\"]\n  \"default3\" -> \"recalibrate_02_apply_bqsr\";\n  \"mark_duplicates\" -> \"recalibrate_02_apply_bqsr\";\n  \"default4\" [label=\"\\\"-Xms3g\\\"\", fillcolor=\"#D5AEFC\"]\n  \"default4\" -> \"recalibrate_02_apply_bqsr\";\n  \"generate_sample_filenames\" -> \"combine_reads\";\n  \"file_pair_details\" -> \"combine_reads\";\n  \"reference_genome\" -> \"variant_calling\";\n  \"generate_sample_filenames\" -> \"variant_calling\";\n  \"generate_sample_filenames\" -> \"variant_calling\";\n  \"intervals\" -> \"variant_calling\";\n  \"interval_padding\" -> \"variant_calling\";\n  \"default5\" [label=\"\\\"\\\"\", fillcolor=\"#D5AEFC\"]\n  \"default5\" -> \"variant_calling\";\n  \"recalibrate_02_apply_bqsr\" -> \"variant_calling\";\n  \"default6\" [label=\"\\\"GVCF\\\"\", fillcolor=\"#D5AEFC\"]\n  \"default6\" -> \"variant_calling\";\n  \"default7\" [label=\"\\\"-Xms7g\\\"\", fillcolor=\"#D5AEFC\"]\n  \"default7\" -> 
\"variant_calling\";\n  \"file_pair_details\" -> \"generate_sample_filenames\";\n  \"map\" -> \"sort\";\n  \"generate_sample_filenames\" -> \"sort\";\n  \"default8\" [label=\"\\\"coordinate\\\"\", fillcolor=\"#D5AEFC\"]\n  \"default8\" -> \"sort\";\n  \"default9\" [label=\"\\\"-Xms4g\\\"\", fillcolor=\"#D5AEFC\"]\n  \"default9\" -> \"sort\";\n  \"default10\" [label=\"\\\"coordinate\\\"\", fillcolor=\"#D5AEFC\"]\n  \"default10\" -> \"mark_duplicates\";\n  \"sort\" -> \"mark_duplicates\";\n  \"default11\" [label=\"\\\"SILENT\\\"\", fillcolor=\"#D5AEFC\"]\n  \"default11\" -> \"mark_duplicates\";\n  \"generate_sample_filenames\" -> \"mark_duplicates\";\n  \"default12\" [label=\"\\\"true\\\"\", fillcolor=\"#D5AEFC\"]\n  \"default12\" -> \"mark_duplicates\";\n  \"generate_sample_filenames\" -> \"mark_duplicates\";\n  \"default13\" [label=\"\\\"2500\\\"\", fillcolor=\"#D5AEFC\"]\n  \"default13\" -> \"mark_duplicates\";\n  \"default14\" [label=\"\\\"-Xms4g\\\"\", fillcolor=\"#D5AEFC\"]\n  \"default14\" -> \"mark_duplicates\";\n  \"reference_genome\" -> \"recalibrate_01_analyze\";\n  \"intervals\" -> \"recalibrate_01_analyze\";\n  \"interval_padding\" -> \"recalibrate_01_analyze\";\n  \"generate_sample_filenames\" -> \"recalibrate_01_analyze\";\n  \"mark_duplicates\" -> \"recalibrate_01_analyze\";\n  \"known_sites\" -> \"recalibrate_01_analyze\";\n  \"default15\" [label=\"\\\"-Xms4g\\\"\", fillcolor=\"#D5AEFC\"]\n  \"default15\" -> \"recalibrate_01_analyze\";\n  \"reference_genome\" -> \"map\";\n  \"trim\" -> \"map\";\n  \"generate_sample_filenames\" -> \"map\";\n  \"file_pair_details\" -> \"map\";\n  \"threads\" -> \"map\";\n\n  // Invisible links to force outputs to be at lowest rank\n  \"qc\" -> \"recalibrated_reads\" [style=invis];\n  \"file_pair_details\" -> \"recalibrated_reads\" [style=invis];\n  \"trim\" -> \"recalibrated_reads\" [style=invis];\n  \"recalibrate_02_apply_bqsr\" -> \"recalibrated_reads\" [style=invis];\n  \"combine_reads\" -> \"recalibrated_reads\" 
[style=invis];\n  \"variant_calling\" -> \"recalibrated_reads\" [style=invis];\n  \"generate_sample_filenames\" -> \"recalibrated_reads\" [style=invis];\n  \"sort\" -> \"recalibrated_reads\" [style=invis];\n  \"mark_duplicates\" -> \"recalibrated_reads\" [style=invis];\n  \"recalibrate_01_analyze\" -> \"recalibrated_reads\" [style=invis];\n  \"map\" -> \"recalibrated_reads\" [style=invis];\n}\n", "permaLinkBase" : "https://w3id.org/cwl/view" }, "cwltoolStatus" : "RUNNING", "cwltoolVersion" : "" }
{ "_id" : ObjectId("5fd94b352ab79c0001183403"), "_class" : "org.commonwl.view.workflow.QueuedWorkflow", "tempRepresentation" : { "_id" : null, "retrievedFrom" : { "repoUrl" : "https://github.com/rosafilgueira/cyclon_usecase.git", "branch" : "d6d8c3b03c66e7594561c22c1de80eda2113b277", "path" : "run_cyclon/tracking_master.cwl" }, "retrievedOn" : ISODate("2020-12-15T23:48:05.967Z"), "lastCommit" : "d6d8c3b03c66e7594561c22c1de80eda2113b277", "label" : "tracking_master.cwl", "inputs" : { "script_transferfiles" : { "type" : "File", "sourceID" : [ ] }, "script_xml2ascii" : { "type" : "File", "sourceID" : [ ] }, "script_postprocess" : { "type" : "File", "sourceID" : [ ] }, "script_processfiles" : { "type" : "File", "sourceID" : [ ] }, "script_extractnc" : { "type" : "File", "sourceID" : [ ] }, "script_environment" : { "type" : "File", "sourceID" : [ ] }, "script_make_tracks" : { "type" : "File", "sourceID" : [ ] } }, "outputs" : { "results" : { "type" : "Directory", "sourceID" : [ "postprocess" ] } }, "steps" : { "make_tracks" : { "run" : "make_tracks.cwl", "sources" : { "cyclon" : { "sourceID" : [ "extractnc" ] }, "script" : { "sourceID" : [ "script_make_tracks" ] } } }, "xml2ascii" : { "run" : "xml2ascii.cwl", "sources" : { "cyclon" : { "sourceID" : [ "make_tracks" ] }, "script" : { "sourceID" : [ "script_xml2ascii" ] } } }, "processfiles" : { "run" : "processfiles.cwl", "sources" : { "cyclon" : { "sourceID" : [ "create_environment" ] }, "script" : { "sourceID" : [ "script_processfiles" ] } } }, "postprocess" : { "run" : "postprocess.cwl", "sources" : { "cyclon" : { "sourceID" : [ "xml2ascii" ] }, "script" : { "sourceID" : [ "script_postprocess" ] } } }, "extractnc" : { "run" : "extractnc.cwl", "sources" : { "cyclon" : { "sourceID" : [ "transferfiles" ] }, "script" : { "sourceID" : [ "script_extractnc" ] } } }, "create_environment" : { "run" : "env_preparation.cwl", "sources" : { "script" : { "sourceID" : [ "script_environment" ] } } }, "transferfiles" : { "run" : 
"transferfiles.cwl", "sources" : { "cyclon" : { "sourceID" : [ "processfiles" ] }, "script" : { "sourceID" : [ "script_transferfiles" ] } } } }, "cwltoolVersion" : "1.0.20180525185854", "visualisationDot" : "digraph workflow {\n  graph [\n    bgcolor = \"#eeeeee\"\n    color = \"black\"\n    fontsize = \"10\"\n    labeljust = \"left\"\n    clusterrank = \"local\"\n    ranksep = \"0.22\"\n    nodesep = \"0.05\"\n  ]\n  node [\n    fontname = \"Helvetica\"\n    fontsize = \"10\"\n    fontcolor = \"black\"\n    shape = \"record\"\n    height = \"0\"\n    width = \"0\"\n    color = \"black\"\n    fillcolor = \"lightgoldenrodyellow\"\n    style = \"filled\"\n  ];\n  edge [\n    fontname=\"Helvetica\"\n    fontsize=\"8\"\n    fontcolor=\"black\"\n    color=\"black\"\n    arrowsize=\"0.7\"\n  ];\n  subgraph cluster_inputs {\n    rank = \"same\";\n    style = \"dashed\";\n    label = \"Workflow Inputs\";\n    \"script_transferfiles\" [fillcolor=\"#94DDF4\"];\n    \"script_xml2ascii\" [fillcolor=\"#94DDF4\"];\n    \"script_postprocess\" [fillcolor=\"#94DDF4\"];\n    \"script_processfiles\" [fillcolor=\"#94DDF4\"];\n    \"script_extractnc\" [fillcolor=\"#94DDF4\"];\n    \"script_environment\" [fillcolor=\"#94DDF4\"];\n    \"script_make_tracks\" [fillcolor=\"#94DDF4\"];\n  }\n  subgraph cluster_outputs {\n    rank = \"same\";\n    style = \"dashed\";\n    label = \"Workflow Outputs\";\n    \"results\" [fillcolor=\"#94DDF4\"];\n  }\n  \"make_tracks\";\n  \"xml2ascii\";\n  \"processfiles\";\n  \"postprocess\";\n  \"extractnc\";\n  \"create_environment\";\n  \"transferfiles\";\n  \"postprocess\" -> \"results\";\n  \"extractnc\" -> \"make_tracks\";\n  \"script_make_tracks\" -> \"make_tracks\";\n  \"make_tracks\" -> \"xml2ascii\";\n  \"script_xml2ascii\" -> \"xml2ascii\";\n  \"create_environment\" -> \"processfiles\";\n  \"script_processfiles\" -> \"processfiles\";\n  \"xml2ascii\" -> \"postprocess\";\n  \"script_postprocess\" -> \"postprocess\";\n  \"transferfiles\" -> 
\"extractnc\";\n  \"script_extractnc\" -> \"extractnc\";\n  \"script_environment\" -> \"create_environment\";\n  \"processfiles\" -> \"transferfiles\";\n  \"script_transferfiles\" -> \"transferfiles\";\n\n  // Invisible links to force outputs to be at lowest rank\n  \"make_tracks\" -> \"results\" [style=invis];\n  \"xml2ascii\" -> \"results\" [style=invis];\n  \"processfiles\" -> \"results\" [style=invis];\n  \"postprocess\" -> \"results\" [style=invis];\n  \"extractnc\" -> \"results\" [style=invis];\n  \"create_environment\" -> \"results\" [style=invis];\n  \"transferfiles\" -> \"results\" [style=invis];\n}\n", "permaLinkBase" : "https://w3id.org/cwl/view" }, "cwltoolStatus" : "RUNNING", "cwltoolVersion" : "" }
> db.queuedWorkflow.deleteMany( { cwltoolStatus: "RUNNING" } )
{ "acknowledged" : true, "deletedCount" : 6 }
> db.queuedWorkflow.find( { cwltoolStatus: "RUNNING" } ).count()
0

And then I restarted the import. Hopefully that will resolve the issue.

... quite a while later, and after I resized the data partition to 1TB (temporarily) to avoid having to stop and restart docker too many times:

...
Trimmed queue from 1 to 1
Still running https://cwlviewer.arvados.org/queue/5ff4be662ab79c000178c747
Trimmed queue from 1 to 1
Still running https://cwlviewer.arvados.org/queue/5ff4be662ab79c000178c747
Trimmed queue from 1 to 1
Still running https://cwlviewer.arvados.org/queue/5ff4be662ab79c000178c747
Trimmed queue from 1 to 1
Still running https://cwlviewer.arvados.org/queue/5ff4be662ab79c000178c747
Trimmed queue from 1 to 1
Still running https://cwlviewer.arvados.org/queue/5ff4be662ab79c000178c747
Trimmed queue from 1 to 1
Still running https://cwlviewer.arvados.org/queue/5ff4be662ab79c000178c747
Trimmed queue from 1 to 1
Trimmed queue from 1 to 0

real    191m42.084s
user    13m34.416s
sys     0m26.841s

We now have 1845 workflows available. Hmm, there are 22535 workflows at view.commonwl.org.

The dump I was importing says it had 21836 elements. I made a new one just now, and that one has 22535:

"totalElements": 21836,
"totalPages": 11
"totalElements": 22535,
"totalPages": 12

Hmm. That's quite a far cry from 1845. I've kicked off an import for the new export.

Meanwhile, looking at the import that got us to 1845, I'm seeing only 27 outright failures:

Failed https://cwlviewer.arvados.org/queue/5fd8e89308813b0001ca6b51: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd909ed2ab79c0001182cb3: Could not load extension schema https://schema.org/docs/schema_org_rdfa.html: None:11:92: Repeat node-elements inside property elements: http://www.w3.org/1999/xhtmlmeta
Failed https://cwlviewer.arvados.org/queue/5fd968e808813b00010be641: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd9127b2ab79c0001182d8c: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd918e92ab79c0001182e36: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd92cfc2ab79c0001183050: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd92e672ab79c0001183085: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd9690508813b00010be642: Workflow checker warning:
Failed https://cwlviewer.arvados.org/queue/5fd7bbda2ab79c0001a0342c: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd93a9c2ab79c00011831f0: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd93bf52ab79c0001183221: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd93c342ab79c000118322e: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd93c852ab79c000118323d: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd946542ab79c0001183390: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd9691708813b00010be643: I'm sorry, I couldn't load this CWL file, try again with --debug for more information.
Failed https://cwlviewer.arvados.org/queue/5ff4968f2ab79c000178c249: Whoops! Cwltool ran successfully, but an unexpected error occurred in CWLViewer!
Failed https://cwlviewer.arvados.org/queue/5ff497ad2ab79c000178c26a: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5ff49ca72ab79c000178c321: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5ff49ca42ab79c000178c31f: Could not load extension schema http://schema.org/version/latest/schema.rdf: ('http://schema.org/version/latest/schema.rdf', HTTPError('404 Client Error: Not Found for url: https://schema.org/version/latest/schema.rdf',))
Failed https://cwlviewer.arvados.org/queue/5ff4a25f2ab79c000178c3db: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5ff4a5182ab79c000178c437: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5ff4aa0a2ab79c000178c4ef: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5ff4b0f42ab79c000178c5be: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5ff4b3002ab79c000178c603: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5ff4bb8a2ab79c000178c6ec: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5ff4bcd82ab79c000178c715: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5ff4bdb42ab79c000178c73e: Tool definition failed validation:

There are also 116 'Unhandled HTTP status code' errors:

$ cat screen.log.first.complete.run |grep ^Unhandled|wc
    116    1856   12992

$ cat screen.log.first.complete.run |grep ^Unhandled |head
Unhandled HTTP status code: 400  {"message":"Error: Workflow could not be created from the provided cwl file"}
Unhandled HTTP status code: 400  {"message":"Error: Workflow could not be created from the provided cwl file"}
Unhandled HTTP status code: 400  {"message":"Error: Workflow could not be created from the provided cwl file"}
Unhandled HTTP status code: 400  {"message":"Error: Workflow could not be created from the provided cwl file"}
Unhandled HTTP status code: 400  {"message":"Error: Workflow could not be created from the provided cwl file"}
Unhandled HTTP status code: 400  {"message":"Error: Workflow could not be created from the provided cwl file"}
Unhandled HTTP status code: 400  {"message":"Error: Workflow could not be created from the provided cwl file"}
Unhandled HTTP status code: 400  {"message":"Error: Workflow could not be created from the provided cwl file"}
Unhandled HTTP status code: 400  {"message":"Error: Workflow could not be created from the provided cwl file"}
Unhandled HTTP status code: 400  {"message":"Error: Workflow could not be created from the provided cwl file"}

There are 2000 (exactly! that is suspicious) workflows posted:

$ cat screen.log.first.complete.run |grep ^Posted |wc
   2000   14180  289106

$ cat screen.log.first.complete.run |grep ^Posted |head
Posted: {'url': 'https://github.com/dockstore-testing/md5sum-checker.git', 'branch': 'master', 'path': 'checker-workflow-wrapping-tool.cwl'}
Posted: {'url': 'https://github.com/alexbarrera/GGR-cwl.git', 'branch': '6e008c1170ef818b6c4c63f0eec7baa4f7be7b3c', 'path': 'v1.0/STARR-seq_pipeline/04-quantification.cwl'}
Posted: {'url': 'https://github.com/common-workflow-language/common-workflow-language.git', 'branch': 'a062055fddcc7d7d9dbc53d28288e3ccb9a800d8', 'path': 'v1.0/v1.0/dynresreq-workflow-stepdefault.cwl'}
Posted: {'url': 'https://github.com/bcbio/bcbio_validation_workflows.git', 'branch': 'master', 'path': 'somatic-giab-mix/somatic-giab-mix-workflow/wf-variantcall.cwl'}
Posted: {'url': 'https://github.com/Duke-GCB/bespin-cwl.git', 'branch': 'qiime2-workflow-paired', 'path': 'packed/qiime2-step2-deblur.cwl', 'packedId': 'main'}
Posted: {'url': 'https://github.com/smc-rna-challenge/zhanghj-9609644.git', 'branch': 'master', 'path': 'main.cwl'}
Posted: {'url': 'https://github.com/bespin-workflows/exomeseq-gatk4.git', 'branch': 'develop', 'path': 'subworkflows/exomeseq-gatk4-00-prepare-reference-data.cwl'}
Posted: {'url': 'https://github.com/YeoLab/eclip.git', 'branch': '7196b92e262fe5f8acee04cb0d1b6fd23e4febdc', 'path': 'cwl/wf_demultiplex_pe.cwl'}
Posted: {'url': 'https://github.com/YeoLab/eclip.git', 'branch': 'e2a314ff7646c4ea7b90f65caad0452ef6874757', 'path': 'cwl/wf_demultiplex_pe.cwl'}
Posted: {'url': 'https://github.com/mr-c/cwltests.git', 'branch': 'pack_test', 'path': 'cwl/packed.cwl', 'packedId': 'workflow_data.cwl'}

2000-116-27 == 1857

That's close to 1845. Where are those last 12?
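The bookkeeping above can be sketched in Python. This is a minimal sketch, assuming the log lines start with the keywords `Posted`, `Failed` and `Unhandled` as in the excerpts above; the function names `tally` and `expected_imported` are hypothetical and not part of the import script.

```python
from collections import Counter

# Line prefixes seen in the import log excerpts above.
PREFIXES = ("Posted", "Failed", "Unhandled")


def tally(lines):
    """Count log lines by their leading keyword; other lines are ignored."""
    counts = Counter()
    for line in lines:
        for prefix in PREFIXES:
            if line.startswith(prefix):
                counts[prefix] += 1
                break
    return counts


def expected_imported(counts):
    """Posted workflows minus HTTP 400 rejections and queue failures."""
    return counts["Posted"] - counts["Unhandled"] - counts["Failed"]


# Figures taken from the log above, plugged in directly.
observed = Counter(Posted=2000, Unhandled=116, Failed=27)
print(expected_imported(observed))         # 1857
print(expected_imported(observed) - 1845)  # 12 workflows unaccounted for
```

To tally a real log, `tally(open("screen.log.first.complete.run"))` would give the same three counts that the grep pipelines above report.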

Perhaps more importantly, what is limiting the import to 2000 entries?

$ cat 2021-01-05T203420+0000.json |jq -r .content |jq length
2000

Looks like perhaps they were not even exported, sigh.

Okay, the dump.sh script specifies a `size=100000000` parameter, but it seems that the server ignores this and always limits the request to at most `size=2000`. Okay. So we've done only 1 of 12 pages so far. I modified the dump script to download all 12:

/var/backups/cwl/2021-01-05T215304+0000.json
/var/backups/cwl/2021-01-05T215309+0000.json
/var/backups/cwl/2021-01-05T215313+0000.json
/var/backups/cwl/2021-01-05T215317+0000.json
/var/backups/cwl/2021-01-05T215321+0000.json
/var/backups/cwl/2021-01-05T215325+0000.json
/var/backups/cwl/2021-01-05T215329+0000.json
/var/backups/cwl/2021-01-05T215333+0000.json
/var/backups/cwl/2021-01-05T215338+0000.json
/var/backups/cwl/2021-01-05T215341+0000.json
/var/backups/cwl/2021-01-05T215345+0000.json
/var/backups/cwl/2021-01-05T215350+0000.json
/var/backups/cwl/2021-01-05T215352+0000.json

And I'll start importing them all. That's going to take a while.
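For reference, the paging logic can be sketched as follows. This is a minimal sketch, not the actual dump.sh: `fetch_page` is a hypothetical callable standing in for the HTTP request, zero-based page numbers are an assumption, and only the field names `content`, `totalElements` and `totalPages` come from the dumps above.

```python
import math


def page_count(total_elements, page_size):
    """Number of pages needed when the server caps each page at page_size."""
    return math.ceil(total_elements / page_size)


def fetch_all(fetch_page):
    """Collect every entry by following totalPages instead of trusting `size`."""
    first = fetch_page(0)
    entries = list(first["content"])
    for number in range(1, first["totalPages"]):
        entries.extend(fetch_page(number)["content"])
    return entries


def fake_fetch_page(number, total=22535, cap=2000):
    """Stand-in for the real request: a fake server capping pages at 2000."""
    start = number * cap
    stop = min(start + cap, total)
    return {
        "content": list(range(start, stop)),
        "totalElements": total,
        "totalPages": page_count(total, cap),
    }


print(page_count(21836, 2000))          # 11
print(page_count(22535, 2000))          # 12
print(len(fetch_all(fake_fetch_page)))  # 22535
```

With a 2000-entry cap, the page counts work out to the 11 and 12 reported in the two dumps, which is consistent with the server ignoring the larger `size` value.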

#17 Updated by Ward Vandewege 9 months ago

As of 9:45am this morning, cwlviewer.arvados.org is at 4003 imported workflows. This is going to take a while longer...

#18 Updated by Peter Amstutz 9 months ago

  • Target version changed from 2021-01-06 Sprint to 2021-01-20 Sprint

#19 Updated by Ward Vandewege 9 months ago

It filled up the 1TB partition somewhere partway through. Joy.

In addition, I'm getting

OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.

when trying to jump to the last entry (the screen shows: "Whoops - Something Went Wrong! Error: An internal server error occurred").

I fixed the latter with:

> use admin
switched to db admin
> db.runCommand( { getParameter : 1, "internalQueryExecMaxBlockingSortBytes" : 1 } )
{ "internalQueryExecMaxBlockingSortBytes" : 33554432, "ok" : 1 }
> db.adminCommand({setParameter: 1, internalQueryExecMaxBlockingSortBytes: 335544320})
{ "was" : 33554432, "ok" : 1 }
> db.runCommand( { getParameter : 1, "internalQueryExecMaxBlockingSortBytes" : 1 } )
{ "internalQueryExecMaxBlockingSortBytes" : 335544320, "ok" : 1 }

Now I can see we have 4605 workflows imported, so I'll restart the import from page 3 (4000 and up).

It turns out that this setting doesn't survive a restart of mongodb, so I'm setting it from the docker compose file now.

#20 Updated by Ward Vandewege 9 months ago

I resized the partition to 10 TiB (!). And I restarted the import, from the dump file that has

  "totalElements": 22535,
  "totalPages": 12

The import completed and has 21068 workflows. The rest failed.

I finally found https://github.com/common-workflow-language/cwlviewer/issues/279 which explains the issue with the disk space leakage, and suggests 2 commands to keep it in check. I'll just run those in a cron, every half hour or so:

  docker-compose exec -T spring find /tmp -type f -mtime +1 -delete
  docker-compose exec -T spring find /tmp -type d -mtime +30 -delete

OK, I've installed that cron job and removed the 10 TB partition. I ended up using a 150 GB data partition, which is about 33% full at the moment.
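For readers unfamiliar with `-mtime +1`: find counts a file's age in whole 24-hour periods and discards the fraction, so `+1` only matches files that are at least two days old. The sketch below approximates the first cleanup command in Python under that reading. It is an illustration only; the function name `delete_old_files` is hypothetical, and the cron job itself runs the find commands shown above.

```python
import os
import stat
import tempfile
import time

DAY = 86400  # seconds in one 24-hour period


def delete_old_files(root, days, now=None):
    """Delete regular files under root whose age in whole days exceeds `days`.

    Roughly mirrors `find ROOT -type f -mtime +DAYS -delete`.
    """
    now = time.time() if now is None else now
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # the file vanished while we were walking
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, and so on
            age_days = int((now - st.st_mtime) // DAY)
            if age_days > days:
                os.remove(path)
                removed.append(path)
    return removed


# Demo in a throwaway directory: one fresh file, one backdated by 3 days.
with tempfile.TemporaryDirectory() as demo_root:
    fresh = os.path.join(demo_root, "fresh.tmp")
    stale = os.path.join(demo_root, "stale.tmp")
    for p in (fresh, stale):
        open(p, "w").close()
    backdated = time.time() - 3 * DAY
    os.utime(stale, (backdated, backdated))
    print([os.path.basename(p) for p in delete_old_files(demo_root, 1)])
```

The second command (`-type d -mtime +30 -delete`) applies the same age test to directories; that case is not covered by this sketch.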

I think we're ready to cut over the DNS, then get the ssl cert, and this is done.

#21 Updated by Ward Vandewege 9 months ago

Old DNS:

view.commonwl.org. 10757 IN CNAME heater.cs.man.ac.uk.

New DNS:

view.commonwl.org CNAME cwlviewer.arvados.org

#22 Updated by Ward Vandewege 9 months ago

  • Status changed from In Progress to Resolved

The DNS change has been made.

#23 Updated by Ward Vandewege 9 months ago

  • Description updated (diff)

#24 Updated by Ward Vandewege 7 months ago
