Support #16926
Closed: CWL viewer
Added by Peter Amstutz about 4 years ago. Updated almost 3 years ago.
Description
- Set up account owned by SFC
- Schedule time with Stian
- Set up VM
- Migrate DNS
- Set up actual service
- Migrate database
https://github.com/common-workflow-language/cwlviewer
Meeting notes: https://docs.google.com/document/d/1jnUJM5z-we_CNMogUkcJesMEhoFkUrf4tqs4FK8tqrI/edit#heading=h.f2av3ipqwwrp
Updated by Peter Amstutz about 4 years ago
- Related to Idea #16011: CWL support, docs, training, website added
Updated by Peter Amstutz about 4 years ago
- Target version set to 2020-10-07 Sprint
Updated by Peter Amstutz about 4 years ago
- Target version changed from 2020-10-07 Sprint to 2020-10-21 Sprint
Updated by Peter Amstutz about 4 years ago
- Target version changed from 2020-10-21 Sprint to 2020-11-04 Sprint
Updated by Peter Amstutz about 4 years ago
- Target version changed from 2020-11-04 Sprint to 2020-11-18
Updated by Peter Amstutz about 4 years ago
- Target version changed from 2020-11-18 to 2020-12-02 Sprint
Updated by Peter Amstutz about 4 years ago
- Target version changed from 2020-12-02 Sprint to 2020-12-16 Sprint
Updated by Ward Vandewege about 4 years ago
- Status changed from New to In Progress
I created a new AWS account in our organization. At any time we can detach the account from our org, should that become necessary in the future.
Updated by Ward Vandewege about 4 years ago
I've rolled out the VM with Terraform and configured it with Salt.
It is reachable at https://cwlviewer.arvados.org
Next step: import the data from view.commonwl.org.
Then: update the DNS record and get an SSL certificate for view.commonwl.org via Salt.
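Once the DNS record has been updated, a quick check is whether view.commonwl.org resolves to the same address as the new VM. This is a minimal stdlib Python sketch; the helper names are hypothetical. A positive result only reflects what the local resolver currently returns, and cached records may lag behind the change.

```python
import socket
from typing import Set


def resolved_addresses(hostname: str, port: int = 443) -> Set[str]:
    """Return the IP addresses that hostname currently resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def dns_points_at(old_name: str, new_name: str) -> bool:
    """True if old_name resolves to at least one address that new_name also resolves to."""
    return bool(resolved_addresses(old_name) & resolved_addresses(new_name))


# e.g. dns_points_at("view.commonwl.org", "cwlviewer.arvados.org")
```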
Updated by Peter Amstutz about 4 years ago
- Target version changed from 2020-12-16 Sprint to 2021-01-06 Sprint
Updated by Ward Vandewege almost 4 years ago
The import command:
ward@cwlviewer:/usr/src/cwlviewer$ time ./load.py /var/backups/cwl/2020-12-15T161841+0000.json.gz https://cwlviewer.arvados.org/
I had to restart it a few times: there is a disk space leak that fills up the disk during the import (perhaps a file descriptor that is not released properly?). Restarting Docker frees the excess space.
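One way to test the file descriptor theory is to look for files that have been deleted but are still held open. Their blocks are not freed until the last descriptor is closed, which matches space coming back after a restart. `lsof +L1` reports this too. The minimal sketch below is Linux-only and assumes it can read `/proc` for the processes in question (run it inside the container, or as root on the host). `deleted_open_files` is a hypothetical helper name.

```python
import os


def deleted_open_files(proc_root="/proc"):
    """Return (pid, fd, target, size_bytes) for open files whose path was unlinked.

    Linux-only: relies on the /proc/<pid>/fd symlinks, whose targets the kernel
    suffixes with " (deleted)" once the underlying file has been removed.
    """
    results = []
    for pid in filter(str.isdigit, os.listdir(proc_root)):
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or no permission to inspect it
        for fd in fds:
            fd_path = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(fd_path)
                if not target.endswith(" (deleted)"):
                    continue
                # stat() follows the fd link to the still-open inode.
                size = os.stat(fd_path).st_size
            except OSError:
                continue
            results.append((int(pid), int(fd), target, size))
    return results


if __name__ == "__main__":
    # Largest space holders first.
    for pid, fd, target, size in sorted(deleted_open_files(), key=lambda r: r[3], reverse=True)[:20]:
        print(f"{size:>14,d}  pid={pid} fd={fd}  {target}")
```

If one container process shows up with large, growing deleted files during the import, that would support the descriptor-leak explanation.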
It seems that the importer doesn't resume cleanly. The restore processed 1152 workflows, but there are many more on view.commonwl.org, and the import has effectively halted on 6 queue entries that are supposedly "running". I suspect these are the ones that were in progress when the disk filled up (due to that file descriptor/deletion bug; restarting composer cleared up the disk space), since the queue IDs are at least somewhat sequential:
Still running https://cwlviewer.arvados.org/queue/5fd8edda08813b0001ca6bc3
Still running https://cwlviewer.arvados.org/queue/5fd94ac82ab79c00011833fc
Still running https://cwlviewer.arvados.org/queue/5fd94b342ab79c0001183400
Still running https://cwlviewer.arvados.org/queue/5fd94b342ab79c0001183401
Still running https://cwlviewer.arvados.org/queue/5fd94b352ab79c0001183402
Still running https://cwlviewer.arvados.org/queue/5fd94b352ab79c0001183403
Those correspond to these URLs:
https://github.com/mskcc/roslin-variant.git
https://github.com/genome/analysis-workflows.git
https://github.com/genome/analysis-workflows.git
https://github.com/ncbi/pgap.git
https://github.com/Duke-GCB/bespin-cwl.git
https://github.com/rosafilgueira/cyclon_usecase.git
which don't seem out of the ordinary.
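The queue IDs are MongoDB ObjectIds, and their first four bytes encode the creation time as a big-endian count of seconds since the Unix epoch. Decoding them gives the time each stuck entry was queued, which can be compared with when the disk filled up. This is a small stdlib-only sketch; `objectid_time` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def objectid_time(oid_hex: str) -> datetime:
    """Return the creation time embedded in a 24-hex-digit MongoDB ObjectId."""
    if len(oid_hex) != 24:
        raise ValueError(f"not an ObjectId: {oid_hex!r}")
    # The leading 4 bytes (8 hex digits) are a big-endian Unix timestamp.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


stuck = [
    "5fd8edda08813b0001ca6bc3",
    "5fd94ac82ab79c00011833fc",
    "5fd94b342ab79c0001183400",
    "5fd94b342ab79c0001183401",
    "5fd94b352ab79c0001183402",
    "5fd94b352ab79c0001183403",
]
for oid in stuck:
    print(oid, objectid_time(oid).isoformat())
```

For these IDs it yields 2020-12-15 17:09:46 UTC for the first entry and 23:46:16 to 23:48:05 UTC for the other five. These match the `retrievedOn` values stored with the entries. With pymongo installed, `bson.ObjectId(...).generation_time` returns the same information.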
Is there a way to cancel a queue entry? I didn't see one mentioned on the API page.
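If the API offers no cancel operation, one workaround is to remove the stuck documents directly from the `queuedWorkflow` collection in CWL Viewer's MongoDB database (named `test` in this deployment). The minimal pymongo-style sketch below rests on three assumptions: you have direct database access, removing a queue entry is enough for the importer to stop waiting on it, and CWL Viewer is idle while this runs. `remove_stuck_entries` is a hypothetical helper.

```python
from typing import Any, Iterable


def remove_stuck_entries(collection: Any, object_ids: Iterable[Any]) -> int:
    """Delete the given queue entries if they are still marked RUNNING.

    `collection` is expected to behave like a pymongo Collection (for example
    MongoClient(...)["test"]["queuedWorkflow"]) and `object_ids` to contain
    bson.ObjectId values. Returns the number of documents deleted.
    """
    ids = list(object_ids)
    if not ids:
        return 0
    # Filter on the status too, so an entry that finished in the meantime is kept.
    result = collection.delete_many({"_id": {"$in": ids}, "cwltoolStatus": "RUNNING"})
    return result.deleted_count


# Example usage (needs pymongo and a reachable MongoDB; the URL is an assumption):
#   from bson import ObjectId
#   from pymongo import MongoClient
#   coll = MongoClient("mongodb://localhost:27017")["test"]["queuedWorkflow"]
#   stuck = ["5fd8edda08813b0001ca6bc3", "5fd94ac82ab79c00011833fc"]  # etc.
#   print(remove_stuck_entries(coll, [ObjectId(s) for s in stuck]))
```

Deleting the entries, rather than rewriting their status, avoids guessing which other `cwltoolStatus` values the application accepts. This sketch does not check whether the web UI copes with a queue URL that no longer exists.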
Eventually, it gave up:
Trimmed queue from 6 to 6
Still running https://cwlviewer.arvados.org/queue/5fd8edda08813b0001ca6bc3
Still running https://cwlviewer.arvados.org/queue/5fd94ac82ab79c00011833fc
Still running https://cwlviewer.arvados.org/queue/5fd94b342ab79c0001183400
Still running https://cwlviewer.arvados.org/queue/5fd94b342ab79c0001183401
Still running https://cwlviewer.arvados.org/queue/5fd94b352ab79c0001183402
Still running https://cwlviewer.arvados.org/queue/5fd94b352ab79c0001183403
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 159, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 80, in create_connection
    raise err
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 70, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 343, in _make_request
    self._validate_conn(conn)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 841, in _validate_conn
    conn.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 301, in connect
    conn = self._new_conn()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 168, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f5d81bea978>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='cwlviewer.arvados.org', port=443): Max retries exceeded with url: /queue/5fd94b352ab79c0001183402 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f5d81bea978>: Failed to establish a new connection: [Errno 110] Connection timed out'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./load.py", line 117, in <module>
    main(*sys.argv[1:])
  File "./load.py", line 107, in main
    queued = trim_queue(queued)
  File "./load.py", line 82, in trim_queue
    if is_running(q):
  File "./load.py", line 63, in is_running
    queued = requests.get(location, allow_redirects=False, headers=HEADERS)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='cwlviewer.arvados.org', port=443): Max retries exceeded with url: /queue/5fd94b352ab79c0001183402 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f5d81bea978>: Failed to establish a new connection: [Errno 110] Connection timed out'))

real    7855m45.990s
user    977m47.651s
sys     36m38.701s
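The import died because `is_running` in load.py calls `requests.get` with no timeout and no retry. A single connection timeout while the server was unreachable therefore aborted a run of more than five days (`real 7855m`). Below is a sketch of a more tolerant check, assuming the importer only needs a yes/no answer per queue entry. The `fetch_status` callable, the retry counts and the backoff schedule are illustrative assumptions, not how load.py actually behaves.

```python
import time
from typing import Callable


class QueueCheckFailed(Exception):
    """Raised when a queue entry could not be checked after all retries."""


def is_running_with_retry(
    location: str,
    fetch_status: Callable[[str], bool],
    attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return fetch_status(location), retrying network errors with exponential backoff.

    fetch_status is a hypothetical callable: it returns True while the queue
    entry is still running and raises OSError on network failure. requests'
    exceptions derive from IOError (an alias of OSError), so
    requests.exceptions.ConnectionError is caught here too.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch_status(location)
        except OSError as exc:
            if attempt == attempts:
                raise QueueCheckFailed(f"{location}: {exc}") from exc
            sleep(base_delay * 2 ** (attempt - 1))  # 2 s, 4 s, 8 s, ... by default
    raise AssertionError("unreachable")


# One possible fetch_status built on requests; it assumes the queue endpoint
# answers 200 with {"cwltoolStatus": "RUNNING", ...} while work is pending:
#
#   def fetch_status(location):
#       r = requests.get(location, allow_redirects=False, headers=HEADERS, timeout=30)
#       return r.status_code == 200 and r.json().get("cwltoolStatus") == "RUNNING"
```

In `trim_queue`, a `QueueCheckFailed` could then be logged and the entry kept for the next pass. That would avoid ending the whole import the way this run ended.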
Updated by Ward Vandewege almost 4 years ago
I connected to the MongoDB database:
> show dbs
admin 0.000GB
local 0.000GB
test 0.006GB
> use test
switched to db test
> show collections;
queuedWorkflow
workflow
> db.queuedWorkflow.find( { cwltoolStatus: "RUNNING" } ).count()
6
> db.queuedWorkflow.find( { cwltoolStatus: "RUNNING" } )
{ "_id" : ObjectId("5fd8edda08813b0001ca6bc3"), "_class" : "org.commonwl.view.workflow.QueuedWorkflow", "tempRepresentation" : { "_id" : null, "retrievedFrom" : { "repoUrl" : "https://github.com/mskcc/roslin-variant.git", "branch" : "2.4.x", "path" : "setup/cwl/module-5.cwl" }, "retrievedOn" : ISODate("2020-12-15T17:09:46.260Z"), "lastCommit" : "425970cb18efb69b42edf26f92c07b7683a62576", "label" : "module-5", "inputs" : { "md_metrics_files" : { "type" : "[]", "sourceID" : [ ] }, "fp_intervals" : { "type" : "File", "sourceID" : [ ] }, "bams" : { "type" : "File[]", "sourceID" : [ ] }, "request_file" : { "type" : "File", "sourceID" : [ ] }, "target_intervals" : { "type" : "File", "sourceID" : [ ] }, "fp_genotypes" : { "type" : "File", "sourceID" : [ ] }, "grouping_file" : { "type" : "File", "sourceID" : [ ] }, "project_prefix" : { "type" : "string", "sourceID" : [ ] }, "genome" : { "type" : "string", "sourceID" : [ ] }, "bait_intervals" : { "type" : "File", "sourceID" : [ ] }, "clstats1" : { "type" : "[]", "sourceID" : [ ] }, "pairing_file" : { "type" : "File", "sourceID" : [ ] }, "clstats2" : { "type" : "[]", "sourceID" : [ ] } }, "outputs" : { "qual_pdf" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] }, "per_target_coverage" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] }, "doc_basecounts" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] }, "gcbias_summary" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] }, "qc_files" : { "type" : "File[]", "sourceID" : [ "generate_pdf" ] }, "hs_metrics" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] }, "gcbias_pdf" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] }, "insert_metrics" : { "type" : "File[]", "sourceID" : [ 
"scatter_metrics" ] }, "insert_pdf" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] }, "qual_metrics" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] }, "gcbias_metrics" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] }, "as_metrics" : { "type" : "File[]", "sourceID" : [ "scatter_metrics" ] } }, "steps" : { "scatter_metrics" : { "run" : "", "sources" : { "genome" : { "sourceID" : [ "genome" ] }, "bait_intervals" : { "sourceID" : [ "bait_intervals" ] }, "fp_intervals" : { "sourceID" : [ "fp_intervals" ] }, "target_intervals" : { "sourceID" : [ "target_intervals" ] }, "bam" : { "sourceID" : [ "bams" ] } } }, "generate_pdf" : { "run" : "cmo-qcpdf/0.5.11/cmo-qcpdf.cwl", "sources" : { "qualmetrics_files" : { "sourceID" : [ ] }, "md_metrics_files" : { "sourceID" : [ "md_metrics_files" ] }, "mdmetrics_files" : { "sourceID" : [ ] }, "hsmetrics_files" : { "sourceID" : [ ] }, "request_file" : { "sourceID" : [ "request_file" ] }, "fp_genotypes" : { "sourceID" : [ "fp_genotypes" ] }, "grouping_file" : { "sourceID" : [ "grouping_file" ] }, "file_prefix" : { "sourceID" : [ "project_prefix" ] }, "gcbias_files" : { "sourceID" : [ ] }, "trimgalore_files" : { "sourceID" : [ ] }, "clstats1" : { "sourceID" : [ "clstats1" ] }, "insertsize_files" : { "sourceID" : [ ] }, "files" : { "sourceID" : [ "scatter_metrics" ] }, "fingerprint_files" : { "sourceID" : [ ] }, "pairing_file" : { "sourceID" : [ "pairing_file" ] }, "clstats2" : { "sourceID" : [ "clstats2" ] } } } }, "cwltoolVersion" : "1.0.20180525185854", "visualisationDot" : "digraph workflow {\n graph [\n bgcolor = \"#eeeeee\"\n color = \"black\"\n fontsize = \"10\"\n labeljust = \"left\"\n clusterrank = \"local\"\n ranksep = \"0.22\"\n nodesep = \"0.05\"\n ]\n node [\n fontname = \"Helvetica\"\n fontsize = \"10\"\n fontcolor = \"black\"\n shape = \"record\"\n height = \"0\"\n width = \"0\"\n color = \"black\"\n fillcolor = \"lightgoldenrodyellow\"\n style = \"filled\"\n ];\n edge [\n 
fontname=\"Helvetica\"\n fontsize=\"8\"\n fontcolor=\"black\"\n color=\"black\"\n arrowsize=\"0.7\"\n ];\n subgraph cluster_inputs {\n rank = \"same\";\n style = \"dashed\";\n label = \"Workflow Inputs\";\n \"md_metrics_files\" [fillcolor=\"#94DDF4\"];\n \"fp_intervals\" [fillcolor=\"#94DDF4\"];\n \"bams\" [fillcolor=\"#94DDF4\"];\n \"request_file\" [fillcolor=\"#94DDF4\"];\n \"target_intervals\" [fillcolor=\"#94DDF4\"];\n \"fp_genotypes\" [fillcolor=\"#94DDF4\"];\n \"grouping_file\" [fillcolor=\"#94DDF4\"];\n \"project_prefix\" [fillcolor=\"#94DDF4\"];\n \"genome\" [fillcolor=\"#94DDF4\"];\n \"bait_intervals\" [fillcolor=\"#94DDF4\"];\n \"clstats1\" [fillcolor=\"#94DDF4\"];\n \"pairing_file\" [fillcolor=\"#94DDF4\"];\n \"clstats2\" [fillcolor=\"#94DDF4\"];\n }\n subgraph cluster_outputs {\n rank = \"same\";\n style = \"dashed\";\n label = \"Workflow Outputs\";\n \"qual_pdf\" [fillcolor=\"#94DDF4\"];\n \"per_target_coverage\" [fillcolor=\"#94DDF4\"];\n \"doc_basecounts\" [fillcolor=\"#94DDF4\"];\n \"gcbias_summary\" [fillcolor=\"#94DDF4\"];\n \"qc_files\" [fillcolor=\"#94DDF4\"];\n \"hs_metrics\" [fillcolor=\"#94DDF4\"];\n \"gcbias_pdf\" [fillcolor=\"#94DDF4\"];\n \"insert_metrics\" [fillcolor=\"#94DDF4\"];\n \"insert_pdf\" [fillcolor=\"#94DDF4\"];\n \"qual_metrics\" [fillcolor=\"#94DDF4\"];\n \"gcbias_metrics\" [fillcolor=\"#94DDF4\"];\n \"as_metrics\" [fillcolor=\"#94DDF4\"];\n }\n \"scatter_metrics\";\n \"generate_pdf\";\n \"scatter_metrics\" -> \"qual_pdf\";\n \"scatter_metrics\" -> \"per_target_coverage\";\n \"scatter_metrics\" -> \"doc_basecounts\";\n \"scatter_metrics\" -> \"gcbias_summary\";\n \"generate_pdf\" -> \"qc_files\";\n \"scatter_metrics\" -> \"hs_metrics\";\n \"scatter_metrics\" -> \"gcbias_pdf\";\n \"scatter_metrics\" -> \"insert_metrics\";\n \"scatter_metrics\" -> \"insert_pdf\";\n \"scatter_metrics\" -> \"qual_metrics\";\n \"scatter_metrics\" -> \"gcbias_metrics\";\n \"scatter_metrics\" -> \"as_metrics\";\n \"genome\" -> \"scatter_metrics\";\n 
\"bait_intervals\" -> \"scatter_metrics\";\n \"fp_intervals\" -> \"scatter_metrics\";\n \"target_intervals\" -> \"scatter_metrics\";\n \"bams\" -> \"scatter_metrics\";\n \"md_metrics_files\" -> \"generate_pdf\";\n \"request_file\" -> \"generate_pdf\";\n \"fp_genotypes\" -> \"generate_pdf\";\n \"grouping_file\" -> \"generate_pdf\";\n \"project_prefix\" -> \"generate_pdf\";\n \"clstats1\" -> \"generate_pdf\";\n \"scatter_metrics\" -> \"generate_pdf\";\n \"pairing_file\" -> \"generate_pdf\";\n \"clstats2\" -> \"generate_pdf\";\n\n // Invisible links to force outputs to be at lowest rank\n \"scatter_metrics\" -> \"qual_pdf\" [style=invis];\n \"generate_pdf\" -> \"qual_pdf\" [style=invis];\n}\n", "permaLinkBase" : "https://w3id.org/cwl/view" }, "cwltoolStatus" : "RUNNING", "cwltoolVersion" : "" } { "_id" : ObjectId("5fd94ac82ab79c00011833fc"), "_class" : "org.commonwl.view.workflow.QueuedWorkflow", "tempRepresentation" : { "_id" : null, "retrievedFrom" : { "repoUrl" : "https://github.com/genome/analysis-workflows.git", "branch" : "509938802c5e42bb8084c6a5a26ab6425c60e69a", "path" : "definitions/subworkflows/bam_to_trimmed_fastq.cwl" }, "retrievedOn" : ISODate("2020-12-15T23:46:16.018Z"), "lastCommit" : "509938802c5e42bb8084c6a5a26ab6425c60e69a", "label" : "bam to trimmed fastqs", "inputs" : { "adapters" : { "type" : "File", "sourceID" : [ ] }, "min_readlength" : { "type" : "int", "sourceID" : [ ] }, "adapter_trim_end" : { "type" : "string", "sourceID" : [ ] }, "adapter_min_overlap" : { "type" : "int", "sourceID" : [ ] }, "max_uncalled" : { "type" : "int", "sourceID" : [ ] }, "bam" : { "type" : "File", "sourceID" : [ ] } }, "outputs" : { "fastqs" : { "type" : "File[]", "sourceID" : [ "trim_fastq" ] }, "fastq2" : { "type" : "File", "sourceID" : [ "trim_fastq" ] }, "fastq1" : { "type" : "File", "sourceID" : [ "trim_fastq" ] } }, "steps" : { "trim_fastq" : { "run" : "../tools/trim_fastq.cwl", "sources" : { "reads2" : { "sourceID" : [ "bam_to_fastq" ] }, "reads1" : { 
"sourceID" : [ "bam_to_fastq" ] }, "adapters" : { "sourceID" : [ "adapters" ] }, "min_readlength" : { "sourceID" : [ "min_readlength" ] }, "adapter_trim_end" : { "sourceID" : [ "adapter_trim_end" ] }, "adapter_min_overlap" : { "sourceID" : [ "adapter_min_overlap" ] }, "max_uncalled" : { "sourceID" : [ "max_uncalled" ] } } }, "bam_to_fastq" : { "run" : "../tools/bam_to_fastq.cwl", "sources" : { "bam" : { "sourceID" : [ "bam" ] } } } }, "cwltoolVersion" : "1.0.20180525185854", "visualisationDot" : "digraph workflow {\n graph [\n bgcolor = \"#eeeeee\"\n color = \"black\"\n fontsize = \"10\"\n labeljust = \"left\"\n clusterrank = \"local\"\n ranksep = \"0.22\"\n nodesep = \"0.05\"\n ]\n node [\n fontname = \"Helvetica\"\n fontsize = \"10\"\n fontcolor = \"black\"\n shape = \"record\"\n height = \"0\"\n width = \"0\"\n color = \"black\"\n fillcolor = \"lightgoldenrodyellow\"\n style = \"filled\"\n ];\n edge [\n fontname=\"Helvetica\"\n fontsize=\"8\"\n fontcolor=\"black\"\n color=\"black\"\n arrowsize=\"0.7\"\n ];\n subgraph cluster_inputs {\n rank = \"same\";\n style = \"dashed\";\n label = \"Workflow Inputs\";\n \"adapters\" [fillcolor=\"#94DDF4\"];\n \"min_readlength\" [fillcolor=\"#94DDF4\"];\n \"adapter_trim_end\" [fillcolor=\"#94DDF4\"];\n \"adapter_min_overlap\" [fillcolor=\"#94DDF4\"];\n \"max_uncalled\" [fillcolor=\"#94DDF4\"];\n \"bam\" [fillcolor=\"#94DDF4\"];\n }\n subgraph cluster_outputs {\n rank = \"same\";\n style = \"dashed\";\n label = \"Workflow Outputs\";\n \"fastqs\" [fillcolor=\"#94DDF4\"];\n \"fastq2\" [fillcolor=\"#94DDF4\"];\n \"fastq1\" [fillcolor=\"#94DDF4\"];\n }\n \"trim_fastq\";\n \"bam_to_fastq\";\n \"trim_fastq\" -> \"fastqs\";\n \"trim_fastq\" -> \"fastq2\";\n \"trim_fastq\" -> \"fastq1\";\n \"bam_to_fastq\" -> \"trim_fastq\";\n \"bam_to_fastq\" -> \"trim_fastq\";\n \"adapters\" -> \"trim_fastq\";\n \"min_readlength\" -> \"trim_fastq\";\n \"adapter_trim_end\" -> \"trim_fastq\";\n \"adapter_min_overlap\" -> \"trim_fastq\";\n 
\"max_uncalled\" -> \"trim_fastq\";\n \"bam\" -> \"bam_to_fastq\";\n\n // Invisible links to force outputs to be at lowest rank\n \"trim_fastq\" -> \"fastqs\" [style=invis];\n \"bam_to_fastq\" -> \"fastqs\" [style=invis];\n}\n", "permaLinkBase" : "https://w3id.org/cwl/view" }, "cwltoolStatus" : "RUNNING", "cwltoolVersion" : "" } { "_id" : ObjectId("5fd94b342ab79c0001183400"), "_class" : "org.commonwl.view.workflow.QueuedWorkflow", "tempRepresentation" : { "_id" : null, "retrievedFrom" : { "repoUrl" : "https://github.com/genome/analysis-workflows.git", "branch" : "b465f0da2806ddb6df481409541d13288ccb40ec", "path" : "definitions/subworkflows/sequence_to_bqsr.cwl" }, "retrievedOn" : ISODate("2020-12-15T23:48:04.255Z"), "lastCommit" : "b465f0da2806ddb6df481409541d13288ccb40ec", "label" : "Raw sequence data to BQSR", "inputs" : { "reference" : { "type" : "string, File", "sourceID" : [ ] }, "bqsr_intervals" : { "type" : "string[]?", "sourceID" : [ ] }, "trimming" : { "type" : "../types/trimming_options.yml#trimming_options?", "sourceID" : [ ] }, "bqsr_known_sites" : { "doc" : "One or more databases of known polymorphic sites used to exclude regions around known polymorphisms from analysis.", "type" : "File[]", "sourceID" : [ ] }, "unaligned" : { "type" : "../types/sequence_data.yml#sequence_data[]", "sourceID" : [ ] }, "final_name" : { "type" : "string", "sourceID" : [ ], "defaultVal" : "\\\"final\\\"" } }, "outputs" : { "final_bam" : { "type" : "File", "sourceID" : [ "index_bam" ] }, "mark_duplicates_metrics_file" : { "type" : "File", "sourceID" : [ "mark_duplicates_and_sort" ] } }, "steps" : { "bqsr" : { "run" : "../tools/bqsr.cwl", "sources" : { "reference" : { "sourceID" : [ "reference" ] }, "intervals" : { "sourceID" : [ "bqsr_intervals" ] }, "known_sites" : { "sourceID" : [ "bqsr_known_sites" ] }, "bam" : { "sourceID" : [ "mark_duplicates_and_sort" ] } } }, "index_bam" : { "run" : "../tools/index_bam.cwl", "sources" : { "bam" : { "sourceID" : [ "apply_bqsr" ] } } 
}, "apply_bqsr" : { "run" : "../tools/apply_bqsr.cwl", "sources" : { "reference" : { "sourceID" : [ "reference" ] }, "output_name" : { "sourceID" : [ "final_name" ] }, "bqsr_table" : { "sourceID" : [ "bqsr" ] }, "bam" : { "sourceID" : [ "mark_duplicates_and_sort" ] } } }, "mark_duplicates_and_sort" : { "run" : "../tools/mark_duplicates_and_sort.cwl", "sources" : { "bam" : { "sourceID" : [ "name_sort" ] } } }, "merge" : { "run" : "../tools/merge_bams_samtools.cwl", "sources" : { "name" : { "sourceID" : [ "final_name" ] }, "bams" : { "sourceID" : [ "align" ] } } }, "name_sort" : { "run" : "../tools/name_sort.cwl", "sources" : { "bam" : { "sourceID" : [ "merge" ] } } }, "align" : { "run" : "sequence_align_and_tag_adapter.cwl", "sources" : { "reference" : { "sourceID" : [ "reference" ] }, "trimming" : { "sourceID" : [ "trimming" ] }, "unaligned" : { "sourceID" : [ "unaligned" ] } } } }, "cwltoolVersion" : "1.0.20180525185854", "visualisationDot" : "digraph workflow {\n graph [\n bgcolor = \"#eeeeee\"\n color = \"black\"\n fontsize = \"10\"\n labeljust = \"left\"\n clusterrank = \"local\"\n ranksep = \"0.22\"\n nodesep = \"0.05\"\n ]\n node [\n fontname = \"Helvetica\"\n fontsize = \"10\"\n fontcolor = \"black\"\n shape = \"record\"\n height = \"0\"\n width = \"0\"\n color = \"black\"\n fillcolor = \"lightgoldenrodyellow\"\n style = \"filled\"\n ];\n edge [\n fontname=\"Helvetica\"\n fontsize=\"8\"\n fontcolor=\"black\"\n color=\"black\"\n arrowsize=\"0.7\"\n ];\n subgraph cluster_inputs {\n rank = \"same\";\n style = \"dashed\";\n label = \"Workflow Inputs\";\n \"reference\" [fillcolor=\"#94DDF4\"];\n \"bqsr_intervals\" [fillcolor=\"#94DDF4\"];\n \"trimming\" [fillcolor=\"#94DDF4\"];\n \"bqsr_known_sites\" [fillcolor=\"#94DDF4\"];\n \"unaligned\" [fillcolor=\"#94DDF4\"];\n \"final_name\" [fillcolor=\"#94DDF4\"];\n }\n subgraph cluster_outputs {\n rank = \"same\";\n style = \"dashed\";\n label = \"Workflow Outputs\";\n \"final_bam\" [fillcolor=\"#94DDF4\"];\n 
\"mark_duplicates_metrics_file\" [fillcolor=\"#94DDF4\"];\n }\n \"bqsr\";\n \"index_bam\";\n \"apply_bqsr\";\n \"mark_duplicates_and_sort\";\n \"merge\";\n \"name_sort\";\n \"align\";\n \"index_bam\" -> \"final_bam\";\n \"mark_duplicates_and_sort\" -> \"mark_duplicates_metrics_file\";\n \"reference\" -> \"bqsr\";\n \"bqsr_intervals\" -> \"bqsr\";\n \"bqsr_known_sites\" -> \"bqsr\";\n \"mark_duplicates_and_sort\" -> \"bqsr\";\n \"apply_bqsr\" -> \"index_bam\";\n \"reference\" -> \"apply_bqsr\";\n \"final_name\" -> \"apply_bqsr\";\n \"bqsr\" -> \"apply_bqsr\";\n \"mark_duplicates_and_sort\" -> \"apply_bqsr\";\n \"name_sort\" -> \"mark_duplicates_and_sort\";\n \"final_name\" -> \"merge\";\n \"align\" -> \"merge\";\n \"merge\" -> \"name_sort\";\n \"reference\" -> \"align\";\n \"trimming\" -> \"align\";\n \"unaligned\" -> \"align\";\n\n // Invisible links to force outputs to be at lowest rank\n \"bqsr\" -> \"final_bam\" [style=invis];\n \"index_bam\" -> \"final_bam\" [style=invis];\n \"apply_bqsr\" -> \"final_bam\" [style=invis];\n \"mark_duplicates_and_sort\" -> \"final_bam\" [style=invis];\n \"merge\" -> \"final_bam\" [style=invis];\n \"name_sort\" -> \"final_bam\" [style=invis];\n \"align\" -> \"final_bam\" [style=invis];\n}\n", "permaLinkBase" : "https://w3id.org/cwl/view" }, "cwltoolStatus" : "RUNNING", "cwltoolVersion" : "" } { "_id" : ObjectId("5fd94b342ab79c0001183401"), "_class" : "org.commonwl.view.workflow.QueuedWorkflow", "tempRepresentation" : { "_id" : null, "retrievedFrom" : { "repoUrl" : "https://github.com/ncbi/pgap.git", "branch" : "master", "path" : "bacterial_mobile_elem/wf_bacterial_mobile_elem.cwl" }, "retrievedOn" : ISODate("2020-12-15T23:48:04.835Z"), "lastCommit" : "9ff3e17888a15f4691ba82380472317214e20a1c", "label" : "Execute CRISPR", "inputs" : { "asn_cache" : { "type" : "Directory", "sourceID" : [ ] }, "go" : { "type" : "boolean[]", "sourceID" : [ ] }, "seqids" : { "type" : "File", "sourceID" : [ ] } }, "outputs" : { "annots" : { "type" : 
"File", "sourceID" : [ "Execute_CRISPR_dump" ] } }, "steps" : { "Execute_CRISPR_wnode" : { "run" : "ncbi_crisper_wnode.cwl", "sources" : { "asn_cache" : { "sourceID" : [ "asn_cache" ] }, "input_jobs" : { "sourceID" : [ "Execute_CRISPR_submit" ] } } }, "Execute_CRISPR_submit" : { "run" : "gpx_qsubmit.cwl", "sources" : { "asn_cache" : { "sourceID" : [ "asn_cache" ] }, "seqids" : { "sourceID" : [ "seqids" ] } } }, "Execute_CRISPR_dump" : { "run" : "gpx_qdump.cwl", "sources" : { "input_path" : { "sourceID" : [ "Execute_CRISPR_wnode" ] } } } }, "cwltoolVersion" : "1.0.20180525185854", "visualisationDot" : "digraph workflow {\n graph [\n bgcolor = \"#eeeeee\"\n color = \"black\"\n fontsize = \"10\"\n labeljust = \"left\"\n clusterrank = \"local\"\n ranksep = \"0.22\"\n nodesep = \"0.05\"\n ]\n node [\n fontname = \"Helvetica\"\n fontsize = \"10\"\n fontcolor = \"black\"\n shape = \"record\"\n height = \"0\"\n width = \"0\"\n color = \"black\"\n fillcolor = \"lightgoldenrodyellow\"\n style = \"filled\"\n ];\n edge [\n fontname=\"Helvetica\"\n fontsize=\"8\"\n fontcolor=\"black\"\n color=\"black\"\n arrowsize=\"0.7\"\n ];\n subgraph cluster_inputs {\n rank = \"same\";\n style = \"dashed\";\n label = \"Workflow Inputs\";\n \"asn_cache\" [fillcolor=\"#94DDF4\"];\n \"go\" [fillcolor=\"#94DDF4\"];\n \"seqids\" [fillcolor=\"#94DDF4\"];\n }\n subgraph cluster_outputs {\n rank = \"same\";\n style = \"dashed\";\n label = \"Workflow Outputs\";\n \"annots\" [fillcolor=\"#94DDF4\"];\n }\n \"Execute_CRISPR_wnode\";\n \"Execute_CRISPR_submit\";\n \"Execute_CRISPR_dump\";\n \"Execute_CRISPR_dump\" -> \"annots\";\n \"asn_cache\" -> \"Execute_CRISPR_wnode\";\n \"Execute_CRISPR_submit\" -> \"Execute_CRISPR_wnode\";\n \"asn_cache\" -> \"Execute_CRISPR_submit\";\n \"seqids\" -> \"Execute_CRISPR_submit\";\n \"Execute_CRISPR_wnode\" -> \"Execute_CRISPR_dump\";\n\n // Invisible links to force outputs to be at lowest rank\n \"Execute_CRISPR_wnode\" -> \"annots\" [style=invis];\n 
\"Execute_CRISPR_submit\" -> \"annots\" [style=invis];\n \"Execute_CRISPR_dump\" -> \"annots\" [style=invis];\n}\n", "permaLinkBase" : "https://w3id.org/cwl/view" }, "cwltoolStatus" : "RUNNING", "cwltoolVersion" : "" } { "_id" : ObjectId("5fd94b352ab79c0001183402"), "_class" : "org.commonwl.view.workflow.QueuedWorkflow", "tempRepresentation" : { "_id" : null, "retrievedFrom" : { "repoUrl" : "https://github.com/Duke-GCB/bespin-cwl.git", "branch" : "master", "path" : "subworkflows/exomeseq-gatk4-01-preprocessing.cwl" }, "retrievedOn" : ISODate("2020-12-15T23:48:05.835Z"), "lastCommit" : "bbe24d8d7fde2e918583b96805909a2867b749d6", "label" : "exomeseq-gatk4-01-preprocessing.cwl", "inputs" : { "intervals" : { "type" : "File[]?", "sourceID" : [ ] }, "interval_padding" : { "type" : "int?", "sourceID" : [ ] }, "read_pair" : { "type" : "../types/FASTQReadPairType.yml#FASTQReadPairType", "sourceID" : [ ] }, "library" : { "type" : "string", "sourceID" : [ ] }, "reference_genome" : { "type" : "File", "sourceID" : [ ] }, "bait_interval_list" : { "type" : "File", "sourceID" : [ ] }, "threads" : { "type" : "int", "sourceID" : [ ] }, "target_interval_list" : { "type" : "File", "sourceID" : [ ] }, "known_sites" : { "type" : "File[]", "sourceID" : [ ] }, "resource_dbsnp" : { "type" : "File", "sourceID" : [ ] }, "platform" : { "type" : "string", "sourceID" : [ ] } }, "outputs" : { "recalibrated_reads" : { "type" : "File", "sourceID" : [ "recalibrate_02_apply_bqsr" ] }, "raw_variants" : { "doc" : "VCF file from per sample variant calling", "type" : "File", "sourceID" : [ "variant_calling" ] }, "trim_reports" : { "type" : "File[]", "sourceID" : [ "trim" ] }, "markduplicates_bam" : { "type" : "File", "sourceID" : [ "mark_duplicates" ] }, "recalibration_table" : { "type" : "File", "sourceID" : [ "recalibrate_01_analyze" ] }, "haplotypes_bam" : { "doc" : "BAM file containing assembled haplotypes and locally realigned reads", "type" : "File", "sourceID" : [ "variant_calling" ] }, 
"fastqc_reports" : { "type" : "File[]", "sourceID" : [ "qc" ] } }, "steps" : { "qc" : { "run" : "../tools/fastqc.cwl", "sources" : { "input_fastq_file" : { "sourceID" : [ "combine_reads" ] }, "threads" : { "sourceID" : [ ], "defaultVal" : "\\\"4\\\"" } } }, "file_pair_details" : { "run" : "../tools/extract-named-file-pair-details.cwl", "sources" : { "read_pair" : { "sourceID" : [ "read_pair" ] }, "library" : { "sourceID" : [ "library" ] }, "platform" : { "sourceID" : [ "platform" ] } } }, "trim" : { "run" : "../tools/trim_galore.cwl", "sources" : { "reads" : { "sourceID" : [ "combine_reads" ] }, "paired" : { "sourceID" : [ ], "defaultVal" : "\\\"true\\\"" } } }, "recalibrate_02_apply_bqsr" : { "run" : "../tools/GATK4-ApplyBQSR.cwl", "sources" : { "reference" : { "sourceID" : [ "reference_genome" ] }, "intervals" : { "sourceID" : [ "intervals" ] }, "interval_padding" : { "sourceID" : [ "interval_padding" ] }, "bqsr_report" : { "sourceID" : [ "recalibrate_01_analyze" ] }, "output_recalibrated_bam_filename" : { "sourceID" : [ "generate_sample_filenames" ] }, "add_output_sam_program_record" : { "sourceID" : [ ], "defaultVal" : "\\\"true\\\"" }, "input_bam" : { "sourceID" : [ "mark_duplicates" ] }, "java_opt" : { "sourceID" : [ ], "defaultVal" : "\\\"-Xms3g\\\"" } } }, "combine_reads" : { "run" : "../tools/concat-gz-files.cwl", "sources" : { "output_filename" : { "sourceID" : [ "generate_sample_filenames" ] }, "files" : { "sourceID" : [ "file_pair_details" ] } } }, "variant_calling" : { "run" : "../tools/GATK4-HaplotypeCaller.cwl", "sources" : { "reference" : { "sourceID" : [ "reference_genome" ] }, "output_variants_filename" : { "sourceID" : [ "generate_sample_filenames" ] }, "output_bam_filename" : { "sourceID" : [ "generate_sample_filenames" ] }, "intervals" : { "sourceID" : [ "intervals" ] }, "interval_padding" : { "sourceID" : [ "interval_padding" ] }, "annotation_groups" : { "sourceID" : [ ], "defaultVal" : "\\\"\\\"" }, "input_bam" : { "sourceID" : [ 
"recalibrate_02_apply_bqsr" ] }, "emit_ref_confidence" : { "sourceID" : [ ], "defaultVal" : "\\\"GVCF\\\"" }, "java_opt" : { "sourceID" : [ ], "defaultVal" : "\\\"-Xms7g\\\"" } } }, "generate_sample_filenames" : { "run" : "../tools/generate-sample-filenames.cwl", "sources" : { "sample_name" : { "sourceID" : [ "file_pair_details" ] } } }, "sort" : { "run" : "../tools/GATK4-SortSam.cwl", "sources" : { "input_file" : { "sourceID" : [ "map" ] }, "output_sorted_bam_filename" : { "sourceID" : [ "generate_sample_filenames" ] }, "sort_order" : { "sourceID" : [ ], "defaultVal" : "\\\"coordinate\\\"" }, "java_opt" : { "sourceID" : [ ], "defaultVal" : "\\\"-Xms4g\\\"" } } }, "mark_duplicates" : { "run" : "../tools/GATK4-MarkDuplicates.cwl", "sources" : { "assume_sort_order" : { "sourceID" : [ ], "defaultVal" : "\\\"coordinate\\\"" }, "input_file" : { "sourceID" : [ "sort" ] }, "validation_stringency" : { "sourceID" : [ ], "defaultVal" : "\\\"SILENT\\\"" }, "output_filename" : { "sourceID" : [ "generate_sample_filenames" ] }, "create_index" : { "sourceID" : [ ], "defaultVal" : "\\\"true\\\"" }, "metrics_filename" : { "sourceID" : [ "generate_sample_filenames" ] }, "optical_duplicate_pixel_distance" : { "sourceID" : [ ], "defaultVal" : "\\\"2500\\\"" }, "java_opt" : { "sourceID" : [ ], "defaultVal" : "\\\"-Xms4g\\\"" } } }, "recalibrate_01_analyze" : { "run" : "../tools/GATK4-BaseRecalibrator.cwl", "sources" : { "reference" : { "sourceID" : [ "reference_genome" ] }, "intervals" : { "sourceID" : [ "intervals" ] }, "interval_padding" : { "sourceID" : [ "interval_padding" ] }, "output_recalibration_report_filename" : { "sourceID" : [ "generate_sample_filenames" ] }, "input_bam" : { "sourceID" : [ "mark_duplicates" ] }, "known_sites" : { "sourceID" : [ "known_sites" ] }, "java_opt" : { "sourceID" : [ ], "defaultVal" : "\\\"-Xms4g\\\"" } } }, "map" : { "run" : "../tools/gitc-bwa-mem-samtools.cwl", "sources" : { "reference" : { "sourceID" : [ "reference_genome" ] }, "reads" : { 
"sourceID" : [ "trim" ] }, "output_filename" : { "sourceID" : [ "generate_sample_filenames" ] }, "read_group_header" : { "sourceID" : [ "file_pair_details" ] }, "threads" : { "sourceID" : [ "threads" ] } } } }, "cwltoolVersion" : "1.0.20180525185854", "visualisationDot" : "digraph workflow {\n graph [\n bgcolor = \"#eeeeee\"\n color = \"black\"\n fontsize = \"10\"\n labeljust = \"left\"\n clusterrank = \"local\"\n ranksep = \"0.22\"\n nodesep = \"0.05\"\n ]\n node [\n fontname = \"Helvetica\"\n fontsize = \"10\"\n fontcolor = \"black\"\n shape = \"record\"\n height = \"0\"\n width = \"0\"\n color = \"black\"\n fillcolor = \"lightgoldenrodyellow\"\n style = \"filled\"\n ];\n edge [\n fontname=\"Helvetica\"\n fontsize=\"8\"\n fontcolor=\"black\"\n color=\"black\"\n arrowsize=\"0.7\"\n ];\n subgraph cluster_inputs {\n rank = \"same\";\n style = \"dashed\";\n label = \"Workflow Inputs\";\n \"intervals\" [fillcolor=\"#94DDF4\"];\n \"interval_padding\" [fillcolor=\"#94DDF4\"];\n \"read_pair\" [fillcolor=\"#94DDF4\"];\n \"library\" [fillcolor=\"#94DDF4\"];\n \"reference_genome\" [fillcolor=\"#94DDF4\"];\n \"bait_interval_list\" [fillcolor=\"#94DDF4\"];\n \"threads\" [fillcolor=\"#94DDF4\"];\n \"target_interval_list\" [fillcolor=\"#94DDF4\"];\n \"known_sites\" [fillcolor=\"#94DDF4\"];\n \"resource_dbsnp\" [fillcolor=\"#94DDF4\"];\n \"platform\" [fillcolor=\"#94DDF4\"];\n }\n subgraph cluster_outputs {\n rank = \"same\";\n style = \"dashed\";\n label = \"Workflow Outputs\";\n \"recalibrated_reads\" [fillcolor=\"#94DDF4\"];\n \"raw_variants\" [fillcolor=\"#94DDF4\"];\n \"trim_reports\" [fillcolor=\"#94DDF4\"];\n \"markduplicates_bam\" [fillcolor=\"#94DDF4\"];\n \"recalibration_table\" [fillcolor=\"#94DDF4\"];\n \"haplotypes_bam\" [fillcolor=\"#94DDF4\"];\n \"fastqc_reports\" [fillcolor=\"#94DDF4\"];\n }\n \"qc\";\n \"file_pair_details\";\n \"trim\";\n \"recalibrate_02_apply_bqsr\";\n \"combine_reads\";\n \"variant_calling\";\n \"generate_sample_filenames\";\n \"sort\";\n 
\"mark_duplicates\";\n \"recalibrate_01_analyze\";\n \"map\";\n \"recalibrate_02_apply_bqsr\" -> \"recalibrated_reads\";\n \"variant_calling\" -> \"raw_variants\";\n \"trim\" -> \"trim_reports\";\n \"mark_duplicates\" -> \"markduplicates_bam\";\n \"recalibrate_01_analyze\" -> \"recalibration_table\";\n \"variant_calling\" -> \"haplotypes_bam\";\n \"qc\" -> \"fastqc_reports\";\n \"combine_reads\" -> \"qc\";\n \"default1\" [label=\"\\\"4\\\"\", fillcolor=\"#D5AEFC\"]\n \"default1\" -> \"qc\";\n \"read_pair\" -> \"file_pair_details\";\n \"library\" -> \"file_pair_details\";\n \"platform\" -> \"file_pair_details\";\n \"combine_reads\" -> \"trim\";\n \"default2\" [label=\"\\\"true\\\"\", fillcolor=\"#D5AEFC\"]\n \"default2\" -> \"trim\";\n \"reference_genome\" -> \"recalibrate_02_apply_bqsr\";\n \"intervals\" -> \"recalibrate_02_apply_bqsr\";\n \"interval_padding\" -> \"recalibrate_02_apply_bqsr\";\n \"recalibrate_01_analyze\" -> \"recalibrate_02_apply_bqsr\";\n \"generate_sample_filenames\" -> \"recalibrate_02_apply_bqsr\";\n \"default3\" [label=\"\\\"true\\\"\", fillcolor=\"#D5AEFC\"]\n \"default3\" -> \"recalibrate_02_apply_bqsr\";\n \"mark_duplicates\" -> \"recalibrate_02_apply_bqsr\";\n \"default4\" [label=\"\\\"-Xms3g\\\"\", fillcolor=\"#D5AEFC\"]\n \"default4\" -> \"recalibrate_02_apply_bqsr\";\n \"generate_sample_filenames\" -> \"combine_reads\";\n \"file_pair_details\" -> \"combine_reads\";\n \"reference_genome\" -> \"variant_calling\";\n \"generate_sample_filenames\" -> \"variant_calling\";\n \"generate_sample_filenames\" -> \"variant_calling\";\n \"intervals\" -> \"variant_calling\";\n \"interval_padding\" -> \"variant_calling\";\n \"default5\" [label=\"\\\"\\\"\", fillcolor=\"#D5AEFC\"]\n \"default5\" -> \"variant_calling\";\n \"recalibrate_02_apply_bqsr\" -> \"variant_calling\";\n \"default6\" [label=\"\\\"GVCF\\\"\", fillcolor=\"#D5AEFC\"]\n \"default6\" -> \"variant_calling\";\n \"default7\" [label=\"\\\"-Xms7g\\\"\", fillcolor=\"#D5AEFC\"]\n \"default7\" 
-> \"variant_calling\";\n \"file_pair_details\" -> \"generate_sample_filenames\";\n \"map\" -> \"sort\";\n \"generate_sample_filenames\" -> \"sort\";\n \"default8\" [label=\"\\\"coordinate\\\"\", fillcolor=\"#D5AEFC\"]\n \"default8\" -> \"sort\";\n \"default9\" [label=\"\\\"-Xms4g\\\"\", fillcolor=\"#D5AEFC\"]\n \"default9\" -> \"sort\";\n \"default10\" [label=\"\\\"coordinate\\\"\", fillcolor=\"#D5AEFC\"]\n \"default10\" -> \"mark_duplicates\";\n \"sort\" -> \"mark_duplicates\";\n \"default11\" [label=\"\\\"SILENT\\\"\", fillcolor=\"#D5AEFC\"]\n \"default11\" -> \"mark_duplicates\";\n \"generate_sample_filenames\" -> \"mark_duplicates\";\n \"default12\" [label=\"\\\"true\\\"\", fillcolor=\"#D5AEFC\"]\n \"default12\" -> \"mark_duplicates\";\n \"generate_sample_filenames\" -> \"mark_duplicates\";\n \"default13\" [label=\"\\\"2500\\\"\", fillcolor=\"#D5AEFC\"]\n \"default13\" -> \"mark_duplicates\";\n \"default14\" [label=\"\\\"-Xms4g\\\"\", fillcolor=\"#D5AEFC\"]\n \"default14\" -> \"mark_duplicates\";\n \"reference_genome\" -> \"recalibrate_01_analyze\";\n \"intervals\" -> \"recalibrate_01_analyze\";\n \"interval_padding\" -> \"recalibrate_01_analyze\";\n \"generate_sample_filenames\" -> \"recalibrate_01_analyze\";\n \"mark_duplicates\" -> \"recalibrate_01_analyze\";\n \"known_sites\" -> \"recalibrate_01_analyze\";\n \"default15\" [label=\"\\\"-Xms4g\\\"\", fillcolor=\"#D5AEFC\"]\n \"default15\" -> \"recalibrate_01_analyze\";\n \"reference_genome\" -> \"map\";\n \"trim\" -> \"map\";\n \"generate_sample_filenames\" -> \"map\";\n \"file_pair_details\" -> \"map\";\n \"threads\" -> \"map\";\n\n // Invisible links to force outputs to be at lowest rank\n \"qc\" -> \"recalibrated_reads\" [style=invis];\n \"file_pair_details\" -> \"recalibrated_reads\" [style=invis];\n \"trim\" -> \"recalibrated_reads\" [style=invis];\n \"recalibrate_02_apply_bqsr\" -> \"recalibrated_reads\" [style=invis];\n \"combine_reads\" -> \"recalibrated_reads\" [style=invis];\n \"variant_calling\" 
-> \"recalibrated_reads\" [style=invis];\n \"generate_sample_filenames\" -> \"recalibrated_reads\" [style=invis];\n \"sort\" -> \"recalibrated_reads\" [style=invis];\n \"mark_duplicates\" -> \"recalibrated_reads\" [style=invis];\n \"recalibrate_01_analyze\" -> \"recalibrated_reads\" [style=invis];\n \"map\" -> \"recalibrated_reads\" [style=invis];\n}\n", "permaLinkBase" : "https://w3id.org/cwl/view" }, "cwltoolStatus" : "RUNNING", "cwltoolVersion" : "" } { "_id" : ObjectId("5fd94b352ab79c0001183403"), "_class" : "org.commonwl.view.workflow.QueuedWorkflow", "tempRepresentation" : { "_id" : null, "retrievedFrom" : { "repoUrl" : "https://github.com/rosafilgueira/cyclon_usecase.git", "branch" : "d6d8c3b03c66e7594561c22c1de80eda2113b277", "path" : "run_cyclon/tracking_master.cwl" }, "retrievedOn" : ISODate("2020-12-15T23:48:05.967Z"), "lastCommit" : "d6d8c3b03c66e7594561c22c1de80eda2113b277", "label" : "tracking_master.cwl", "inputs" : { "script_transferfiles" : { "type" : "File", "sourceID" : [ ] }, "script_xml2ascii" : { "type" : "File", "sourceID" : [ ] }, "script_postprocess" : { "type" : "File", "sourceID" : [ ] }, "script_processfiles" : { "type" : "File", "sourceID" : [ ] }, "script_extractnc" : { "type" : "File", "sourceID" : [ ] }, "script_environment" : { "type" : "File", "sourceID" : [ ] }, "script_make_tracks" : { "type" : "File", "sourceID" : [ ] } }, "outputs" : { "results" : { "type" : "Directory", "sourceID" : [ "postprocess" ] } }, "steps" : { "make_tracks" : { "run" : "make_tracks.cwl", "sources" : { "cyclon" : { "sourceID" : [ "extractnc" ] }, "script" : { "sourceID" : [ "script_make_tracks" ] } } }, "xml2ascii" : { "run" : "xml2ascii.cwl", "sources" : { "cyclon" : { "sourceID" : [ "make_tracks" ] }, "script" : { "sourceID" : [ "script_xml2ascii" ] } } }, "processfiles" : { "run" : "processfiles.cwl", "sources" : { "cyclon" : { "sourceID" : [ "create_environment" ] }, "script" : { "sourceID" : [ "script_processfiles" ] } } }, "postprocess" : { "run" : 
"postprocess.cwl", "sources" : { "cyclon" : { "sourceID" : [ "xml2ascii" ] }, "script" : { "sourceID" : [ "script_postprocess" ] } } }, "extractnc" : { "run" : "extractnc.cwl", "sources" : { "cyclon" : { "sourceID" : [ "transferfiles" ] }, "script" : { "sourceID" : [ "script_extractnc" ] } } }, "create_environment" : { "run" : "env_preparation.cwl", "sources" : { "script" : { "sourceID" : [ "script_environment" ] } } }, "transferfiles" : { "run" : "transferfiles.cwl", "sources" : { "cyclon" : { "sourceID" : [ "processfiles" ] }, "script" : { "sourceID" : [ "script_transferfiles" ] } } } }, "cwltoolVersion" : "1.0.20180525185854", "visualisationDot" : "digraph workflow {\n graph [\n bgcolor = \"#eeeeee\"\n color = \"black\"\n fontsize = \"10\"\n labeljust = \"left\"\n clusterrank = \"local\"\n ranksep = \"0.22\"\n nodesep = \"0.05\"\n ]\n node [\n fontname = \"Helvetica\"\n fontsize = \"10\"\n fontcolor = \"black\"\n shape = \"record\"\n height = \"0\"\n width = \"0\"\n color = \"black\"\n fillcolor = \"lightgoldenrodyellow\"\n style = \"filled\"\n ];\n edge [\n fontname=\"Helvetica\"\n fontsize=\"8\"\n fontcolor=\"black\"\n color=\"black\"\n arrowsize=\"0.7\"\n ];\n subgraph cluster_inputs {\n rank = \"same\";\n style = \"dashed\";\n label = \"Workflow Inputs\";\n \"script_transferfiles\" [fillcolor=\"#94DDF4\"];\n \"script_xml2ascii\" [fillcolor=\"#94DDF4\"];\n \"script_postprocess\" [fillcolor=\"#94DDF4\"];\n \"script_processfiles\" [fillcolor=\"#94DDF4\"];\n \"script_extractnc\" [fillcolor=\"#94DDF4\"];\n \"script_environment\" [fillcolor=\"#94DDF4\"];\n \"script_make_tracks\" [fillcolor=\"#94DDF4\"];\n }\n subgraph cluster_outputs {\n rank = \"same\";\n style = \"dashed\";\n label = \"Workflow Outputs\";\n \"results\" [fillcolor=\"#94DDF4\"];\n }\n \"make_tracks\";\n \"xml2ascii\";\n \"processfiles\";\n \"postprocess\";\n \"extractnc\";\n \"create_environment\";\n \"transferfiles\";\n \"postprocess\" -> \"results\";\n \"extractnc\" -> \"make_tracks\";\n 
\"script_make_tracks\" -> \"make_tracks\";\n \"make_tracks\" -> \"xml2ascii\";\n \"script_xml2ascii\" -> \"xml2ascii\";\n \"create_environment\" -> \"processfiles\";\n \"script_processfiles\" -> \"processfiles\";\n \"xml2ascii\" -> \"postprocess\";\n \"script_postprocess\" -> \"postprocess\";\n \"transferfiles\" -> \"extractnc\";\n \"script_extractnc\" -> \"extractnc\";\n \"script_environment\" -> \"create_environment\";\n \"processfiles\" -> \"transferfiles\";\n \"script_transferfiles\" -> \"transferfiles\";\n\n // Invisible links to force outputs to be at lowest rank\n \"make_tracks\" -> \"results\" [style=invis];\n \"xml2ascii\" -> \"results\" [style=invis];\n \"processfiles\" -> \"results\" [style=invis];\n \"postprocess\" -> \"results\" [style=invis];\n \"extractnc\" -> \"results\" [style=invis];\n \"create_environment\" -> \"results\" [style=invis];\n \"transferfiles\" -> \"results\" [style=invis];\n}\n", "permaLinkBase" : "https://w3id.org/cwl/view" }, "cwltoolStatus" : "RUNNING", "cwltoolVersion" : "" }
> db.queuedWorkflow.deleteMany( { cwltoolStatus: "RUNNING" } )
{ "acknowledged" : true, "deletedCount" : 6 }
> db.queuedWorkflow.find( { cwltoolStatus: "RUNNING" } ).count()
0
And then I restarted the import. Hopefully that will resolve the issue.
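If the import is driven from a script, the same cleanup of stuck queue entries could be done from Python before each restart. A minimal sketch, assuming pymongo: the collection handle is passed in (for example `MongoClient()[DB_NAME].queuedWorkflow`, where DB_NAME is a placeholder since the database name isn't shown here), and it uses pymongo's `delete_many` and `count_documents`, the equivalents of the shell's deleteMany and find().count().

```python
# Entries the viewer left in RUNNING state after an interrupted import.
STALE_FILTER = {"cwltoolStatus": "RUNNING"}


def clear_stale_queue(queued):
    """Delete queue entries stuck in RUNNING and confirm none remain.

    `queued` is expected to behave like a pymongo Collection
    (e.g. client[DB_NAME].queuedWorkflow, DB_NAME being a placeholder).
    Returns (deleted_count, remaining_count).
    """
    deleted = queued.delete_many(STALE_FILTER).deleted_count
    remaining = queued.count_documents(STALE_FILTER)
    return deleted, remaining
```

Checking the remaining count afterwards mirrors the `.count()` sanity check in the shell session, so a script can refuse to restart the import if anything is still marked RUNNING.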
Quite a while later, after temporarily resizing the data partition to 1 TB to avoid having to stop and restart Docker so often:
...
Trimmed queue from 1 to 1
Still running https://cwlviewer.arvados.org/queue/5ff4be662ab79c000178c747
Trimmed queue from 1 to 1
Still running https://cwlviewer.arvados.org/queue/5ff4be662ab79c000178c747
Trimmed queue from 1 to 1
Still running https://cwlviewer.arvados.org/queue/5ff4be662ab79c000178c747
Trimmed queue from 1 to 1
Still running https://cwlviewer.arvados.org/queue/5ff4be662ab79c000178c747
Trimmed queue from 1 to 1
Still running https://cwlviewer.arvados.org/queue/5ff4be662ab79c000178c747
Trimmed queue from 1 to 1
Still running https://cwlviewer.arvados.org/queue/5ff4be662ab79c000178c747
Trimmed queue from 1 to 1
Trimmed queue from 1 to 0

real 191m42.084s
user 13m34.416s
sys 0m26.841s
We now have 1845 workflows available. Hmm, there are 22535 workflows at view.commonwl.org.
The dump I was importing reports 21836 elements. I made a new dump just now, and it reports 22535:
Old dump:
"totalElements": 21836,
"totalPages": 11
New dump:
"totalElements": 22535,
"totalPages": 12
Hmm. That's quite a far cry from 1845. I've kicked off an import for the new export.
Meanwhile, looking at the import that got us to 1845, I'm seeing only 27 outright failures:
Failed https://cwlviewer.arvados.org/queue/5fd8e89308813b0001ca6b51: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd909ed2ab79c0001182cb3: Could not load extension schema https://schema.org/docs/schema_org_rdfa.html: None:11:92: Repeat node-elements inside property elements: http://www.w3.org/1999/xhtmlmeta
Failed https://cwlviewer.arvados.org/queue/5fd968e808813b00010be641: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd9127b2ab79c0001182d8c: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd918e92ab79c0001182e36: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd92cfc2ab79c0001183050: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd92e672ab79c0001183085: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd9690508813b00010be642: Workflow checker warning:
Failed https://cwlviewer.arvados.org/queue/5fd7bbda2ab79c0001a0342c: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd93a9c2ab79c00011831f0: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd93bf52ab79c0001183221: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd93c342ab79c000118322e: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd93c852ab79c000118323d: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd946542ab79c0001183390: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5fd9691708813b00010be643: I'm sorry, I couldn't load this CWL file, try again with --debug for more information.
Failed https://cwlviewer.arvados.org/queue/5ff4968f2ab79c000178c249: Whoops! Cwltool ran successfully, but an unexpected error occurred in CWLViewer!
Failed https://cwlviewer.arvados.org/queue/5ff497ad2ab79c000178c26a: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5ff49ca72ab79c000178c321: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5ff49ca42ab79c000178c31f: Could not load extension schema http://schema.org/version/latest/schema.rdf: ('http://schema.org/version/latest/schema.rdf', HTTPError('404 Client Error: Not Found for url: https://schema.org/version/latest/schema.rdf',))
Failed https://cwlviewer.arvados.org/queue/5ff4a25f2ab79c000178c3db: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5ff4a5182ab79c000178c437: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5ff4aa0a2ab79c000178c4ef: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5ff4b0f42ab79c000178c5be: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5ff4b3002ab79c000178c603: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5ff4bb8a2ab79c000178c6ec: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5ff4bcd82ab79c000178c715: Tool definition failed validation:
Failed https://cwlviewer.arvados.org/queue/5ff4bdb42ab79c000178c73e: Tool definition failed validation:
There are also 116 'Unhandled HTTP status code' errors:
$ cat screen.log.first.complete.run |grep ^Unhandled|wc
116 1856 12992
$ cat screen.log.first.complete.run |grep ^Unhandled |head
Unhandled HTTP status code: 400 {"message":"Error: Workflow could not be created from the provided cwl file"}
Unhandled HTTP status code: 400 {"message":"Error: Workflow could not be created from the provided cwl file"}
Unhandled HTTP status code: 400 {"message":"Error: Workflow could not be created from the provided cwl file"}
Unhandled HTTP status code: 400 {"message":"Error: Workflow could not be created from the provided cwl file"}
Unhandled HTTP status code: 400 {"message":"Error: Workflow could not be created from the provided cwl file"}
Unhandled HTTP status code: 400 {"message":"Error: Workflow could not be created from the provided cwl file"}
Unhandled HTTP status code: 400 {"message":"Error: Workflow could not be created from the provided cwl file"}
Unhandled HTTP status code: 400 {"message":"Error: Workflow could not be created from the provided cwl file"}
Unhandled HTTP status code: 400 {"message":"Error: Workflow could not be created from the provided cwl file"}
Unhandled HTTP status code: 400 {"message":"Error: Workflow could not be created from the provided cwl file"}
There are 2000 posted workflows (exactly 2000, which is suspicious):
$ cat screen.log.first.complete.run |grep ^Posted |wc
2000 14180 289106
$ cat screen.log.first.complete.run |grep ^Posted |head
Posted: {'url': 'https://github.com/dockstore-testing/md5sum-checker.git', 'branch': 'master', 'path': 'checker-workflow-wrapping-tool.cwl'}
Posted: {'url': 'https://github.com/alexbarrera/GGR-cwl.git', 'branch': '6e008c1170ef818b6c4c63f0eec7baa4f7be7b3c', 'path': 'v1.0/STARR-seq_pipeline/04-quantification.cwl'}
Posted: {'url': 'https://github.com/common-workflow-language/common-workflow-language.git', 'branch': 'a062055fddcc7d7d9dbc53d28288e3ccb9a800d8', 'path': 'v1.0/v1.0/dynresreq-workflow-stepdefault.cwl'}
Posted: {'url': 'https://github.com/bcbio/bcbio_validation_workflows.git', 'branch': 'master', 'path': 'somatic-giab-mix/somatic-giab-mix-workflow/wf-variantcall.cwl'}
Posted: {'url': 'https://github.com/Duke-GCB/bespin-cwl.git', 'branch': 'qiime2-workflow-paired', 'path': 'packed/qiime2-step2-deblur.cwl', 'packedId': 'main'}
Posted: {'url': 'https://github.com/smc-rna-challenge/zhanghj-9609644.git', 'branch': 'master', 'path': 'main.cwl'}
Posted: {'url': 'https://github.com/bespin-workflows/exomeseq-gatk4.git', 'branch': 'develop', 'path': 'subworkflows/exomeseq-gatk4-00-prepare-reference-data.cwl'}
Posted: {'url': 'https://github.com/YeoLab/eclip.git', 'branch': '7196b92e262fe5f8acee04cb0d1b6fd23e4febdc', 'path': 'cwl/wf_demultiplex_pe.cwl'}
Posted: {'url': 'https://github.com/YeoLab/eclip.git', 'branch': 'e2a314ff7646c4ea7b90f65caad0452ef6874757', 'path': 'cwl/wf_demultiplex_pe.cwl'}
Posted: {'url': 'https://github.com/mr-c/cwltests.git', 'branch': 'pack_test', 'path': 'cwl/packed.cwl', 'packedId': 'workflow_data.cwl'}
2000-116-27 == 1857
That's close to 1845. Where are those last 12?
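The grep/wc bookkeeping can be reproduced in a few lines of Python. A sketch that assumes each relevant log line starts with one of the keywords `Posted`, `Unhandled` or `Failed` (the first two match the greps used here; the `Failed` prefix is taken from the failure list), and computes the same posted-minus-errors estimate:

```python
from collections import Counter

# Leading keywords of the import log lines we care about.
PREFIXES = ("Posted", "Unhandled", "Failed")


def tally(lines):
    """Count log lines by leading keyword, like `grep ^Prefix | wc -l`."""
    counts = Counter()
    for line in lines:
        for prefix in PREFIXES:
            if line.startswith(prefix):
                counts[prefix] += 1
                break
    return counts


def expected_successes(counts):
    """Workflows posted minus those rejected by HTTP errors or outright failures."""
    return counts["Posted"] - counts["Unhandled"] - counts["Failed"]
```

Usage would be along the lines of `with open("screen.log.first.complete.run") as fh: print(expected_successes(tally(fh)))`; for this run that gives 2000 - 116 - 27 = 1857.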
Perhaps more importantly, what is limiting the import to 2000 entries?
$ cat 2021-01-05T203420+0000.json |jq -r .content |jq length
2000
It looks like the rest were not even exported, sigh.
Okay, the dump.sh script specifies a `size=100000000` parameter, but the server seems to ignore it and caps every request at `size=2000`. So we've only fetched 1 of 12 pages so far. I modified the dump script to download all 12:
/var/backups/cwl/2021-01-05T215304+0000.json
/var/backups/cwl/2021-01-05T215309+0000.json
/var/backups/cwl/2021-01-05T215313+0000.json
/var/backups/cwl/2021-01-05T215317+0000.json
/var/backups/cwl/2021-01-05T215321+0000.json
/var/backups/cwl/2021-01-05T215325+0000.json
/var/backups/cwl/2021-01-05T215329+0000.json
/var/backups/cwl/2021-01-05T215333+0000.json
/var/backups/cwl/2021-01-05T215338+0000.json
/var/backups/cwl/2021-01-05T215341+0000.json
/var/backups/cwl/2021-01-05T215345+0000.json
/var/backups/cwl/2021-01-05T215350+0000.json
/var/backups/cwl/2021-01-05T215352+0000.json
And I'll start importing them all. That's going to take a while.
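The fix amounts to walking the pages instead of asking for one huge page. A sketch of that loop in Python, assuming the export endpoint takes Spring Data's usual 0-based `page` and `size` query parameters and returns the `content`/`totalPages` fields seen in the dumps; `base_url` is a placeholder, not the actual URL dump.sh uses, and the 2000 cap is the limit observed here (it appears to be Spring Data's default maximum page size). With that cap, 22535 elements need ceil(22535 / 2000) = 12 pages, which matches the server's `totalPages`.

```python
import json
import math
import urllib.parse
import urllib.request

# Observed server-side cap on `size`; larger requests are silently clamped.
MAX_PAGE_SIZE = 2000


def page_count(total_elements, page_size=MAX_PAGE_SIZE):
    """Number of pages needed to cover total_elements."""
    return math.ceil(total_elements / page_size)


def page_url(base_url, page, page_size=MAX_PAGE_SIZE):
    """Build a Spring-style paged URL (0-based `page`, plus `size`)."""
    query = urllib.parse.urlencode({"page": page, "size": page_size})
    return f"{base_url}?{query}"


def fetch_json(url):
    """Fetch one page and decode it as JSON."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def fetch_all(base_url, fetch=fetch_json, page_size=MAX_PAGE_SIZE):
    """Yield every element, following the totalPages the server reports."""
    page, total_pages = 0, 1
    while page < total_pages:
        body = fetch(page_url(base_url, page, page_size))
        total_pages = body["totalPages"]
        yield from body["content"]
        page += 1
```

Trusting `totalPages` from each response, rather than a hard-coded 12, means the loop keeps working as the number of workflows grows.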
Updated by Ward Vandewege almost 4 years ago
As of 9:45am this morning, cwlviewer.arvados.org is at 4003 imported workflows. This is going to take a while longer...
Updated by Peter Amstutz almost 4 years ago
- Target version changed from 2021-01-06 Sprint to 2021-01-20 Sprint
Updated by Ward Vandewege almost 4 years ago
The import filled up the 1 TB partition somewhere partway through. Joy.
In addition, I'm getting this error when trying to jump to the last entry:
OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.
The web page just shows: "Whoops - Something Went Wrong! Error: An internal server error occurred".
I fixed the latter by raising MongoDB's in-memory sort limit tenfold, from 32 MiB to 320 MiB:
> use admin
switched to db admin
> db.runCommand( { getParameter : 1, "internalQueryExecMaxBlockingSortBytes" : 1 } )
{ "internalQueryExecMaxBlockingSortBytes" : 33554432, "ok" : 1 }
> db.adminCommand({setParameter: 1, internalQueryExecMaxBlockingSortBytes: 335544320})
{ "was" : 33554432, "ok" : 1 }
> db.runCommand( { getParameter : 1, "internalQueryExecMaxBlockingSortBytes" : 1 } )
{ "internalQueryExecMaxBlockingSortBytes" : 335544320, "ok" : 1 }
Now I can see we have 4605 workflows imported, so I'll restart the import from page 3 (4000 and up).
It turns out that setting doesn't survive a restart of MongoDB, so I'm now setting it from the docker-compose file.
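Because the value is lost on restart, a startup check could re-apply it if it goes missing. A minimal sketch, assuming pymongo, where `admin_db` would be `MongoClient(...).admin`; it issues the same getParameter/setParameter admin commands as the shell session, via pymongo's `Database.command`, which adds keyword arguments to the command document. Passing the option to mongod at startup (its `--setParameter` flag) from the compose file remains the durable fix; this only guards against it being absent.

```python
# Server parameter that limits in-memory (blocking) sorts, as used in the shell session.
SORT_PARAM = "internalQueryExecMaxBlockingSortBytes"


def ensure_sort_limit(admin_db, wanted):
    """Raise the blocking-sort memory limit if it is below `wanted` bytes.

    `admin_db` is expected to behave like pymongo's `client.admin`.
    Returns (value_before, value_after).
    """
    reply = admin_db.command("getParameter", 1, **{SORT_PARAM: 1})
    current = reply[SORT_PARAM]
    if current >= wanted:
        return current, current  # already high enough, nothing to do
    admin_db.command("setParameter", 1, **{SORT_PARAM: wanted})
    return current, wanted
```

Calling `ensure_sort_limit(client.admin, 335544320)` after each MongoDB restart would restore the 320 MiB limit used here.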
Updated by Ward Vandewege almost 4 years ago
I resized the partition to 10 TiB (!) and restarted the import from the dump file that has
"totalElements": 22535,
"totalPages": 12
The import completed with 21068 workflows; the remaining 1467 failed.
I finally found https://github.com/common-workflow-language/cwlviewer/issues/279, which explains the disk space leak and suggests two commands to keep it in check. I'll just run those from cron every half hour or so:
docker-compose exec -T spring find /tmp -type f -mtime +1 -delete
docker-compose exec -T spring find /tmp -type d -mtime +30 -delete
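For anyone wondering what those two find commands actually delete: `-mtime +N` counts whole days (truncated), so `+1` only matches files at least two full days old, and `-delete` on a directory behaves like rmdir, so only empty directories go. A rough Python model of that behaviour, run against a local directory tree; it is an illustration of the semantics under those assumptions, not a replacement for the cron job inside the `spring` container.

```python
import os
import time

DAY = 86400  # seconds


def age_in_days(path, now):
    """Whole days since last modification, truncated the way find's -mtime counts."""
    return int((now - os.lstat(path).st_mtime) // DAY)


def cleanup_tmp(root, file_days=1, dir_days=30, now=None):
    """Roughly model the two cron'd find commands on a directory tree.

    find ROOT -type f -mtime +1 -delete   -> files older than `file_days` whole days
    find ROOT -type d -mtime +30 -delete  -> empty dirs older than `dir_days` whole days
    Symlinks are skipped and ROOT itself is never removed.
    Returns the list of removed paths.
    """
    now = time.time() if now is None else now
    removed = []
    # Pass 1: files, like the first command.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            if age_in_days(path, now) > file_days:
                os.remove(path)
                removed.append(path)
    # Pass 2: directories, deepest first; non-empty ones are left alone.
    for dirpath, dirnames, _filenames in os.walk(root, topdown=False):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or os.listdir(path):
                continue
            if age_in_days(path, now) > dir_days:
                os.rmdir(path)
                removed.append(path)
    return removed
```

Running the file pass first matters: deleting files updates their parent directory's mtime, so a directory emptied by the first command is kept for another 30 days rather than removed in the same run.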
OK, I've installed that cron job and removed the 10 TB partition. I ended up using a 150 GB data partition, which is about 33% full at the moment.
I think we're ready to cut over the DNS, then get the SSL certificate, and then this is done.
Updated by Ward Vandewege almost 4 years ago
Old DNS:
view.commonwl.org. 10757 IN CNAME heater.cs.man.ac.uk.
New DNS:
view.commonwl.org CNAME cwlviewer.arvados.org
Updated by Ward Vandewege almost 4 years ago
- Status changed from In Progress to Resolved
The DNS change has been made.
Updated by Ward Vandewege over 3 years ago
- Related to Feature #17505: cwlviewer update added
Updated by Bruno Kinoshita almost 3 years ago
In case anyone ever needs it, the 2000-entry pagination limit can be changed: https://stackoverflow.com/a/44705987