Arvados: Issues https://dev.arvados.org/ 2023-01-25T15:50:39Z Arvados Redmine
Arvados - Bug #19981 (New): Containers that used an old DefaultKeepCacheRAM no longer get reused ... https://dev.arvados.org/issues/19981 2023-01-25T15:50:39Z Jiayong Li jli@curii.com
<p><strong>The Bug</strong></p>
<p>beagle.cwl has the resource requirement<br /><pre>
ResourceRequirement:
  coresMin: 2
  ramMin: 10000
</pre></p>
<p>A new run:<br /><a class="external" href="https://workbench.2xpu4.arvadosapi.com/container_requests/2xpu4-xvhdp-ph1xry8mxbsol3j">https://workbench.2xpu4.arvadosapi.com/container_requests/2xpu4-xvhdp-ph1xry8mxbsol3j</a></p>
<p>An old run:<br /><a class="external" href="https://workbench.2xpu4.arvadosapi.com/container_requests/2xpu4-xvhdp-p571e0xq4g85ac7">https://workbench.2xpu4.arvadosapi.com/container_requests/2xpu4-xvhdp-p571e0xq4g85ac7</a></p>
<p>The resource requirement didn't change, and no keep_cache requirement was specified. Even so, the new run didn't reuse the old run, because their runtime constraints differ as follows.</p>
<p>new runtime_constraints:<br /><pre>
keep_cache_disk 10485760000
keep_cache_ram 0
ram 10485760000
vcpus 2
</pre></p>
<p>new node type:<br /><pre>
"ProviderType": "m5.8xlarge",
"VCPUs": 32,
"RAM": 137438953472,
"IncludedScratch": 4000000000,
"AddedScratch": 100000000000,
"Price": 1.542,
</pre></p>
<p>old runtime_constraints:<br /><pre>
keep_cache_disk 0
keep_cache_ram 268435456
ram 10485760000
vcpus 2
</pre></p>
<p>old node type:<br /><pre>
"ProviderType": "m5.xlarge",
"VCPUs": 4,
"RAM": 17179869184,
"IncludedScratch": 4000000000,
"AddedScratch": 0,
"Price": 0.192,
</pre></p>
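<p>To make the difference explicit, here is a minimal Python sketch that compares the two sets of <code>runtime_constraints</code> listed above. The values are copied from the two runs; the helper name <code>differing_keys</code> is made up for illustration and is not part of Arvados.</p>

```python
# Values copied from the runtime_constraints of the two runs above.
new_constraints = {
    "keep_cache_disk": 10485760000,
    "keep_cache_ram": 0,
    "ram": 10485760000,
    "vcpus": 2,
}
old_constraints = {
    "keep_cache_disk": 0,
    "keep_cache_ram": 268435456,
    "ram": 10485760000,
    "vcpus": 2,
}

MIB = 1024 * 1024


def differing_keys(a, b):
    """Return the sorted keys whose values differ between two dicts."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


changed = differing_keys(old_constraints, new_constraints)
print(changed)  # ['keep_cache_disk', 'keep_cache_ram']

# ramMin: 10000 is in MiB, which matches the "ram" value in bytes.
print(new_constraints["ram"] // MIB)  # 10000
# The old default Keep cache was 256 MiB of RAM.
print(old_constraints["keep_cache_ram"] // MIB)  # 256
```

<p>Only the two Keep cache entries differ; <code>ram</code> and <code>vcpus</code> are identical in both runs.</p>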
<p><strong>The Fix</strong></p>
<p>This happened because we changed the <code>DefaultKeepCacheRAM</code> setting on the cluster, to start using disk cache instead of memory. As a consequence, <code>Container.find_reusable</code> can no longer find containers that used the old default, because it searches for matching <code>runtime_constraints</code> with a hash match, and it doesn't know what the old value of <code>DefaultKeepCacheRAM</code> was to search for.</p>
<p>Ideally we would exclude the Keep cache constraints from reuse entirely, but to do that we need to change the way we store <code>runtime_constraints</code> in the database; right now it's just plain text. Ideas that have been suggested:</p>
<ul>
<li>Convert the column to <code>jsonb</code> and do richer queries on it (Brett in note-14)</li>
<li>Add a column <code>reusable_runtime_constraints</code> that's limited to recording the constraints that affect reuse (Tom in note-15)</li>
</ul>
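<p>As a rough illustration of the second idea, the Python sketch below hashes a canonical serialization of the constraints, once in full and once with the Keep cache keys removed. This is not the actual <code>Container.find_reusable</code> code; the names <code>NON_REUSE_KEYS</code>, <code>constraints_digest</code> and <code>reusable_constraints</code>, and the choice of SHA-256 over sorted JSON, are assumptions made for the example.</p>

```python
import hashlib
import json

# Hypothetical list of constraint keys that should not affect reuse.
NON_REUSE_KEYS = ("keep_cache_ram", "keep_cache_disk")


def constraints_digest(constraints):
    """Hash a canonical (sorted-key) JSON serialization of the constraints."""
    canonical = json.dumps(constraints, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reusable_constraints(constraints):
    """Drop the keys that should be ignored when searching for reuse."""
    return {k: v for k, v in constraints.items() if k not in NON_REUSE_KEYS}


# Values copied from the old and new runs in this report.
old = {"keep_cache_disk": 0, "keep_cache_ram": 268435456,
       "ram": 10485760000, "vcpus": 2}
new = {"keep_cache_disk": 10485760000, "keep_cache_ram": 0,
       "ram": 10485760000, "vcpus": 2}

full_match = constraints_digest(old) == constraints_digest(new)
reuse_match = (constraints_digest(reusable_constraints(old))
               == constraints_digest(reusable_constraints(new)))

print(full_match)   # False: an exact match misses the old container
print(reuse_match)  # True: ignoring the Keep cache keys finds it
```

<p>A separate column holding only the reuse-relevant constraints would let the existing exact-match lookup keep working, at the cost of backfilling that column for existing containers.</p>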
<p>Agree on one and implement it.</p>
Arvados - Bug #18102 (Resolved): max dispatch attempts error https://dev.arvados.org/issues/18102 2021-09-03T20:06:21Z Jiayong Li jli@curii.com
<p><a class="external" href="https://workbench.2xpu4.arvadosapi.com/container_requests/2xpu4-xvhdp-vzc0tvmwarbfnx1">https://workbench.2xpu4.arvadosapi.com/container_requests/2xpu4-xvhdp-vzc0tvmwarbfnx1</a></p>
<pre>
Error: Failed to start container. Cancelled after exceeding 'Containers.MaxDispatchAttempts' (lock_count=5)
</pre>
Arvados - Bug #16169 (Resolved): tiling workflow cancelled for unknown reason https://dev.arvados.org/issues/16169 2020-02-20T20:41:11Z Jiayong Li jli@curii.com
<p>I'm running the tiling workflow, but it gets cancelled. <a class="external" href="https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-mzrysxcgtubgva9">https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-mzrysxcgtubgva9</a></p>
<p>I tried various runtime constraints and workflow parameters, but every run gets cancelled.</p>
<p>Before su92l was upgraded, I ran a workflow of the same scale (input also around 2TB), and it was successful. <a class="external" href="https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-nm507pzmjqiai4s">https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-nm507pzmjqiai4s</a></p>
<p>Comparing individual jobs from these two runs, <a class="external" href="https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-vdlq5f0hqldttso">https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-vdlq5f0hqldttso</a> completed but <a class="external" href="https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-t3dtsqsi3vqfetb">https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-t3dtsqsi3vqfetb</a> was cancelled.</p>
Arvados - Bug #16140 (Resolved): Running arvados-cwl-runner shows arvados/jobs:2.0.0 is not avail... https://dev.arvados.org/issues/16140 2020-02-07T18:20:57Z Jiayong Li jli@curii.com
<p>I'm running a workflow on lightning-dev1, and got the following error.<br /><pre>
Error response from daemon: manifest for arvados/jobs:2.0.0 not found
ERROR Unhandled error, try again with --debug for more information:
Docker image arvados/jobs:2.0.0 is not available
Command '['docker', 'pull', 'arvados/jobs:2.0.0']' returned non-zero exit status 1.
</pre></p>
Arvados - Bug #15579 (New): Staging a large number of files with "loadListing: no_listing" still ... https://dev.arvados.org/issues/15579 2019-08-21T16:03:13Z Jiayong Li jli@curii.com
<p>I have a workflow (<a class="external" href="https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-isnujmpvrksam2q">https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-isnujmpvrksam2q</a>) that scatters the step "gvcf2fastj". Each scattered job creates a directory structured as follows:<br /><pre>
sample_name
|
---- 0000.fj.gz
|
---- 0001.fj.gz
|
----
</pre></p>
<p>The gathering step stages all of those directories in a single directory. There are 228 samples, and each directory has ~1700 files. The gathering step takes more than 30 minutes to finish, even though I used "loadListing: no_listing" as Peter suggested.</p>
<p>Supporting information: the gathering JavaScript expression tool, array-to-dir.cwl:<br /><pre>
$namespaces:
  arv: "http://arvados.org/cwl#"
  cwltool: "http://commonwl.org/cwltool#"
class: ExpressionTool
cwlVersion: v1.0
hints:
  cwltool:LoadListingRequirement:
    loadListing: no_listing
inputs:
  arr:
    type:
      type: array
      items: [File, Directory]
  dirname:
    type: string
outputs:
  dir: Directory
requirements:
  InlineJavascriptRequirement: {}
expression: |
  ${
    var dir = {"class": "Directory",
               "basename": inputs.dirname,
               "listing": inputs.arr};
    return {"dir": dir};
  }
</pre></p>
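<p>For a sense of scale, here is a minimal Python sketch that mirrors what the expression does and counts how many File and Directory objects it has to pass through if every nested listing is expanded, as the stdout in the log suggests. The input is synthetic: the sample and file names are made up, and only the counts (228 samples, about 1700 files each) come from this report.</p>

```python
def array_to_dir(arr, dirname):
    """Mirror the JavaScript expression: wrap the inputs in one Directory."""
    return {"dir": {"class": "Directory", "basename": dirname, "listing": arr}}


def count_entries(obj):
    """Count File and Directory objects, recursing into any listing."""
    return 1 + sum(count_entries(child) for child in obj.get("listing", []))


# Synthetic stand-in for the real input: 228 sample directories with
# 1700 files each. All names here are hypothetical.
samples = [
    {
        "class": "Directory",
        "basename": "sample%03d" % i,
        "listing": [
            {"class": "File", "basename": "%04x.fj.gz" % j}
            for j in range(1700)
        ],
    }
    for i in range(228)
]

result = array_to_dir(samples, "fjdir")
total = count_entries(result["dir"])
print(total)  # 387829 = 1 output directory + 228 samples + 228 * 1700 files
```

<p>If the listings were not expanded, the same expression would only handle the 228 top-level Directory objects.</p>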
<p>Log for the run that failed due to a JavaScript timeout (my eval-timeout is set to 2000).<br /><pre>
2019-08-21T08:01:58.136588884Z cwltool WARNING: Failed to evaluate expression:
2019-08-21T08:01:58.136588884Z Expression evaluation error:
2019-08-21T08:01:58.136588884Z Long-running script killed after 2000.0 seconds: Javascript expression was: {
2019-08-21T08:01:58.136588884Z var dir = {"class": "Directory",
2019-08-21T08:01:58.136588884Z "basename": inputs.dirname,
2019-08-21T08:01:58.136588884Z "listing": inputs.arr};
2019-08-21T08:01:58.136588884Z return {"dir": dir};
2019-08-21T08:01:58.136588884Z }
2019-08-21T08:01:58.136588884Z stdout was: {"dir":{"class":"Directory","basename":"fjdir","listing":[{"basename":"A-UPN-UP000009-BL-UPN-3714","nameext":"","nameroot":"A-UPN-UP000009-BL-UPN-3714","location":"keep:38b03b88c07b0fee01871a0e4f748829+51042/stage/A-UPN-UP000009-BL-UPN-3714","listing":[{"basename":"00ce.fj.gz","nameroot":"00ce.fj","nameext":".gz","location":"keep:38b03b88c07b0fee01871a0e4f748829+51042/stage/A-UPN-UP000009-BL-UPN-3714/00ce.fj.gz","class":"File","size":1237495},{"basename":"001e.fj.gz","nameroot":"001e.fj","nameext":".gz","location":"keep:38b03b88c07b0fee01871a0e4f748829+51042/stage/A-UPN-UP000009-BL-UPN-3714/001e.fj.gz","class":"File","size":1244790},{"basename":"00c9.fj.gz","nameroot":"00c9.fj","nameext":".gz","location":"keep:38b03b88c07b0fee01871a0e4f748829+51042/stage/A-UPN-UP000009-BL-UPN-3714/00c9.fj.gz","class":"File","size":5199364},
...
{"basename":"01f8.fj.gz.gzi","nameroot":"01f8.fj.gz","nameext":".gzi","location":"keep:dbca435901969db7a1b2f38fb60ec240+51091/stage/A-CUHS-CU000305-BL-COL-60321BL1/01f8.fj.gz.gzi","class":"File","size":7144},{"basename":"0089.fj.gz.gzi","nameroot":"0089.fj.gz","nameext":".gzi","location":"keep:dbca435901969db7a1b2f38fb60ec240+51091/stage/A-CUHS-CU000305-BL-COL-60321BL1/0089.fj.gz.gzi","class":"File","size":5576},{"basename":"01f7.fj.gz","nameroot":"01f7.fj","nameext":".gz","location":"keep:dbca435901969db7a1b2f38fb60ec240+51091/stage/A-CUHS-CU000305-BL-COL-
2019-08-21T08:01:58.375853555Z stderr was:
2019-08-21T08:02:15.664004152Z cwltool ERROR: [step handle-fjdirs] Output is missing expected field file:///var/lib/cwl/workflow.json#main/handle-fjdirs/dir
2019-08-21T08:02:58.086903477Z cwltool WARNING: [step handle-fjdirs] completed permanentFail
</pre></p>
Arvados - Idea #14888 (Resolved): [CWL] running expression tool on arvados doesn't return proper ... https://dev.arvados.org/issues/14888 2019-02-26T16:54:51Z Jiayong Li jli@curii.com
<p>I'm running a workflow (catbeds.cwl) that uses an expression tool to get all the bed files from a directory and then concatenates them. The workflow works fine.<br /><a class="external" href="https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-if13mus3rgqopbs">https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-if13mus3rgqopbs</a></p>
<p>But if I run the expression tool only (getbeds.cwl) <a class="external" href="https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-95zyqpj0b4k1nuz">https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-95zyqpj0b4k1nuz</a>, it shows the beds as an empty array.<br /><pre>
{
"beds": []
}
</pre></p>
<p>This is very confusing when we're testing the expression tool alone.</p>
Arvados - Bug #14429 (New): [CWL] Initial work dir error https://dev.arvados.org/issues/14429 2018-10-30T19:48:48Z Jiayong Li jli@curii.com
<p>I used the following simple CWL tool to gzip a file staged with InitialWorkDirRequirement.</p>
<p><em>gzip.cwl</em><br /><pre>
cwlVersion: v1.0
class: CommandLineTool
requirements:
  DockerRequirement:
    dockerPull: arvados/jobs
  InitialWorkDirRequirement:
    listing:
      - $(inputs.infile)
inputs:
  infile: File
outputs:
  outfile:
    type: File
    outputBinding:
      glob: "*.gz"
baseCommand: gzip
arguments:
  - $(inputs.infile.basename)
</pre></p>
<p>Run: <a class="external" href="https://workbench.e51c5.arvadosapi.com/container_requests/e51c5-xvhdp-uzdzlsqwwodcz0e">https://workbench.e51c5.arvadosapi.com/container_requests/e51c5-xvhdp-uzdzlsqwwodcz0e</a></p>
<p>Error message:<br /><pre>
gzip: foo: Device or resource busy
</pre></p>
Arvados - Bug #13931 (Resolved): [CWL] size field not accessible on arvados https://dev.arvados.org/issues/13931 2018-07-30T15:34:21Z Jiayong Li jli@curii.com
<p>I'm getting the following CWL error.<br /><pre>
2018-07-30T14:55:20.765294319Z cwltool ERROR: Execution failed: ../../lib/cwl/workflow.json:1:26: Expression evaluation error:
2018-07-30T14:55:20.765294319Z ../../lib/cwl/workflow.json:1:26: Syntax error in parameter reference 'inputs.fastq1.size': inputs.fastq1.size does not contain key 'size'. This could be due to using Javascript code without specifying InlineJavascriptRequirement.
</pre></p>
<p>The CWL script is as follows:<br /><pre>
cwlVersion: v1.0
class: CommandLineTool
inputs:
  fastq1: File
outputs:
  out: stdout
baseCommand: echo
arguments:
  - $(inputs.fastq1.size)
stdout: size.txt
</pre></p>
<p>Version<br /><pre>
/data-sdd/home/jiayong/envs/acr/bin/arvados-cwl-runner 1.1.4.20180720151136,
arvados-python-client 1.1.4.20180720151136, cwltool 1.0.20180615183820
</pre></p>
Arvados - Bug #13849 (Resolved): [CWL] secondaryFiles checks failed in single container https://dev.arvados.org/issues/13849 2018-07-18T17:24:29Z Jiayong Li jli@curii.com
<p>single container<br /><a class="external" href="https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-1vziek0s8udmajx">https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-1vziek0s8udmajx</a></p>
<p>non-single container<br /><a class="external" href="https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-g0u996tfq07d4mp">https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-g0u996tfq07d4mp</a></p>
<p>Note that the non-single container run failed because the secondaryFiles checks raised errors, as they should.<br /><pre>
cwltool ERROR: Cannot make scatter job: Missing required secondary file 'chr1.1kg.phase3.v5a.vcf.gz.tbi' from file object: {
"basename": "chr1.1kg.phase3.v5a.vcf.gz",
"nameroot": "chr1.1kg.phase3.v5a.vcf",
"nameext": ".gz",
"location": "keep:ba7cd392bc4aa229c3c771b496e79628+9990/chr1.1kg.phase3.v5a.vcf.gz",
"secondaryFiles": [],
"class": "File"
}
</pre></p>
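<p>The kind of check that produced this error can be sketched in Python as follows. This is only an illustration, not the cwltool implementation: the function name <code>missing_secondary_files</code> is made up, the <code>.tbi</code> pattern is inferred from the error message (the actual workflow is in the attached tarball), and only plain suffix patterns are handled.</p>

```python
def missing_secondary_files(file_obj, patterns):
    """Return expected secondary file names that file_obj does not list.

    Handles only plain suffix patterns such as ".tbi"; the CWL "^" prefix
    and expression patterns are out of scope for this sketch.
    """
    present = {sf.get("basename") for sf in file_obj.get("secondaryFiles", [])}
    expected = [file_obj["basename"] + pattern for pattern in patterns]
    return [name for name in expected if name not in present]


# File object copied from the error message above.
file_obj = {
    "basename": "chr1.1kg.phase3.v5a.vcf.gz",
    "nameroot": "chr1.1kg.phase3.v5a.vcf",
    "nameext": ".gz",
    "location": "keep:ba7cd392bc4aa229c3c771b496e79628+9990/chr1.1kg.phase3.v5a.vcf.gz",
    "secondaryFiles": [],
    "class": "File",
}

missing = missing_secondary_files(file_obj, [".tbi"])
print(missing)  # ['chr1.1kg.phase3.v5a.vcf.gz.tbi']
```

<p>An empty result would mean every required secondary file is present; a non-empty one corresponds to the "Missing required secondary file" error.</p>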
<p>This means the secondaryFiles checks failed to catch the missing file in the single container version. See the attached tarball for the CWL files. Run with:<br /><pre>
arvados-cwl-runner phasing-wf.cwl phasing-NA12878.yml
</pre></p>
Lightning - Idea #13376 (Resolved): Test the effect of phasing imputation workflow https://dev.arvados.org/issues/13376 2018-04-20T15:57:29Z Jiayong Li jli@curii.com
<p>Examine the output of the following runs:</p>
<p>GS12877:<br /><a class="external" href="https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-7n7aty5cnojbaap">https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-7n7aty5cnojbaap</a></p>
<p>NA12878:<br /><a class="external" href="https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-gexbre3xknqqxyj">https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-gexbre3xknqqxyj</a></p>
Lightning - Task #13334 (Resolved): Review https://dev.arvados.org/issues/13334 2018-04-05T18:54:35Z Jiayong Li jli@curii.com
<p>Reviewing branch 13216-phasing-imputation-workflow on l7g-ml<br />with workflows in project <a href="https://arvadosapi.com/su92l-j7d0g-049ipwfxdg21tun">su92l-j7d0g-049ipwfxdg21tun</a><br />Specifically the following runs:</p>
<ul>
<li><a href="https://arvadosapi.com/su92l-xvhdp-zb7seb8e430gfmy">su92l-xvhdp-zb7seb8e430gfmy</a> beagle.cwl</li>
<li><a href="https://arvadosapi.com/su92l-xvhdp-5kqltvk43tcb984">su92l-xvhdp-5kqltvk43tcb984</a> imputation-wf.cwl</li>
<li><a href="https://arvadosapi.com/su92l-xvhdp-3v5lgvkm8odwbtb">su92l-xvhdp-3v5lgvkm8odwbtb</a> imputation-wf.cwl</li>
</ul>
Lightning - Task #13300 (Resolved): Write phasing workflow with beagle 4.1 https://dev.arvados.org/issues/13300 2018-03-30T15:51:09Z Jiayong Li jli@curii.com
<ul>
<li>Build docker image with beagle 4.1 (<a class="external" href="http://faculty.washington.edu/browning/beagle/beagle.27Jan18.7e1.jar">http://faculty.washington.edu/browning/beagle/beagle.27Jan18.7e1.jar</a>)</li>
<li>Write workflow scattering across reference panels</li>
<li>Test workflow</li>
</ul>
Lightning - Task #13263 (Resolved): Review https://dev.arvados.org/issues/13263 2018-03-22T16:05:49Z Jiayong Li jli@curii.com
Arvados - Bug #13252 (New): [CWL] RunInSingleContainer requirement errors on DockerRequirement ev... https://dev.arvados.org/issues/13252 2018-03-21T16:06:06Z Jiayong Li jli@curii.com
<p>I'm trying to run a workflow in a single container but it failed. <a class="external" href="https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-8d7o1hq6mnnri85">https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-8d7o1hq6mnnri85</a></p>
<p>I got the following error.<br /><pre>
2018-03-21T00:06:23.256283830Z [step eagle] start
2018-03-21T00:06:23.260142930Z Exception on step 'eagle'
2018-03-21T00:06:23.260701530Z Cannot make scatter job: --no-container, but this CommandLineTool has DockerRequirement under 'requirements'.
2018-03-21T00:06:23.261063530Z Workflow cannot make any more progress.
2018-03-21T00:06:23.261720930Z Final process status is permanentFail
</pre><br />However, the script eagle.cwl does NOT have DockerRequirement.</p>
<p>See the attached tarball for the CWL scripts. The command I was using:<br /><pre>
arvados-cwl-runner --api=containers --submit --no-wait --project-uuid <a href="https://arvadosapi.com/su92l-j7d0g-huzwo8ptw745hjx">su92l-j7d0g-huzwo8ptw745hjx</a> phasing-wf.cwl yml/phasing-GS12877.yml
</pre></p>
<p>Note that without the RunInSingleContainer requirement, the workflow completed. <a class="external" href="https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-94180nzfj7yz6q5">https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-94180nzfj7yz6q5</a></p>
<p>For the record<br /><pre>
arvados-cwl-runner --version
/usr/bin/arvados-cwl-runner 1.0.20180223182850, arvados-python-client 0.1.20180223161544, cwltool 1.0.20180130110340
</pre></p>
Lightning - Idea #13216 (Resolved): Write phasing imputation workflow https://dev.arvados.org/issues/13216 2018-03-13T14:58:57Z Jiayong Li jli@curii.com