Activity

From 09/21/2017 to 10/20/2017

Today

05:42 pm Task #12438 (In Progress): Review 11220-manifest-fetch-error
Tom Clegg
05:42 pm Task #12440 (In Progress): Review 11453-federated-tokens
Tom Clegg
05:41 pm Task #12455 (Resolved): Validate v2-format salted tokens
Tom Clegg
05:41 pm Task #12443 (Resolved): Review 12216-webdav-list
Tom Clegg
03:35 pm Feature #12260 (Feedback): Healthcheck endpoint aggregator
Tom Clegg
03:35 pm Task #12362 (Resolved): Review 12260-system-health
Tom Clegg
02:32 am Bug #11220: [SDKs] Fix misleading arv-mount/pysdk error messages by removing obsolete "fetch mani...
11220-manifest-fetch-error @ commit:93e437b0dfd453f00df59c6a84bcc5d3ef09a9be
I removed one test that said "arv-get...
Tom Clegg

10/19/2017

09:35 pm Feature #10666: All Arvados components should report their version
+1
It is hard to tell from *the logs* whether deploys are using the correct version
Nico César
09:32 pm Feature #10666: All Arvados components should report their version
This isn't specific to Go programs. I thought we'd already decided that all programs should return their version numb... Tom Morris
08:29 pm Feature #12216: [keep-web] machine-readable file listings
Updates at commit:ec0c244be178aed7af0cf990a256dda557034b68 LGTM.
Local keep-web tests didn't complain so I suppose w...
Lucas Di Pentima
08:24 pm Feature #12018: Synchronize group membership with external data source
Functional
In the "evict" code, it seems like we should be removing user→group _and_ group→user links, a...
Tom Clegg
02:59 pm Feature #12018: Synchronize group membership with external data source
Found a bug, working on it. Lucas Di Pentima
01:10 pm Feature #12018: Synchronize group membership with external data source
Updates at commit:ea10340803abade2d35212866fcbc1beb1acd533
* Added @-parent-group-uuid@ parameter to specify a par...
Lucas Di Pentima
07:53 pm Bug #11220 (In Progress): [SDKs] Fix misleading arv-mount/pysdk error messages by removing obsole...
Tom Clegg
06:54 pm Task #12458 (Resolved): Review 12446-dispatcher-query
Peter Amstutz
06:53 pm Task #12468 (Resolved): Review 12467-read-imgload-response
Peter Amstutz
05:48 pm Task #12468 (In Progress): Review 12467-read-imgload-response
Peter Amstutz
05:42 pm Task #12468 (Resolved): Review 12467-read-imgload-response
Peter Amstutz
06:15 pm Bug #12467 (Resolved): crunch-run not waiting for Docker image to finish loading.
Applied in changeset arvados|commit:6fd6ddcebda57df4ecb2303dc229420c2c13af7b. Anonymous
06:08 pm Bug #12467: crunch-run not waiting for Docker image to finish loading.
LGTM Tom Clegg
06:04 pm Bug #12467: crunch-run not waiting for Docker image to finish loading.
Tom Clegg wrote:
> Better form to do "defer response.Body.Close()" before ioutil.ReadAll(), so it gets closed even i...
Peter Amstutz
05:53 pm Bug #12467: crunch-run not waiting for Docker image to finish loading.
Better form to do "defer response.Body.Close()" before ioutil.ReadAll(), so it gets closed even if read fails.
I t...
Tom Clegg
05:41 pm Bug #12467 (In Progress): crunch-run not waiting for Docker image to finish loading.
12467-read-imgload-response @ commit:d64215e8f0057cc7b4c6295932bc7f44d27a1eb5
Peter Amstutz
05:28 pm Bug #12467 (Resolved): crunch-run not waiting for Docker image to finish loading.
It is failing to load Docker image, even though the image load seemingly succeeds:... Peter Amstutz
06:06 pm Bug #12183: [crunch-run] Handle symlinks with absolute paths into output directory
NotInOutputDirError should be ErrNotInOutputDir (convention is FooError is a type, ErrFoo is an object)
Might as w...
Tom Clegg
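The naming convention mentioned in this comment (FooError for a type, ErrFoo for a value) might look like the sketch below. Only the name @ErrNotInOutputDir@ comes from the comment; @SymlinkError@, @checkTarget@ and the message text are hypothetical and are not taken from crunch-run.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotInOutputDir is a sentinel error value, hence the Err prefix.
// The message text is an assumption made for this example.
var ErrNotInOutputDir = errors.New("symlink target is not in the output directory")

// SymlinkError is an error type, hence the Error suffix (hypothetical).
type SymlinkError struct {
	Path string
	Err  error
}

func (e *SymlinkError) Error() string {
	return fmt.Sprintf("symlink %q: %v", e.Path, e.Err)
}

// checkTarget is a hypothetical helper that wraps the sentinel value
// in the error type when the target is outside the output directory.
func checkTarget(path string, inOutputDir bool) error {
	if !inOutputDir {
		return &SymlinkError{Path: path, Err: ErrNotInOutputDir}
	}
	return nil
}

func main() {
	err := checkTarget("/tmp/link", false)
	// Callers compare against the value and type-assert against the type.
	if se, ok := err.(*SymlinkError); ok && se.Err == ErrNotInOutputDir {
		fmt.Println("rejected:", se)
	}
}
```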
03:45 pm Bug #12446 (Resolved): [crunch2] crunch-dispatch-slurm monitoring too many containers gets 414 error
Applied in changeset arvados|commit:b51d376ed64efc68f7ee27fd061323da43faabd5. Anonymous
03:26 pm Bug #12446: [crunch2] crunch-dispatch-slurm monitoring too many containers gets 414 error
LGTM, thanks Tom Clegg
03:19 pm Bug #12446: [crunch2] crunch-dispatch-slurm monitoring too many containers gets 414 error
Tom Clegg wrote:
> Batching bugfix LGTM.
>
> The "seen" stuff looks like it's fixing a different condition, where...
Peter Amstutz
01:50 pm Bug #12465 (New): [crunchv2] Improve crunch-run environment reporting
Crunch-run should log its environment earlier in the startup process (before loading the Docker image or creating arv... Peter Amstutz
01:11 pm Task #12264 (In Progress): Review 12018-sync-groups-tool
Lucas Di Pentima

10/18/2017

08:18 pm Bug #12446: [crunch2] crunch-dispatch-slurm monitoring too many containers gets 414 error
Batching bugfix LGTM.
The "seen" stuff looks like it's fixing a different condition, where a dispatcher is trackin...
Tom Clegg
05:29 pm Feature #12018: Synchronize group membership with external data source
Updates at commit:ed6af9cb4
This commit is about tidying up the code, following the suggestions on note-11:
* R...
Lucas Di Pentima

10/17/2017

06:19 pm Bug #12460 (New): su92l only can run 11 jobs at once
su92l seems to be in the state where it is showing that it is only running 10-15 jobs at a time. Not sure if that is... Sarah Zaranek
03:11 pm Feature #12216: [keep-web] machine-readable file listings
12216-webdav-list @ commit:ec0c244be178aed7af0cf990a256dda557034b68
* merged master
* separate TTL for uuid->pdh ca...
Tom Clegg
01:56 pm Feature #12216: [keep-web] machine-readable file listings
Latest updates lgtm, lazy file opening is a cool idea!
Regarding cache invalidation, I was seeing something like you...
Lucas Di Pentima
02:54 pm Task #12449 (Resolved): Review 12447-crunch-run-leak
Peter Amstutz
02:53 pm Bug #12446: [crunch2] crunch-dispatch-slurm monitoring too many containers gets 414 error
For review:
12446-dispatcher-query @ commit:32713b4b3c2e1685b79acb24059a5b817cf6cbfc
Peter Amstutz
02:04 pm Bug #12446 (In Progress): [crunch2] crunch-dispatch-slurm monitoring too many containers gets 414...
Peter Amstutz
02:51 pm Task #12458 (In Progress): Review 12446-dispatcher-query
Peter Amstutz
02:04 pm Task #12458 (Resolved): Review 12446-dispatcher-query
Peter Amstutz

10/16/2017

08:42 pm Bug #12183: [crunch-run] Handle symlinks with absolute paths into output directory
Tom Clegg wrote:
> Better, thanks.
>
> I think we need to look out for infinitely deep directory hierarchies, now...
Peter Amstutz
08:16 pm Bug #12183: [crunch-run] Handle symlinks with absolute paths into output directory
Better, thanks.
I think we need to look out for infinitely deep directory hierarchies, now that we're following sy...
Tom Clegg
07:29 pm Bug #12183: [crunch-run] Handle symlinks with absolute paths into output directory
Tom Clegg wrote:
> Error with comment "tgt doesn't exist or lacks permissions" seems to be reported to user as "poin...
Peter Amstutz
07:24 pm Task #12455 (In Progress): Validate v2-format salted tokens
Tom Clegg
07:11 pm Task #12455 (Resolved): Validate v2-format salted tokens
Tom Clegg
07:23 pm Story #11453 (In Progress): Federated user identity which works across a network of Arvados clusters
Tom Clegg
06:51 pm Feature #12260: Healthcheck endpoint aggregator
LGTM. Thanks. Lucas Di Pentima
05:52 pm Feature #12260: Healthcheck endpoint aggregator
Lucas Di Pentima wrote:
> Some comments/questions:
>
> * File @sdk/go/arvados/config.go@
> ** Lines 53 & 63: Com...
Tom Clegg
06:19 pm Feature #12216: [keep-web] machine-readable file listings
Some follow-up fixes: 12216-webdav-list @ commit:337de2e3dfeacc5054cb644513be61f5d35585ae
* allow Authorization head...
Tom Clegg
02:16 pm Feature #12216: [keep-web] machine-readable file listings
Lucas Di Pentima wrote:
> I've encountered the cached listing behavior I mentioned on the chat, where a listing gets...
Tom Clegg
06:05 pm Bug #12447 (Resolved): crunch-run memory leak
Applied in changeset arvados|commit:15b5b59f5902fdc0fe4eb5366ba3b654b117d7df. Peter Amstutz
06:00 pm Bug #12447: crunch-run memory leak
LGTM @ commit:d0414ca72, thanks Tom Clegg
05:51 pm Bug #12447: crunch-run memory leak
Passing tests now https://ci.curoverse.com/job/developer-run-tests-remainder/488/
Peter Amstutz
05:13 pm Bug #12447: crunch-run memory leak
Tom Clegg wrote:
> Good to know that ioutil.ReadAll() can result in so much fragmentation...
>
> In block_cache.g...
Peter Amstutz

10/13/2017

06:10 pm Feature #12216: [keep-web] machine-readable file listings
As far as I can see, this looks good.
I've encountered the cached listing behavior I mentioned on the chat, where...
Lucas Di Pentima
04:35 pm Bug #12186: [cwl] cwl.input.yml contains "nameroot" and "nameext" fields, breaks reuse with RunIn...
From a customer point of view, it's important for them to be able to know which jobs are potentially affected by this... Tom Morris
02:27 pm Bug #12447: crunch-run memory leak
In block_cache.go, instead of using a bytes.Buffer, a more direct approach would be to call io.ReadFull using the siz... Tom Clegg
01:59 pm Bug #12447: crunch-run memory leak
Good to know that ioutil.ReadAll() can result in so much fragmentation...
In block_cache.go, new Clear() is not sa...
Tom Clegg
01:57 am Bug #12447 (In Progress): crunch-run memory leak
Peter Amstutz
01:56 am Bug #12447: crunch-run memory leak
So my working theory is that ReadAll() involves a number of intermediate allocations which are causing memory fragmen... Peter Amstutz
01:57 am Task #12449 (In Progress): Review 12447-crunch-run-leak
Peter Amstutz
01:57 am Task #12449 (Resolved): Review 12447-crunch-run-leak
Peter Amstutz

10/12/2017

09:13 pm Feature #12018: Synchronize group membership with external data source
You're right about all the type shenanigans. That map[string]arvadosclient.Dict stuff in the arvadosclient module was... Tom Clegg
05:17 pm Feature #12018: Synchronize group membership with external data source
First golang version at commit:46141b6c9098f30dcd6644845887789c1c9006da
It's basically a direct translation of the...
Lucas Di Pentima
09:03 pm Bug #12447: crunch-run memory leak
With GOGC=10 it peaks a bit less:... Peter Amstutz
09:01 pm Bug #12447: crunch-run memory leak
The memory profiling reports 416MB consistently used for loading the Docker image, this seems to be independent of im... Peter Amstutz
03:56 pm Bug #12447 (Resolved): crunch-run memory leak
Crunch-run loading a 2 GiB Docker image uses 1.5 GiB of RAM, which is enough on a 3.5 GiB node to prevent fork/exec d... Peter Amstutz
07:26 pm Task #12443 (In Progress): Review 12216-webdav-list
Tom Clegg
07:17 pm Feature #12216: [keep-web] machine-readable file listings
12216-webdav-list @ commit:a23fa06e9849f2ab76fa271624e22a245c2abc47
* test case using cadaver client (run-tests.sh n...
Tom Clegg
03:33 pm Feature #12216: [keep-web] machine-readable file listings
Does this include browsing projects? (Probably not, but for the desktop filesystem mount use case, it probably shoul... Peter Amstutz
05:49 pm Bug #12446: [crunch2] crunch-dispatch-slurm monitoring too many containers gets 414 error
If we do something like note-6 then we only expect each container to appear in that query string one time, right? So ... Tom Clegg
05:39 pm Bug #12446: [crunch2] crunch-dispatch-slurm monitoring too many containers gets 414 error
We start tracking containers when:
* They are Queued and we are able to take the lock
* They are Locked or Running...
Peter Amstutz
04:15 pm Bug #12446: [crunch2] crunch-dispatch-slurm monitoring too many containers gets 414 error
... Tom Clegg
04:07 pm Bug #12446: [crunch2] crunch-dispatch-slurm monitoring too many containers gets 414 error
Peter Amstutz wrote:
> However, I'm not entirely sure why it is querying for specific containers rather than queryin...
Tom Clegg
03:42 pm Bug #12446: [crunch2] crunch-dispatch-slurm monitoring too many containers gets 414 error
"tracked" here refers to the container records being managed by crunch-dispatch-slurm and their corresponding on slur... Peter Amstutz
03:32 pm Bug #12446: [crunch2] crunch-dispatch-slurm monitoring too many containers gets 414 error
Can you expand on this? What does "tracked" mean in this context? What is being tracked and why? Tom Morris
02:08 pm Bug #12446 (Resolved): [crunch2] crunch-dispatch-slurm monitoring too many containers gets 414 error
It seems that the "tracked" list has gotten big enough that passing the list of UUIDs being tracked is exceeding t... Peter Amstutz
03:16 pm Feature #12260: Healthcheck endpoint aggregator
Some comments/questions:
* File @sdk/go/arvados/config.go@
** Lines 53 & 63: Comments seem to be outdated naming ...
Lucas Di Pentima
01:33 pm Bug #12416 (Feedback): keepstore does not work with s3_volume on Ceph RadosGW
Tom Clegg

10/11/2017

09:05 pm Feature #12216 (In Progress): [keep-web] machine-readable file listings
Tom Clegg
08:00 pm Bug #9279: [Ops] Create an "arvados" provider for terraform
Switching to Arvados Project because there is no sensitive information and it is a good thing to share Nico César
07:34 pm Task #12443 (Resolved): Review 12216-webdav-list
Tom Clegg
07:34 pm Task #12442 (New): Review
Lucas Di Pentima
07:33 pm Task #12441 (New): Review arv-mount trashed support
Peter Amstutz
07:33 pm Task #12440 (In Progress): Review 11453-federated-tokens
Tom Clegg
07:33 pm Task #12439 (New): Review
Tom Morris
07:33 pm Task #12438 (In Progress): Review 11220-manifest-fetch-error
Tom Clegg
06:24 pm Story #12125 (In Progress): Client support for deleting projects
Peter Amstutz
06:24 pm Bug #12183 (In Progress): [crunch-run] Handle symlinks with absolute paths into output directory
Peter Amstutz
05:14 pm Bug #12404: Parallel a-c-r runs interfere in Docker uploads
This would also greatly speed up the CWL test suite that we run on 4xphq, c97qk and 9tee4. Ward Vandewege
01:21 pm Feature #12430 (New): Crunch2 limit output collection to glob patterns
The current behavior for crunch-run is to upload all files in the output directory. This sometimes results in tempor... Peter Amstutz

10/10/2017

08:23 pm Task #12362 (In Progress): Review 12260-system-health
Tom Clegg
08:23 pm Feature #12260: Healthcheck endpoint aggregator
12260-system-health @ commit:a9497f8d2756104ba07d88d5c8c7b84790fd83f3
Known todos:
* Update package scripts to bu...
Tom Clegg
07:52 pm Task #12421 (Resolved): Review 12418-glob-empty-collection
Peter Amstutz
03:55 pm Task #12421 (In Progress): Review 12418-glob-empty-collection
Peter Amstutz
03:38 pm Task #12421 (Resolved): Review 12418-glob-empty-collection
Peter Amstutz
07:52 pm Task #12423 (Resolved): Review 12422-pin-ciso8601
Peter Amstutz
05:22 pm Task #12423 (In Progress): Review 12422-pin-ciso8601
Lucas Di Pentima
05:13 pm Task #12423 (Resolved): Review 12422-pin-ciso8601
Peter Amstutz
06:36 pm Task #12424 (New): Migration process to convert local user IDs to network cluster IDs
As a cluster sysadmin, when we have federated network IDs, I'd like to replace all my local user UUIDs with network i... Tom Morris
06:10 pm Bug #12418 (Resolved): [CWL] Crash with glob on empty collection
Applied in changeset arvados|commit:0bb435a47e427b12fa2351141a22a1ba1e28a49d. Peter Amstutz
05:47 pm Bug #12418: [CWL] Crash with glob on empty collection
This LGTM, local test run finished ok. Thanks. Lucas Di Pentima
03:45 pm Bug #12418: [CWL] Crash with glob on empty collection
12418-glob-empty-collection @ commit:250f1578314d1f4d053d3d9f65a3d5c33d1578af
Use "collection is not None" instead...
Peter Amstutz
01:16 pm Bug #12418: [CWL] Crash with glob on empty collection
This is also blocking another four production batches, VGX1991-VGX1994.
https://projects.veritasgenetics.com/issue...
Tom Morris
01:14 pm Bug #12418: [CWL] Crash with glob on empty collection
This is blocking me here: https://projects.veritasgenetics.com/issues/2968
Also, I added the link to 12323, I post...
Bryan Cosca
05:35 pm Bug #12422 (Resolved): pin ciso8601 to avoid installation trouble on Python 3
Applied in changeset arvados|commit:455006efb88b2dbf7b489831d06afb850ac4e9aa. Peter Amstutz
05:31 pm Bug #12422: pin ciso8601 to avoid installation trouble on Python 3
LGTM, thanks! Lucas Di Pentima
05:14 pm Bug #12422: pin ciso8601 to avoid installation trouble on Python 3
We currently package ciso8601-1.0.3
Peter Amstutz
05:12 pm Bug #12422 (In Progress): pin ciso8601 to avoid installation trouble on Python 3
Peter Amstutz
05:12 pm Bug #12422: pin ciso8601 to avoid installation trouble on Python 3
12422-pin-ciso8601 @ commit:1a20228a176b0ebbb94650c5a5b1846e50c9eeb1
Peter Amstutz
05:10 pm Bug #12422 (Resolved): pin ciso8601 to avoid installation trouble on Python 3
https://github.com/closeio/ciso8601/issues/32
Near term solution is to pin @ciso8601 >=1.0.0, <=1.0.4@ until the p...
Peter Amstutz
03:43 pm Bug #12183: [crunch-run] Handle symlinks with absolute paths into output directory
I haven't had time to look closely enough but I'd like to make sure stuff like this works: ... Tom Clegg
02:02 pm Bug #12183: [crunch-run] Handle symlinks with absolute paths into output directory
Error with comment "tgt doesn't exist or lacks permissions" seems to be reported to user as "points to invalid locati... Tom Clegg

10/09/2017

09:29 pm Bug #12418 (In Progress): [CWL] Crash with glob on empty collection
This is believed to be the underlying cause of the failures in https://projects.veritasgenetics.com/issues/3775
Tom Morris
08:18 pm Bug #12418 (Resolved): [CWL] Crash with glob on empty collection
... Peter Amstutz

10/08/2017

12:18 pm Bug #12416: keepstore does not work with s3_volume on Ceph RadosGW
https://github.com/curoverse/arvados/pull/59
Joshua Randall
12:18 pm Bug #12416 (Feedback): keepstore does not work with s3_volume on Ceph RadosGW
keepstore does not include a `Content-Length: 0` header when attempting to create empty objects using the S3 API.
...
Joshua Randall

10/06/2017

07:53 pm Story #12125: Client support for deleting projects
Strategy
* When listing/searching trashed collections
** use include_trashed
** collect owner uuids and query th...
Peter Amstutz
01:44 pm Story #12125: Client support for deleting projects
Lucas Di Pentima wrote:
> * On the 404 page, do you think that adding the date when the trashed item is going to be ...
Peter Amstutz
07:49 pm Story #12414 (New): [API] Contents of trashed projects are actually deleted once delete_at is past
Peter Amstutz

10/05/2017

10:05 pm Bug #12410: Different help listing for 'arvados-cwl-runner'
I'm guessing that one is the help listing from cwltool (which is used internally) and the other is the help from arva... Tom Morris
10:03 pm Bug #12410 (New): Different help listing for 'arvados-cwl-runner'
@arvados-cwl-runner@ has different help listings depending on whether @arvados-cwl-runner@ or @arvados-cwl-runner -... Abram Connelly
06:57 pm Story #12409 (New): [cwl] cwl-tool conditional implementation
Implement spec changes & cwltool support for workflow conditionals based on one of the designs described here:
htt...
Tom Morris
06:56 pm Task #12408 (New): add StorageClasses to keepstore volume configs
Tom Clegg
06:21 pm Task #12407 (New): [API] Collections get a new storage category class
Tom Morris
06:19 pm Task #12406 (New): [Keep-balance] knows which volumes offer which storage categories
Tom Morris
06:09 pm Feature #11184: [Keep] Support multiple storage tiers
https://dev.arvados.org/projects/arvados/wiki/Keep_storage_groups
Peter Amstutz
03:21 pm Bug #12404 (New): Parallel a-c-r runs interfere in Docker uploads
Copied from https://dev.arvados.org/issues/12355#note-9
If I give cwltest the -j=8 parameter (for instance) to run...
Peter Amstutz
03:20 pm Bug #12355 (Resolved): run-arvados-cwl-conformance-tests really slow
Applied in changeset arvados|commit:3910be3219ace18462a128571bfac0b35446a392. Anonymous
03:12 pm Bug #12355: run-arvados-cwl-conformance-tests really slow
Ward Vandewege wrote:
> Peter Amstutz wrote:
> > [...]
> >
> > Instead of changing the @test_with_arvbox.sh@ scr...
Peter Amstutz
03:08 pm Bug #12355: run-arvados-cwl-conformance-tests really slow
Peter Amstutz wrote:
> [...]
>
> Instead of changing the @test_with_arvbox.sh@ script, you can pass this in on th...
Ward Vandewege
03:05 pm Bug #12355: run-arvados-cwl-conformance-tests really slow
... Peter Amstutz
02:40 pm Bug #12183: [crunch-run] Handle symlinks with absolute paths into output directory
12183-crunch-run-symlinks commit:305490369b502d47607b7ffe790d2c85e9a8db34
Major refactor of the approach. Now the...
Peter Amstutz

10/04/2017

06:44 pm Bug #12355: run-arvados-cwl-conformance-tests really slow
ready for review: branch 12355-make-cwl-conformance-tests-faster at commit:832e319fe2ba70e3ae18410238f07281aa929af9 Ward Vandewege
05:13 pm Bug #12355: run-arvados-cwl-conformance-tests really slow
For future reference, this is an example of the runtime on the ci box as of commit:3623287cdc05121a86e573b89aebb6e4aa... Ward Vandewege
05:01 pm Bug #12355: run-arvados-cwl-conformance-tests really slow
Ward Vandewege wrote:
> Switching to using the containers API makes a huge improvement, it takes the runtime down to...
Peter Amstutz
04:52 pm Bug #12355: run-arvados-cwl-conformance-tests really slow
Switching to using the containers API makes a huge improvement, it takes the runtime down to around 36 minutes, even ... Ward Vandewege
06:29 pm Feature #12400 (New): arvados.Collection class should be able to calculate collection size
from arvados/sdk/python/arvados/commands/put.py
def _collection_size(self, collection):
"""
...
Bryan Cosca
05:12 pm Story #12125: Client support for deleting projects
* On the 404 page, do you think that adding the date when the trashed item is going to be deleted is something that c... Lucas Di Pentima
12:26 pm Story #12125: Client support for deleting projects
12125-workbench-project-trash @ commit:71da9f42b396e6ae8d7ef83b1855d5bb407c2a17
Adds "Trashed projects" tab to tra...
Peter Amstutz
03:27 pm Bug #12183: [crunch-run] Handle symlinks with absolute paths into output directory
Tom Clegg wrote:
> This error should probably mention the path of the symlink that failed:
>
> [...]
>
> It lo...
Peter Amstutz
02:07 pm Bug #12183: [crunch-run] Handle symlinks with absolute paths into output directory
This error should probably mention the path of the symlink that failed:... Tom Clegg
12:27 pm Task #12266 (In Progress): Review 12125-workbench-project-trash
Peter Amstutz
12:27 pm Task #12091 (Resolved): [Workbench] Add Projects tab to trash page
Peter Amstutz

10/03/2017

07:21 pm Bug #12390 (New): [OPS][keepstore][puppet] migrate all keepstores configuration to systemd unit +...
acceptance criteria
* create config file for keepstore (default location: /etc/arvados/keepstore/keepstore.yml mak...
Nico César
03:00 pm Feature #8333 (Resolved): [SDKs] `arv keep docker` supports `repo:tag` image name scheme
Applied in changeset arvados|commit:4b4a0917a967c0ec2dd7b72c9665e0859022f120. Anonymous
01:48 pm Feature #8333: [SDKs] `arv keep docker` supports `repo:tag` image name scheme
8333-docker-repo-tag LGTM Peter Amstutz
01:17 pm Feature #8333: [SDKs] `arv keep docker` supports `repo:tag` image name scheme
Updated & added tests for "host:port/repo/img:tag", "[::1]:port/repo/img", etc.
8333-docker-repo-tag @ commit:5832...
Tom Clegg
01:48 pm Task #12363 (Resolved): Review 8333-docker-repo-tag
Peter Amstutz

10/02/2017

08:18 pm Bug #12355: run-arvados-cwl-conformance-tests really slow
I instrumented a copy of arvbox to print timestamps, and then parsed those out. This is the result (in seconds). It's... Ward Vandewege
08:08 pm Task #12361 (Resolved): Review 12273-skip-special-files
Peter Amstutz
08:06 pm Bug #12183: [crunch-run] Handle symlinks with absolute paths into output directory
Tom Clegg wrote:
> In EvalSymlinks() "var tgt string" can move down a bit, to just before first use
Fixed.

> ...
Peter Amstutz
05:42 pm Story #12383: [Nodemanager] Explicit node record states
checkout commit:dc060ea2f05e3266562c449fff39b3e867041f84
we have .dot files to play with. I added some pngs attach...
Nico César
04:39 pm Story #12383 (New): [Nodemanager] Explicit node record states
Proposed node record states
* Requested - create request for node size X will be sent
* Assigned - create reque...
Peter Amstutz
05:15 pm Bug #5267 (Resolved): [Node Manager] add one integration test for node manager
Node manager has an integration test framework now. Peter Amstutz
02:34 pm Feature #8333: [SDKs] `arv keep docker` supports `repo:tag` image name scheme
So it turns out that you can specify a port when using a custom registry, e.g. "myregistry.io:8888/repo/image:tag"
...
Peter Amstutz
02:30 pm Bug #12073 (Resolved): [Node manager] Clean up stale arvados node records
Applied in changeset arvados|commit:1339298d5812df668e08d9e77d595012cffd3171. Anonymous
02:11 pm Bug #12073: [Node manager] Clean up stale arvados node records
Lucas Di Pentima wrote:
> Updates at commit:96da34b18 - new branch: @12073-nodemanager-stale-nodes-recs-bis@
> Test...
Peter Amstutz
02:11 pm Task #12201 (Resolved): Review 12073-nodemanager-stale-nodes-recs
Peter Amstutz

10/01/2017

11:35 pm Bug #12355: run-arvados-cwl-conformance-tests really slow
Here's an example of a particularly slow test job (this doesn't happen to *all* the tests):... Ward Vandewege
11:26 pm Bug #12355 (In Progress): run-arvados-cwl-conformance-tests really slow
Ward Vandewege
11:26 pm Bug #12355: run-arvados-cwl-conformance-tests really slow
We run these tests in arvbox in localdemo mode, which generates lots of docker volumes which it leaves lying around...
Ward Vandewege

09/30/2017

03:21 am Bug #12073: [Node manager] Clean up stale arvados node records
Updates at commit:96da34b18 - new branch: @12073-nodemanager-stale-nodes-recs-bis@
Test run: https://ci.curoverse.co...
Lucas Di Pentima

09/29/2017

09:42 pm Bug #12347: [CWL] reuse hints appear to be broken
Update - Tom suggested I move the noreuse hint to the top level, which does indeed work now:... Ward Vandewege
08:23 pm Bug #12347: [CWL] reuse hints appear to be broken
Hrm, this appears to still be broken:... Ward Vandewege
01:05 pm Bug #12347 (Resolved): [CWL] reuse hints appear to be broken
Applied in changeset arvados|commit:f7a197bc9f1d416c37467feb01ff3b87c323e2b2. Anonymous
12:49 pm Bug #12347: [CWL] reuse hints appear to be broken
12347-obey-wf-reuse-hint @ commit:6fa8ea28132c59c75f3356ecc62a7d4fdef0d5e0 LGTM Peter Amstutz
02:27 am Bug #12347: [CWL] reuse hints appear to be broken
Thanks!
12347-obey-wf-reuse-hint @ commit:6fa8ea28132c59c75f3356ecc62a7d4fdef0d5e0
Tom Clegg
09:10 pm Bug #12183: [crunch-run] Handle symlinks with absolute paths into output directory
In EvalSymlinks() "var tgt string" can move down a bit, to just before first use
Is "for _, ent := range ReadDir(p...
Tom Clegg
08:09 pm Bug #12183: [crunch-run] Handle symlinks with absolute paths into output directory
12183-crunch-run-symlinks @ commit:3afc0c377d2859d0aba622d1883af771b1b62594 Peter Amstutz
07:24 pm Task #12205 (In Progress): Review 12183-crunch-run-symlinks
Peter Amstutz
07:24 pm Task #12312 (Resolved): Fix
Peter Amstutz
06:30 pm Bug #12273 (Resolved): [crunch] Should not try to upload special files
Applied in changeset arvados|commit:9e58fc08fa949a6468626c001cd289f88e75fa15. Anonymous
01:48 pm Bug #12273: [crunch] Should not try to upload special files
This LGTM, please merge. Thanks! Lucas Di Pentima
06:22 pm Task #12363 (In Progress): Review 8333-docker-repo-tag
Tom Clegg
06:21 pm Feature #8333: [SDKs] `arv keep docker` supports `repo:tag` image name scheme
8333-docker-repo-tag @ commit:36e23b3761e43231789df66dc441727c000a2ffc
Tom Clegg
05:22 pm Feature #8333 (In Progress): [SDKs] `arv keep docker` supports `repo:tag` image name scheme
Tom Clegg
12:50 pm Task #12373 (Resolved): Review 12347-obey-wf-reuse-hint
Peter Amstutz
02:27 am Task #12373 (In Progress): Review 12347-obey-wf-reuse-hint
Tom Clegg
02:27 am Task #12373 (Resolved): Review 12347-obey-wf-reuse-hint
Tom Clegg
02:28 am Feature #12260 (In Progress): Healthcheck endpoint aggregator
Tom Clegg
02:27 am Task #12365 (Resolved): Review 12347-disable-reuse
Tom Clegg

09/28/2017

08:53 pm Task #12361 (In Progress): Review 12273-skip-special-files
Tom Clegg
08:52 pm Bug #12273: [crunch] Should not try to upload special files
12273-skip-special-files @ commit:97aae1ab26418204078599a5ddbff493d26e32d8
Also fixed #11583 -- at least, fixed en...
Tom Clegg
02:32 pm Bug #12273 (In Progress): [crunch] Should not try to upload special files
Tom Clegg
07:32 pm Task #12312 (In Progress): Fix
Peter Amstutz
03:57 pm Bug #12347: [CWL] reuse hints appear to be broken
Try this:... Peter Amstutz
03:42 pm Bug #12347: [CWL] reuse hints appear to be broken
12347-disable-reuse @ commit:4febac9dd03bc4be3cf59827bfc4c8f5bcbe1a57 LGTM Peter Amstutz
02:31 pm Bug #12347 (In Progress): [CWL] reuse hints appear to be broken
Tom Clegg
03:16 am Bug #12347: [CWL] reuse hints appear to be broken
12347-disable-reuse @ commit:4febac9dd03bc4be3cf59827bfc4c8f5bcbe1a57
* When ...
Tom Clegg
03:19 pm Feature #12371 (New): Admins should be able to see node status from Azure on workbench
For example, if jobs are queued for 20+ minutes, there should be an easy way on workbench to see the status of the no... Bryan Cosca
02:32 pm Task #12365 (In Progress): Review 12347-disable-reuse
Tom Clegg

09/27/2017

09:03 pm Bug #12369 (New): Update documentation to reflect split of FUSE driver into its own package
At least the arv-mount tutorial needs updating: https://doc.arvados.org/user/tutorials/tutorial-keep-mount.html
but ...
Tom Morris
07:41 pm Bug #12246: [Crunch] Better crunch-run error when command not found
12246-better-advice:... Peter Amstutz
05:05 pm Bug #12246 (Resolved): [Crunch] Better crunch-run error when command not found
Applied in changeset arvados|commit:91143ef549e065ebdfb0138a031fc1fbd65cb527. Anonymous
04:49 pm Bug #12246: [Crunch] Better crunch-run error when command not found
It runs together on a very long line, which makes it hard to read. Could the "advice" come after the error message o... Peter Amstutz
04:10 pm Bug #12246: [Crunch] Better crunch-run error when command not found
Just to clarify about the runc panic stack trace: it seems the stack trace is not a crash, it's just a verbose error ... Tom Clegg
03:39 pm Bug #12246: [Crunch] Better crunch-run error when command not found
I'm still not going to take apart $PATH and transform paths and symlinks to figure out what exec() would do in the co... Tom Clegg
02:54 pm Bug #12246: [Crunch] Better crunch-run error when command not found
The panic was a runc bug, fixed here: https://github.com/opencontainers/runc/pull/1117
Inside runc the panic was t...
Tom Clegg
07:26 pm Task #12367 (New): Review
Lucas Di Pentima
07:26 pm Task #12365 (Resolved): Review 12347-disable-reuse
Tom Clegg
07:26 pm Task #12363 (Resolved): Review 8333-docker-repo-tag
Tom Clegg
07:26 pm Task #12362 (Resolved): Review 12260-system-health
Tom Clegg
07:26 pm Task #12361 (Resolved): Review 12273-skip-special-files
Tom Clegg
07:25 pm Bug #12360 (New): Document how to add EBS tmp disk to nodemanager configuration
Tom Morris
07:22 pm Bug #12358 (New): Document requirements for nodemanager token scope
Tom Morris
07:02 pm Bug #12295: [nodemanager] Only looking at first 100 queued jobs
The queue is ordered this way by the API server:... Peter Amstutz
06:58 pm Bug #12347: [CWL] reuse hints appear to be broken
Oh, it also needs to look at the toplevel requirement on the workflow. Good point. Peter Amstutz
04:38 pm Bug #12347: [CWL] reuse hints appear to be broken
Tom Clegg wrote:
> Reading source:sdk/cwl/arvados_cwl/arvcontainer.py it looks like RunnerContainer.arvados_job_spec...
Ward Vandewege
03:22 pm Bug #12347: [CWL] reuse hints appear to be broken
Reading source:sdk/cwl/arvados_cwl/arvcontainer.py it looks like RunnerContainer.arvados_job_spec() fails to set "use... Tom Clegg
03:10 pm Bug #12347 (Resolved): [CWL] reuse hints appear to be broken
Using
sdk/cwl/tests/noreuse.cwl
from the arvados tree, I've tested on 9tee4 (head, crunchv2), tb05z (0.1.2017...
Ward Vandewege
06:44 pm Bug #12355 (Resolved): run-arvados-cwl-conformance-tests really slow
It is taking hours to complete. For some reason the "sudo" command to run crunch-job is taking 75 seconds to start.
...
Peter Amstutz
06:35 pm Task #12322 (Resolved): Review 12316-fix-provenance-graph
Tom Morris
06:35 pm Task #12265 (Resolved): Review 12246-command-not-found
Tom Morris
03:41 pm Task #12265 (In Progress): Review 12246-command-not-found
Tom Clegg
06:34 pm Story #12032 (Resolved): [API] Allow projects to be deleted (ie placed in the trash can)
Ward Vandewege
04:46 pm Story #12032 (In Progress): [API] Allow projects to be deleted (ie placed in the trash can)
Ward Vandewege
03:18 pm Story #12032 (Resolved): [API] Allow projects to be deleted (ie placed in the trash can)
Peter Amstutz
04:17 pm Bug #12298 (Resolved): [Crunch2] Invalid container output_path causes infinite loop of futile dis...
Moved the "ideally" part of the proposed fix to #12349 Tom Clegg
04:15 pm Bug #12349 (New): [API] Validate container requests "output_path must be in a writable mount"
If output_path is not in a writable mount, crunch-run will cancel the container with a suitable error message (see #1... Tom Clegg
03:20 pm Task #12091 (In Progress): [Workbench] Add Projects tab to trash page
Peter Amstutz
01:55 pm Feature #12345 (New): [CWL] Use arv-put collection caching for file uploads
arvados-cwl-runner uses arvados.commands.run.uploadfiles() to handle file uploads. However, this uploads all files e... Peter Amstutz
01:27 pm Feature #12018: Synchronize group membership with external data source
How about making a single parent group, and using that as the owner_uuid of all synchronized groups?
* If --parent-g...
Tom Clegg
04:05 am Feature #12018 (In Progress): Synchronize group membership with external data source
Lucas Di Pentima
04:05 am Feature #12018: Synchronize group membership with external data source
Updates at commit:fd14dc21b - branch @12018-sync-groups-tool@ (WIP)
Added a first version of the new command @arv-...
Lucas Di Pentima
01:19 pm Bug #12199: Don't schedule jobs on nodes which are bigger than requested
# When dispatching, need to prevent jobs from being scheduled on too-big nodes. We can use the sbatch option --extra... Peter Amstutz

09/26/2017

08:03 pm Bug #11519: arv-get should abort on ctrl/C
It sounds like we need to do something like this in order to detect ^C properly. Peter Amstutz
07:55 pm Story #12289: Migrate to Slurm API
Potential benefits include better compatibility across slurm versions (less CLI-output-parsing code) and lower overhe... Tom Clegg
07:23 pm Bug #12199: Don't schedule jobs on nodes which are bigger than requested
The definition of "bigger than requested" here should take into account that
* container requirements specify what t...
Tom Clegg
06:58 pm Task #12336 (New): Investigate Slurm API
A research spike to figure out whether the Slurm API is better than our current approach. Tom Morris
06:49 pm Story #12239: Allow templating of the collection sharing web page
Perhaps a solution to this would be to include a customized HTML index page in the collection and distributing the li... Tom Morris
06:09 pm Bug #12332 (Duplicate): [OPS] puppet messages on ci.curoverse.com about gem issues
It all seems to be the same issue as #12217. Closing this one. Javier Bértoli
05:28 pm Bug #12332 (Duplicate): [OPS] puppet messages on ci.curoverse.com about gem issues
When running puppet on ci.curoverse.com, it throws a lot of notices like... Javier Bértoli
05:27 pm Bug #12331 (New): [CWL] Does not distinguish between keep references to File and Directory when s...
Peter Amstutz
03:16 pm Task #12200 (Resolved): Review 11068-cwl-missing-docker
Tom Morris
04:23 am Bug #12316: [Workbench] deletion of data collections destroying the provenance graph - rendering ...
Thanks for the quick fix! 6 hours from report to fix and <12 hours to merge. Don't forget to prepare to demo at sprin... Tom Morris
02:14 am Bug #12316 (Resolved): [Workbench] deletion of data collections destroying the provenance graph -...
Lucas Di Pentima wrote:
> This LGTM. Just one idea: would it be nice to also notify the user that the collection is ...
Ward Vandewege

09/25/2017

09:56 pm Bug #12316: [Workbench] deletion of data collections destroying the provenance graph - rendering ...
This LGTM. Just one idea: would it be nice to also notify the user that the collection is deleted to minimize the sur... Lucas Di Pentima
09:22 pm Bug #12316 (In Progress): [Workbench] deletion of data collections destroying the provenance grap...
Ward Vandewege
08:54 pm Bug #12316: [Workbench] deletion of data collections destroying the provenance graph - rendering ...
I have a fix for this bug in branch 12316-fix-provenance-graph at commit:5098820fed0920a31dd87a12c4f027318d7a1bd6 Ward Vandewege
03:03 pm Bug #12316 (Resolved): [Workbench] deletion of data collections destroying the provenance graph -...
Test case:
https://workbench.9tee4.arvadosapi.com/container_requests/9tee4-xvhdp-j0022f2t2b8w4ya#Provenance
T...
Ward Vandewege
09:05 pm Task #12322 (Resolved): Review 12316-fix-provenance-graph
Lucas Di Pentima
06:35 pm Bug #12073: [Node manager] Clean up stale arvados node records
Sorry, I had to cut the previous thought short.
Currently the "clean" behavior of SetupActor has the effect of tou...
Peter Amstutz
06:09 pm Bug #12073: [Node manager] Clean up stale arvados node records
Updates at commit:6ec328c00
Test run: https://ci.curoverse.com/job/developer-run-tests/468/
* Updated documentati...
Lucas Di Pentima
05:01 pm Bug #12073: [Node manager] Clean up stale arvados node records
On further thought, I realize there's a reason the SetupActor performs the record cleaning, which we need to maintain... Peter Amstutz
05:07 pm Task #12131 (Resolved): Review 12032-project-trash
Peter Amstutz
04:34 pm Feature #12320 (New): Access Arvados projects under the "by_id" directory in keep mount
One can access collections either by their portable data hash or UUID from the "by_id" directory in the keep mount. ... Abram Connelly
03:20 pm Bug #12318 (New): Honor Retry-After headers on libcloud exceptions
It seems to be a bug in libcloud that could make nodemanager behave erratically in certain error situations with clou... Lucas Di Pentima
03:15 pm Feature #12317: [FUSE] unable to rename a subproject
Renaming a subproject from the fuse driver (in read-write mode) does not work:...
Ward Vandewege
03:14 pm Feature #12317 (New): [FUSE] unable to rename a subproject
Ward Vandewege
03:02 pm Feature #12315 (New): [Workbench] on all processes page, add filter to make it possible to exclud...
Ward Vandewege
01:37 pm Feature #12314 (New): [FUSE] Incremental collection subdirectory load
Collections are currently managed by the Collection API in the Python SDK and FUSE as a single unit. This means to a... Peter Amstutz
01:25 pm Bug #11068 (Resolved): [Arvados-CWL-runner] need better error message when there are issues getti...
Applied in changeset arvados|commit:58c6f3aa42f4f30fc4a764ca56ab1a198754b69b. Peter Amstutz
01:23 pm Task #12312 (Resolved): Fix
Peter Amstutz

09/22/2017

06:29 pm Bug #12307 (In Progress): [OPS] cgroup_enable=memory swapaccount=1 grub parameters are not being set
Checking a compute node on e51c5, I see the issue is that we have *TWO* **GRUB_CMDLINE_LINUX_DEFAULT** entries. Our r... Javier Bértoli
06:10 pm Bug #12307 (In Progress): [OPS] cgroup_enable=memory swapaccount=1 grub parameters are not being set
#9431 suggests that both *compute* and *shell* nodes require these parameters set at boot time (in grub) (see also #7... Javier Bértoli
06:15 pm Story #12308 (New): [FUSE] Golang-based fuse driver
Background:
Python+llfuse was expedient and has done lots of good work for us, but it's not promising as a long te...
Tom Clegg
05:32 pm Bug #12306 (New): [arv-mount] --unmount should work on an unresponsive mount
Currently, if an arv-mount process is in some deadlocked/stuck state, running @arv-mount --unmount PATH@ just hangs i... Tom Clegg
02:51 pm Bug #11068: [Arvados-CWL-runner] need better error message when there are issues getting the dock...
This LGTM.
I think it would be nice to ask the cwltool project to enhance its error reporting when executing exter...
Lucas Di Pentima
02:06 pm Bug #12246: [Crunch] Better crunch-run error when command not found
Tom Clegg wrote:
> > So if container startup fails, we should make sure to report the command being invoked...
> ...
Peter Amstutz

09/21/2017

09:25 pm Bug #12304 (New): Different versions of ruamel.yaml required across SDK
https://github.com/curoverse/arvados/blob/master/sdk/python/setup.py
'ruamel.yaml >=0.13.7'
https://github.com/cu...
Jeff Jasper
06:06 pm Bug #11068: [Arvados-CWL-runner] need better error message when there are issues getting the dock...
Improved error handling/reporting: 11068-cwl-missing-docker Peter Amstutz
06:05 pm Task #12200 (In Progress): Review 11068-cwl-missing-docker
Peter Amstutz
02:09 pm Task #12299 (Resolved): Review 12298-cancel-fail
Peter Amstutz
02:09 pm Bug #12298: [Crunch2] Invalid container output_path causes infinite loop of futile dispatch attempts
12298-cancel-fail LGTM Peter Amstutz
 
