Bug #8873

open

[Docs] file_names Collection field is undocumented

Added by Joshua Randall about 8 years ago. Updated about 2 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: Documentation
Target version:
Story points: -
Release:
Release relationship: Auto

Description

I'd like to list collections containing files that match a certain pattern. I thought that, although expensive, this should be possible by filtering on manifest_text.

Unfortunately this gives an error:

$ arv collection list -f '[["manifest_text","like","%._cfb9b6873.%vcf.gz"]]' -s '["uuid"]'
Error: #<ArgumentError: Invalid attribute 'manifest_text' in filter>

Actions #1

Updated by Brett Smith about 8 years ago

Josh,

manifest_text was intentionally made unsearchable in #4523, because it's too big to index in the database. For use cases like yours, we provide a file_names attribute that simply lists the filenames in the manifest, a long string with one filename per line. You should be able to do something like:

arv collection list -f '[["file_names","like","%._cfb9b6873.%vcf.gz\n%"]]' -s '["uuid"]'

Does that meet your needs?

I see that this attribute isn't in our API documentation at all, so if nothing else, this bug can tell us to fix that.
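
For anyone scripting this, here is a minimal Python sketch of how the filter above can be assembled and quoted for the shell. It uses only the standard library; the helper names `file_names_filter` and `arv_command` are made up for illustration and are not part of any Arvados SDK. It assumes, as described above, that file_names holds one filename per line, and that every filename (including the last) is followed by a newline, which this thread does not confirm.

```python
import json
import shlex


def file_names_filter(pattern):
    """Build a filter matching `pattern` at the end of a line in file_names.

    Hypothetical helper. Appending "\n%" anchors the pattern to the end of
    one filename, assuming one filename per line in file_names.
    """
    return [["file_names", "like", pattern + "\n%"]]


def arv_command(pattern):
    """Return an `arv collection list` command line as a single string."""
    # json.dumps turns the real newline into the two characters \n,
    # which is what the command in the comment above passes on the shell.
    filters = json.dumps(file_names_filter(pattern))
    select = json.dumps(["uuid"])
    return " ".join([
        "arv", "collection", "list",
        "-f", shlex.quote(filters),
        "-s", shlex.quote(select),
    ])


if __name__ == "__main__":
    print(arv_command("%._cfb9b6873.%vcf.gz"))
```

If "like" follows SQL LIKE semantics, `_` in the pattern matches any single character and `%` matches any run of characters, so the pattern may match slightly more than intended.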

Actions #2

Updated by Joshua Randall about 8 years ago

  • Category changed from API to Documentation

Thanks - the undocumented searchable "file_names" attribute meets my needs in this case. The "fix" for this issue would be to document it. It could also be good if it was part of the returned collection record, so that reading the out-of-band documentation is not required.

Are these file_names the filename only, or the full "path" (i.e. stream name and file name) to the file?

If they were actually path_names, allowing them to be selected as an output would make it possible to grab a collection's directory listing from the API server without having to parse the manifest.

Actions #3

Updated by Brett Smith about 8 years ago

Joshua Randall wrote:

Are these file_names the filename only, or the full "path" (i.e. stream name and file name) to the file?

Filename only. The column is also size-limited (to ensure it can be indexed), so it's not guaranteed to be a complete listing. I think this is part of the rationale for not returning it in individual GET requests: code that relied on it would mishandle large collections. Better to parse the manifest, which you'll always have.
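
As a rough illustration of "parse the manifest", the following Python sketch lists full paths (stream name plus filename) from a manifest_text string. It is not the SDK's parser: `list_paths` and `unescape` are hypothetical helpers, and the sketch assumes the usual manifest layout of one stream per line, with a stream name, then block locators, then file tokens of the form position:size:name, and special characters escaped as a backslash plus three octal digits. The sample manifest is invented for the example.

```python
import re

# A block locator starts with a 32-hex-digit MD5 hash followed by "+size".
LOCATOR = re.compile(r"^[0-9a-f]{32}\+\d+")
# A file token has the form position:size:name.
FILE_TOKEN = re.compile(r"^(\d+):(\d+):(.+)$")


def unescape(name):
    """Decode backslash + three octal digit escapes (e.g. \\040 for a space).

    Only single-byte escapes are handled in this sketch.
    """
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), name)


def list_paths(manifest_text):
    """Return each file's full path, in order of first appearance."""
    paths = []
    for line in manifest_text.splitlines():
        tokens = line.split(" ")
        if not tokens[0]:
            continue  # skip blank lines
        stream = unescape(tokens[0])
        for token in tokens[1:]:
            if LOCATOR.match(token):
                continue  # data block, not a file
            match = FILE_TOKEN.match(token)
            if match:
                path = stream + "/" + unescape(match.group(3))
                # A file stored in several segments appears more than once.
                if path not in paths:
                    paths.append(path)
    return paths


if __name__ == "__main__":
    sample = (
        ". 37b51d194a7513e45b56f6524f2d51f2+3 0:3:bar\n"
        "./sub acbd18db4cc2f85cedef654fccc4a4d8+3 0:3:foo.txt 0:0:empty\\040file\n"
    )
    for p in list_paths(sample):
        print(p)
```

Unlike file_names, this gives the complete listing with stream names, at the cost of fetching the whole manifest.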

Actions #4

Updated by Brett Smith about 8 years ago

  • Subject changed from Can't filter collections on their contents to [Docs] file_names Collection field is undocumented

Actions #5

Updated by Brett Smith about 8 years ago

  • Target version set to Arvados Future Sprints

Actions #6

Updated by Ward Vandewege almost 3 years ago

  • Target version deleted (Arvados Future Sprints)

Actions #7

Updated by Peter Amstutz about 1 year ago

  • Release set to 60

Actions #8

Updated by Peter Amstutz about 2 months ago

  • Target version set to Future