Feature #5375

[workbench] Enable in-browser preview for text files in collections

Added by Nancy Ouyang about 4 years ago. Updated almost 4 years ago.

Status:
New
Priority:
Normal
Assigned To:
-
Category:
Workbench
Target version:
Start date:
03/18/2015
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
0.5

Description

The magnifying class should be enabled --
Just displaying the first 100kb is fine.

https://github.com/github/linguist/blob/master/lib/linguist/languages.yml

I suspect a subset will do initially, like
py, R, sh, xsl,
md, yml
txt, json, xml, tsv, csv
fa, fasta, sam, (not bam), vcf

Most specifically, I want support for *.fa and *.fasta

https://workbench.qr1hi.arvadosapi.com/collections/qr1hi-4zz18-7zk4muy5grnaqpv


Subtasks

Task #5497: Review branch: 5375-preview-collection-text-filesResolvedRadhika Chippada


Related issues

Related to Arvados - Feature #5511: [Workbench] Workbench should support viewing VCF filesNew2015-03-19

Associated revisions

Revision 10d3f91d
Added by Radhika Chippada about 4 years ago

closes #5375
Merge branch '5375-preview-collection-text-files'

Revision a51029fa
Added by Radhika Chippada about 4 years ago

refs #5375
Merge branch '5375-preview-collection-text-files'

History

#1 Updated by Radhika Chippada about 4 years ago

  • Category set to Workbench
  • Target version set to Bug Triage

#2 Updated by Tom Clegg about 4 years ago

  • Tracker changed from Bug to Feature
  • Story points set to 0.5

#3 Updated by Tom Clegg about 4 years ago

  • Target version changed from Bug Triage to 2015-04-01 sprint

#4 Updated by Radhika Chippada about 4 years ago

  • Assigned To set to Radhika Chippada

#5 Updated by Tom Clegg about 4 years ago

+1 Brett's suggestion of looking up MIME type and considering a file "viewable" if
  • its content type is "text/*" or "image/*"
  • its content type is "application/X" where X is something we whitelist, like pdf

#6 Updated by Radhika Chippada about 4 years ago

Notes about implementation:

  • The gem mime_types supports 1868 mime types. The data file is within the gem. It is currently not supporting just 5 of the types listed by Nancy. These are: fa, fasta, go, r, sam

    https://github.com/halostatue/mime-types

  • Added these unsupported types to mime_types using a rails initializer. These are added as "application" types.
  • Added a helper utility that gets the mime type using MIME.
    • If the file is text or image per the MIME type, true is returned.
    • On the other hand, if it is an application type, true is returned if it is white listed using the config parameter "filename_suffixes_with_view_icon"
  • Updated collections/_show_files.html to use the helper utility
  • Added tests to verify that each of the type listed in the story description is viewable.

#7 Updated by Radhika Chippada about 4 years ago

  • Status changed from New to In Progress

#8 Updated by Brett Smith about 4 years ago

Reviewing a9c8f2d

There should be a one-to-one mapping between file formats, and MIME types. Several different file extensions may map to the same MIME type because they refer to the same format (e.g., .tgz and .tar.gz)—and that's good, because it means that we can more easily identify files without sweating the details of the extension. This feature is why it would be helpful to rely on MIME type information exclusively to identify files, and get away from any extension parsing and recognition code.

Along those lines:

  • According to Wikipedia, the extensions .fasta, .fas, .fa, .seq, and .fsa can all refer to FASTA files. All of these should be registered as the same MIME type—probably application/fasta, as in your first registration.
  • It would be good to replace the configuration setting filename_suffixes_with_view_icon with something like application_mimetypes_with_view_icon. This would be a list of MIME subtypes under application/ that are viewable in most browsers. This way, we can just list FASTA files once with a name like fasta rather than listing fa and fasta (and the other extensions) separately. It also makes it easier for us to handle file names that have more than one extension.

Other comments:

  • With run-tests.sh, I got the following test failures:
      2) Failure:
    ActionsControllerTest#test_combine_files_into_new_collection [/home/brett/repos/arvados/apps/workbench/test/controllers/actions_controller_test.rb:39]:
    Not found 0:0:file1 0:0:file2 0:0:file3 in new collection manifest text
    
      3) Failure:
    ActionsControllerTest#test_combine_files_with_repeated_names_into_new_collection [/home/brett/repos/arvados/apps/workbench/test/controllers/actions_controller_test.rb:69]:
    Not found 0:0:file1 0:0:file2 0:0:file3 in new collection manifest text

    If I run just that test suite, I get a couple more:
      1) Failure:
    ActionsControllerTest#test_combine_collections_with_repeated_filenames_in_almost_similar_directories_and_expect_files_with_proper_suffixes [/home/brett/repos/arvados/apps/workbench/test/controllers/actions_controller_test.rb:100]:
    Not found: alice(1) in dir1 in manifest text ./dir1 92b53930db60fe94be2a73fc771ba921+34+A9b99704178af2818f0581590e6997dd99f7ac4e4@551d559a 0:12:alice 12:12:alice.txt 24:10:bob.txt
    ./dir2 56ac2557b1ded11ccab7293dc47d1e88+44+Af6a5e1e6eee64efa20cddbc7920bf2955db78792@551d559a 0:27:alice.txt
    ./dir1 92b53930db60fe94be2a73fc771ba921+34+A9b99704178af2818f0581590e6997dd99f7ac4e4@551d559a 0:12:alice(1) 12:12:alice(1).txt 24:10:carol.txt
    ./dir2 56ac2557b1ded11ccab7293dc47d1e88+44+Af6a5e1e6eee64efa20cddbc7920bf2955db78792@551d559a 0:27:alice(1).txt
    . acbd18db4cc2f85cedef654fccc4a4d8+3+A53f7e7c7f8d8b902474fa78ceb2b9def18b19487@551d559a 0:3:foo
    . acbd18db4cc2f85cedef654fccc4a4d8+3+A53f7e7c7f8d8b902474fa78ceb2b9def18b19487@551d559a 0:3:foo(1)
    
      4) Failure:
    ActionsControllerTest#test_combine_foo_files_from_two_different_collection_streams_and_expect_proper_filename_suffixes [/home/brett/repos/arvados/apps/workbench/test/controllers/actions_controller_test.rb:162]:
    Incorrect number of streams in . acbd18db4cc2f85cedef654fccc4a4d8+3+A41aa3b6158584ce0ceb94554d6f1faed7e0d6ff9@551d55a4 0:3:foo
    . a84b928ebdbae3f658518c711beaec02+28+Af25e2bad91a06afe446a297d255da0ce6ce4d19b@551d55a4 12:16:foo(1)
    .
    Expected: 1
      Actual: 2
  • Please make a method to register a new MIME type if it doesn't exist. I know the block is simple to copy and paste now, but I won't be surprised if we end up extending some of the conditional logic in the future. Making a method now will help with that.
  • Why does the initializer register the types with Mime::Type? According to the comment, that's for Rails' use in responding to specific formats, and I don't believe we want to recognize any of these formats in responses.
  • Why does the initializer include MIME?
  • Please make sure that the portable data hashes for the new Collection fixtures correspond to the manifest text. It helps make the fixtures more useful for other tests in the future if they already have good data. And it helps avoid questions about whether or not the portable data hash is "intentionally" wrong.

Thanks.

#9 Updated by Radhika Chippada about 4 years ago

Brett, thanks for the review comments. I addressed all the comments. (I was being lazy about the new fixture pdf; now that I updated it, there is now no conflict with the other action_controller tests.)

As for "include MIME", without it workbench does not find MIME and I cannot access any pages. I think it is got to do with the order of loading gems during initialization.

Thanks.

#10 Updated by Brett Smith about 4 years ago

At 4a8dc95, I just have one organizational thought. Following Rails convention, it would be ideal to test CollectionsHelper in a dedicated test case under tests/helpers/, rather than mixing the tests with the collections unit tests. (I apologize for not noting this last time.) This would make it easier to find and extend the tests when doing future work on the helper.

Everything else, including the tests themselves, looks great. Thanks.

#11 Updated by Radhika Chippada about 4 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:10d3f91d4b5619172dd5d8fbf3e8aa863b673e2f.

#12 Updated by Radhika Chippada about 4 years ago

  • Status changed from Resolved to In Progress

Also added javascript and rtf application types to the list of white-listed applications.

#13 Updated by Radhika Chippada about 4 years ago

  • Status changed from In Progress to Resolved

#14 Updated by Nancy Ouyang almost 4 years ago

Looks great! I guess I wasn't very clear, but I'd find it very helpful for Workbench to display a few lines from the file inside the browser, regardless of how each of my browsers I have is is set up to handle the MIME type. I definitely don't have an in-browser applications setup to preview fasta files, so right now the "magnifying glass" button does the same thing as the "download button" for me -- it pops up a "download" dialog in my browser. I'd rather have a little preview of the text in the file, sort of how log files are displayed, or even just a raw monospaced text dump of the first 20 kb in a new tab is fine and super helpful.

#15 Updated by Brett Smith almost 4 years ago

  • Subject changed from [workbench] enable in-browser preview for text files in collections to [workbench] Enable in-browser preview for text files in collections
  • Status changed from Resolved to New
  • Assigned To deleted (Radhika Chippada)
  • Target version changed from 2015-04-01 sprint to Arvados Future Sprints

Reopening to address the above. See also #5511, where a user requested the ability to preview VCF files.

Also available in: Atom PDF