



Bug #16100


[keep-web] Avoid sniffing for content type when file extension matches a MIME type

Added by Tom Clegg about 5 years ago. Updated about 5 years ago.



Currently, when serving a GET request for a file, the WebDAV service uses the Go standard library's content sniffing feature to guess an appropriate Content-Type if the filename extension is not listed in /etc/mime.types or a small built-in list of extensions. This is unreliable (and not just hypothetically -- users have been surprised by mysteriously broken previews).

For example, if the /etc/mime.types file does not exist, a file called "bmx.txt" containing the text "BMX bikes are awesome.\n" is currently served with Content-Type: image/bmp because the first two bytes "BM" satisfy the signature for a BMP image file, and this causes it to render incorrectly in the browser.
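
The lookup order described above can be reproduced with a short sketch. The function contentType and its lookup parameter are hypothetical names used only for illustration (lookup stands in for the system MIME table); this is not keep-web's actual code, only an approximation of "extension first, sniff as a fallback" using the standard library's http.DetectContentType.

```go
package main

import (
	"fmt"
	"net/http"
	"path/filepath"
)

// contentType uses the type registered for the filename extension if
// there is one, and falls back to content sniffing only when the
// extension is unknown. lookup is a hypothetical stand-in for the
// system MIME table.
func contentType(name string, data []byte, lookup func(ext string) string) string {
	if t := lookup(filepath.Ext(name)); t != "" {
		return t
	}
	return http.DetectContentType(data)
}

func main() {
	data := []byte("BMX bikes are awesome.\n")

	// No MIME table available: every extension is unknown, so the
	// content is sniffed and the leading "BM" matches the BMP signature.
	empty := func(string) string { return "" }
	fmt.Println(contentType("bmx.txt", data, empty)) // image/bmp

	// A MIME table that knows .txt: sniffing is never reached.
	withTxt := func(ext string) string {
		if ext == ".txt" {
			return "text/plain; charset=utf-8"
		}
		return ""
	}
	fmt.Println(contentType("bmx.txt", data, withTxt)) // text/plain; charset=utf-8
}
```

With an empty table the sniffer decides, which is the failure mode reported here; once .txt is known, the file content no longer matters.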

To avoid this problem:

  • keep-web OS packages should list the package providing /etc/mime.types ("mailcap" on CentOS, "mime-support" on Debian and Ubuntu) as a dependency.
  • At startup, keep-web should check the MIME type for a common extension like .txt that's not in the built-in list, and log a warning if it's missing.
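
The startup check suggested above could look roughly like the following. This is a minimal sketch, not the code that was merged: missingTypes is a hypothetical helper, and the warning text is invented for illustration. It relies only on mime.TypeByExtension from the Go standard library, which returns an empty string for an extension it does not know.

```go
package main

import (
	"log"
	"mime"
)

// missingTypes returns the extensions for which lookup reports no MIME
// type. Passing the lookup function in makes the check testable without
// depending on the files installed on the host.
func missingTypes(lookup func(ext string) string, exts []string) []string {
	var missing []string
	for _, ext := range exts {
		if lookup(ext) == "" {
			missing = append(missing, ext)
		}
	}
	return missing
}

func main() {
	// .txt is a common extension that is not in Go's built-in table, so
	// an empty result suggests the system MIME table was not loaded.
	if m := missingTypes(mime.TypeByExtension, []string{".txt"}); len(m) > 0 {
		log.Printf("warning: no MIME type known for %v; is /etc/mime.types installed?", m)
	}
}
```

Whether the warning fires depends on the host: on a machine with a system MIME table installed, the program prints nothing.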

Subtasks 1 (0 open, 1 closed)

Task #16147: Review 16100-mime-types (Resolved, Tom Clegg, 02/14/2020)

Related issues 1 (0 open, 1 closed)

Related to Arvados - Idea #15348: [pam] PAM module in Go (Resolved, Tom Clegg, 06/23/2020)
Actions #1

Updated by Michael Crusoe about 5 years ago

Tom Clegg wrote:

Observed behavior: A file called "bmx.txt" containing the text "BMX bikes are awesome.\n" is currently served with Content-Type: image/bmp because the first two bytes "BM" satisfy the signature for a BMP image file, and this causes it to render incorrectly in the browser.

FYI, the unix "file" command correctly identifies said file:

$ echo "BMX bikes are awesome.\n" > bmx.txt 
$ file --mime bmx.txt 
bmx.txt: text/plain; charset=us-ascii
$ file --version
Actions #2

Updated by Tom Clegg about 5 years ago

  • Status changed from New to In Progress
  • Description updated (diff)
Actions #4

Updated by Lucas Di Pentima about 5 years ago

Although Jenkins says it's all fine, I ran the services/keep-web tests on my dev VMs (debian9 & debian10) and I'm getting a failure like this:

arv-git-httpd pid 11288 ok
ws pid 11304 ok
======= test services/keep-web
time="2020-02-14T17:52:07-03:00" level=error msg="stat.Size()==3 but only wrote 0 bytes; read(1024) returns 0, GET acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77 failed: [http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:39073: connect: connection refused http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:33737: connect: connection refused http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:39073: connect: connection refused http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:33737: connect: connection refused http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:39073: connect: connection refused http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:33737: connect: connection refused]" 
2020/02/14 17:52:09 authSettings: map[ARVADOS_API_HOST: ARVADOS_API_HOST_INSECURE:true ARVADOS_API_TOKEN:4axaw8zxe0qm22wa6urpp5nskcne8z88cvbupv653y1njyi05h] is 11450 is 11462

I thought I might have a local problem, but current master runs OK.

Actions #5

Updated by Tom Clegg about 5 years ago

Here, I see that error in the logs but it doesn't cause a test failure. The test does a GET request for a file whose content can't be retrieved. The handler only fails after it's returned 200, and the test doesn't check the response body. Changing it to an integration test makes the logged error go away:

16100-mime-types @ 3836d53ef13841dad652e3faeb20660576279afd -- developer-run-tests: #1735

Actions #6

Updated by Lucas Di Pentima about 5 years ago

Thanks. Locally I was getting a test failure with errorlevel=29 when running the tests like this:

~/arvados/build/ WORKSPACE=~/arvados CONFIGSRC=~/arvados-test-config --temp ~/.cache/arvados-build --only services/keep-web --skip-install

Now this last fix makes the test pass, and correctly fail if I move the file /etc/mime.types to some other place, with the following message:

FAIL: handler_test.go:919: IntegrationSuite.TestFileContentType

time="2020-02-17T13:23:06.120200890-03:00" level=warning msg="SystemRootToken missing from cluster config, falling back to ARVADOS_API_TOKEN environment variable" 
time="2020-02-17T13:23:06.120236123-03:00" level=warning msg="Services.Controller.ExternalURL missing from cluster config, falling back to ARVADOS_API_HOST(_INSECURE) environment variables" 
    c.Check(resp.Header().Get("Content-Type"), check.Equals, trial.contentType)
... obtained string = "image/bmp" 
... expected string = "text/plain; charset=utf-8" 

    c.Check(resp.Header().Get("Content-Type"), check.Equals, trial.contentType)
... obtained string = "image/bmp" 
... expected string = "image/x-ms-bmp" 

So this LGTM, thanks!!

Actions #7

Updated by Anonymous about 5 years ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved
Actions #8

Updated by Ward Vandewege over 4 years ago

