Bug #16100

closed

[keep-web] Avoid sniffing for content type when file extension matches a MIME type

Added by Tom Clegg almost 5 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Keep
Target version:
Story points: 0.5
Release relationship: Auto

Description

Currently, when serving a GET request for a file, the WebDAV service uses the Go standard library's content sniffing feature to guess an appropriate Content-Type if the filename extension is not listed in /etc/mime.types or a small built-in list of extensions. This is unreliable (and not just hypothetically -- users have been surprised by mysteriously broken previews).

For example, if the /etc/mime.types file does not exist, a file called "bmx.txt" containing the text "BMX bikes are awesome.\n" is currently served with Content-Type: image/bmp because the first two bytes "BM" satisfy the signature for a BMP image file, and this causes it to render incorrectly in the browser.
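A minimal Go sketch of the lookup order at stake follows. It trusts the filename extension when the MIME database knows it and sniffs the content only otherwise. `contentTypeFor` is a hypothetical helper, not keep-web's actual code. Go's `http.ServeContent` documents the same order: extension first, then `http.DetectContentType` on the first block of content. Because the sniffer matches the two-byte BMP signature "BM", the "bmx" text falls through to `image/bmp` whenever the extension lookup comes up empty.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"path"
)

// contentTypeFor is a hypothetical helper, not keep-web's actual code.
// It prefers the type registered for the filename extension and only
// falls back to content sniffing when the extension is unknown.
func contentTypeFor(name string, head []byte) string {
	if ext := path.Ext(name); ext != "" {
		// mime.TypeByExtension consults Go's small built-in table plus
		// system files such as /etc/mime.types, if they exist.
		if ctype := mime.TypeByExtension(ext); ctype != "" {
			return ctype
		}
	}
	// DetectContentType looks at no more than the first 512 bytes and
	// always returns a valid MIME type.
	return http.DetectContentType(head)
}

func main() {
	content := []byte("BMX bikes are awesome.\n")

	// ".html" is in Go's built-in table, so the extension wins.
	fmt.Println(contentTypeFor("bmx.html", content))

	// No extension: sniffing sees "BM", the BMP signature.
	fmt.Println(contentTypeFor("bmx", content))

	// ".txt" is not built in, so this result depends on whether a system
	// MIME database is installed: "text/plain; charset=utf-8" if it is,
	// "image/bmp" if it is not.
	fmt.Println(contentTypeFor("bmx.txt", content))
}
```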

To avoid this problem:

  • Keep-web OS packages should list the package providing /etc/mime.types ("mailcap" on CentOS, "mime-support" on Debian and Ubuntu) as a dependency.
  • At startup, keep-web should check the MIME type for a common extension like .txt that is not in the built-in list, and log a warning if it is missing.
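The startup check in the second point might look roughly like the minimal sketch below. `checkMIMETypes` is a hypothetical name. Passing the lookup function as a parameter is an assumption made for testability, not something the issue specifies; in production it would receive `mime.TypeByExtension`.

```go
package main

import (
	"log"
	"mime"
)

// checkMIMETypes is a hypothetical startup check. It probes an extension
// that Go's built-in table does not cover, so a miss suggests that no
// system MIME database (such as /etc/mime.types) was loaded.
// It returns true if the probe extension resolved to a type.
func checkMIMETypes(lookup func(ext string) string) bool {
	const probe = ".txt"
	if lookup(probe) == "" {
		log.Printf("warning: no MIME type found for %q; is /etc/mime.types installed "+
			"(package \"mailcap\" on CentOS, \"mime-support\" on Debian/Ubuntu)?", probe)
		return false
	}
	return true
}

func main() {
	// Use the real lookup at startup; only a warning is logged on failure.
	checkMIMETypes(mime.TypeByExtension)
}
```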


Subtasks: 1 (0 open, 1 closed)

Task #16147: Review 16100-mime-types (Resolved, Tom Clegg, 02/14/2020)

Related issues: 1 (0 open, 1 closed)

Related to Arvados - Idea #15348: [pam] PAM module in Go (Resolved, Tom Clegg, 06/23/2020)
#1

Updated by Michael Crusoe almost 5 years ago

Tom Clegg wrote:

Observed behavior: A file called "bmx.txt" containing the text "BMX bikes are awesome.\n" is currently served with Content-Type: image/bmp because the first two bytes "BM" satisfy the signature for a BMP image file, and this causes it to render incorrectly in the browser.

FYI, the unix "file" command correctly identifies said file:

$ echo "BMX bikes are awesome.\n" > bmx.txt 
$ file --mime bmx.txt 
bmx.txt: text/plain; charset=us-ascii
$ file --version
file-5.37
#2

Updated by Tom Clegg almost 5 years ago

  • Status changed from New to In Progress
  • Description updated (diff)
#4

Updated by Lucas Di Pentima almost 5 years ago

Although Jenkins says it's all fine, I've run the services/keep-web tests on my dev VMs (debian9 & debian10) and I'm getting a failure like this:

[...]
{"health":"OK"}
arv-git-httpd pid 11288 ok
{"health":"OK"}
{"health":"OK"}
ws pid 11304 ok
ARVADOS_TEST_PROXY_SERVICES=1
ARVADOS_API_TOKEN=4axaw8zxe0qm22wa6urpp5nskcne8z88cvbupv653y1njyi05h
ARVADOS_CONFIG=/media/psf/arvados/tmp/arvados.yml
ARVADOS_API_HOST=0.0.0.0:45751
ARVADOS_TEST_API_INSTALLED=10501
ARVADOS_TEST_API_HOST=0.0.0.0:54431
ARVADOS_API_HOST_INSECURE=true
======= test services/keep-web
time="2020-02-14T17:52:07-03:00" level=error msg="stat.Size()==3 but only wrote 0 bytes; read(1024) returns 0, GET acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77 failed: [http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:39073: connect: connection refused http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:33737: connect: connection refused http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:39073: connect: connection refused http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:33737: connect: connection refused http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:39073: connect: connection refused http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:33737: connect: connection refused]" 
2020/02/14 17:52:09 authSettings: map[ARVADOS_API_HOST:0.0.0.0:54431 ARVADOS_API_HOST_INSECURE:true ARVADOS_API_TOKEN:4axaw8zxe0qm22wa6urpp5nskcne8z88cvbupv653y1njyi05h]
child.pid is 11450
child.pid is 11462
{"RequestID":"req-1rjf1w6yc4zgsu5kds3s","level":"info","msg":"request","remoteAddr":"127.0.0.1:50260","reqBytes":0,"reqForwardedFor":"","reqHost":"zzzzz-4zz18-znoi8dpi452e9qi.collections.example.com:36683","reqMethod":"GET","reqPath":"testdata.bin","reqQuery":"","time":"2020-02-14T17:52:12.078006301-03:00"}
[...]

I thought I might have a local problem, but current master runs OK.

#5

Updated by Tom Clegg almost 5 years ago

Here, I see that error in the logs but it doesn't cause a test failure. The test does a GET request for a file whose content can't be retrieved. The handler only fails after it has already sent a 200 status, and the test doesn't check the response body. Changing it to an integration test makes the logged error go away:

16100-mime-types @ 3836d53ef13841dad652e3faeb20660576279afd -- developer-run-tests: #1735

#6

Updated by Lucas Di Pentima almost 5 years ago

Thanks. Locally I was getting a test failure with errorlevel=29 when running the tests like this:

~/arvados/build/run-tests.sh WORKSPACE=~/arvados CONFIGSRC=~/arvados-test-config --temp ~/.cache/arvados-build --only services/keep-web --skip-install

Now this last fix makes the test pass, and it correctly fails if I move the file /etc/mime.types to some other place, with the following message:

----------------------------------------------------------------------
FAIL: handler_test.go:919: IntegrationSuite.TestFileContentType

time="2020-02-17T13:23:06.120200890-03:00" level=warning msg="SystemRootToken missing from cluster config, falling back to ARVADOS_API_TOKEN environment variable" 
time="2020-02-17T13:23:06.120236123-03:00" level=warning msg="Services.Controller.ExternalURL missing from cluster config, falling back to ARVADOS_API_HOST(_INSECURE) environment variables" 
handler_test.go:974:
    c.Check(resp.Header().Get("Content-Type"), check.Equals, trial.contentType)
... obtained string = "image/bmp" 
... expected string = "text/plain; charset=utf-8" 

handler_test.go:974:
    c.Check(resp.Header().Get("Content-Type"), check.Equals, trial.contentType)
... obtained string = "image/bmp" 
... expected string = "image/x-ms-bmp" 

So this LGTM, thanks!!

#7

Updated by Anonymous almost 5 years ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved
#8

Updated by Ward Vandewege over 4 years ago
