Bug #16100
closed
[keep-web] Avoid sniffing for content type when file extension matches a MIME type
Added by Tom Clegg almost 5 years ago.
Updated almost 5 years ago.
Release relationship:
Auto
Description
Currently, when serving a GET request for a file, the WebDAV service uses the Go standard library's content sniffing feature to guess an appropriate Content-Type if the filename extension is not listed in /etc/mime.types or a small built-in list of extensions. This is unreliable (and not just hypothetically -- users have been surprised by mysteriously broken previews).
For example, if the /etc/mime.types file does not exist, a file called "bmx.txt" containing the text "BMX bikes are awesome.\n" is currently served with Content-Type: image/bmp because the first two bytes "BM" satisfy the signature for a BMP image file, and this causes it to render incorrectly in the browser.
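The sniffing behavior described above can be reproduced with the standard library alone. This is a minimal sketch, not keep-web's actual code path: sniff is a hypothetical helper wrapping http.DetectContentType, and the result of the extension lookup depends on which MIME databases exist on the host.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
)

// sniff is a hypothetical helper (not part of keep-web) that returns the
// content type the Go standard library guesses from the leading bytes.
func sniff(data []byte) string {
	return http.DetectContentType(data)
}

func main() {
	// The leading bytes "BM" match the BMP signature, so plain text is
	// misidentified as an image.
	fmt.Println(sniff([]byte("BMX bikes are awesome.\n")))

	// The extension lookup returns "" when no system MIME database supplies
	// ".txt"; the result here depends on the host.
	fmt.Printf("%q\n", mime.TypeByExtension(".txt"))
}
```

On a host where the extension lookup returns an empty string, a server that falls back to sniffing ends up with the first result.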
To avoid this problem:
- Keep-web OS packages should list the package providing /etc/mime.types -- "mailcap" on centos, "mime-support" on debian and ubuntu -- as a dependency.
- At startup, keep-web should check the mime type for a common extension like .txt that's not in the built-in list, and log a warning if it's missing.
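A startup check along the lines proposed above could look roughly like this. It is only a sketch under assumptions: mimeTypeMissing and the warning text are hypothetical, and the lookup function is passed in so the check can be exercised without touching the host's MIME database.

```go
package main

import (
	"log"
	"mime"
)

// mimeTypeMissing reports whether lookup knows no MIME type for ext.
// Hypothetical helper for illustration; not keep-web's actual code.
func mimeTypeMissing(lookup func(string) string, ext string) bool {
	return lookup(ext) == ""
}

func main() {
	// ".txt" is a common extension that is absent from Go's small built-in
	// table, so an empty result suggests the system MIME database is missing.
	if mimeTypeMissing(mime.TypeByExtension, ".txt") {
		log.Print("warning: no MIME type found for \".txt\"; /etc/mime.types may be missing, so responses may carry sniffed content types")
	}
}
```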
Tom Clegg wrote:
Observed behavior: A file called "bmx.txt" containing the text "BMX bikes are awesome.\n" is currently served with Content-Type: image/bmp because the first two bytes "BM" satisfy the signature for a BMP image file, and this causes it to render incorrectly in the browser.
FYI, the unix "file" command correctly identifies said file:
$ echo "BMX bikes are awesome.\n" > bmx.txt
$ file --mime bmx.txt
bmx.txt: text/plain; charset=us-ascii
$ file --version
file-5.37
- Status changed from New to In Progress
- Description updated (diff)
Although Jenkins says it's all fine, I ran the services/keep-web tests on my dev VMs (debian9 & debian10) and I'm getting a failure like this:
[...]
{"health":"OK"}
arv-git-httpd pid 11288 ok
{"health":"OK"}
{"health":"OK"}
ws pid 11304 ok
ARVADOS_TEST_PROXY_SERVICES=1
ARVADOS_API_TOKEN=4axaw8zxe0qm22wa6urpp5nskcne8z88cvbupv653y1njyi05h
ARVADOS_CONFIG=/media/psf/arvados/tmp/arvados.yml
ARVADOS_API_HOST=0.0.0.0:45751
ARVADOS_TEST_API_INSTALLED=10501
ARVADOS_TEST_API_HOST=0.0.0.0:54431
ARVADOS_API_HOST_INSECURE=true
======= test services/keep-web
time="2020-02-14T17:52:07-03:00" level=error msg="stat.Size()==3 but only wrote 0 bytes; read(1024) returns 0, GET acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77 failed: [http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:39073: connect: connection refused http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:33737: connect: connection refused http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:39073: connect: connection refused http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:33737: connect: connection refused http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:39073: connect: connection refused http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:33737: connect: connection refused]"
2020/02/14 17:52:09 authSettings: map[ARVADOS_API_HOST:0.0.0.0:54431 ARVADOS_API_HOST_INSECURE:true ARVADOS_API_TOKEN:4axaw8zxe0qm22wa6urpp5nskcne8z88cvbupv653y1njyi05h]
child.pid is 11450
child.pid is 11462
{"RequestID":"req-1rjf1w6yc4zgsu5kds3s","level":"info","msg":"request","remoteAddr":"127.0.0.1:50260","reqBytes":0,"reqForwardedFor":"","reqHost":"zzzzz-4zz18-znoi8dpi452e9qi.collections.example.com:36683","reqMethod":"GET","reqPath":"testdata.bin","reqQuery":"","time":"2020-02-14T17:52:12.078006301-03:00"}
[...]
I thought I might have some local problem, but current master runs OK.
Here, I see that error in the logs but it doesn't cause a test failure. The test does a GET request for a file whose content can't be retrieved. The handler only fails after it's returned 200, and the test doesn't check the response body. Changing it to an integration test makes the logged error go away:
16100-mime-types @ 3836d53ef13841dad652e3faeb20660576279afd -- developer-run-tests: #1735
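To illustrate why the logged error did not surface as a test failure: once a handler has committed a 200 status, a later read failure cannot change it, so a test that inspects only the status code passes. The sketch below uses hypothetical names (brokenHandler, fetch) and an in-memory recorder; it is not the actual keep-web test.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// brokenHandler is a hypothetical stand-in: it commits a 200 status and then
// gives up before writing any body bytes, as if the backend read had failed.
func brokenHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Length", "3")
	w.WriteHeader(http.StatusOK)
	// Too late to report an error status here; the body stays empty.
}

// fetch runs h against an in-memory recorder and returns the status code
// and the number of body bytes actually written.
func fetch(h http.HandlerFunc) (int, int) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest("GET", "/testdata.bin", nil)
	h(rec, req)
	return rec.Code, rec.Body.Len()
}

func main() {
	code, n := fetch(brokenHandler)
	// A status-only check sees success even though no content arrived.
	fmt.Println(code, n)
}
```

Checking the body length (or content) alongside the status is what would expose the failure.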
Thanks. Locally I was getting a test failure with errorlevel=29 when running the tests like this:
~/arvados/build/run-tests.sh WORKSPACE=~/arvados CONFIGSRC=~/arvados-test-config --temp ~/.cache/arvados-build --only services/keep-web --skip-install
With this last fix the test passes, and it correctly fails if I move /etc/mime.types somewhere else, with the following message:
----------------------------------------------------------------------
FAIL: handler_test.go:919: IntegrationSuite.TestFileContentType
time="2020-02-17T13:23:06.120200890-03:00" level=warning msg="SystemRootToken missing from cluster config, falling back to ARVADOS_API_TOKEN environment variable"
time="2020-02-17T13:23:06.120236123-03:00" level=warning msg="Services.Controller.ExternalURL missing from cluster config, falling back to ARVADOS_API_HOST(_INSECURE) environment variables"
handler_test.go:974:
c.Check(resp.Header().Get("Content-Type"), check.Equals, trial.contentType)
... obtained string = "image/bmp"
... expected string = "text/plain; charset=utf-8"
handler_test.go:974:
c.Check(resp.Header().Get("Content-Type"), check.Equals, trial.contentType)
... obtained string = "image/bmp"
... expected string = "image/x-ms-bmp"
So this LGTM, thanks!!
- % Done changed from 0 to 100
- Status changed from In Progress to Resolved