Bug #16100
closed
[keep-web] Avoid sniffing for content type when file extension matches a MIME type
Description
Currently, when serving a GET request for a file, the WebDAV service uses the Go standard library's content sniffing feature to guess an appropriate Content-Type if the filename extension is not listed in /etc/mime.types or a small built-in list of extensions. This is unreliable (and not just hypothetically -- users have been surprised by mysteriously broken previews).
For example, if the /etc/mime.types file does not exist, a file called "bmx.txt" containing the text "BMX bikes are awesome.\n" is currently served with Content-Type: image/bmp because the first two bytes "BM" satisfy the signature for a BMP image file, and this causes it to render incorrectly in the browser.
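The behavior described above can be reproduced with a short sketch. The names contentType and lookup are hypothetical and this is not keep-web's actual code; it only models "use the extension if it is known, otherwise sniff" with the standard library's http.DetectContentType and mime.TypeByExtension.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"path/filepath"
)

// contentType is a hypothetical helper (not taken from keep-web). It
// returns the type that lookup reports for the filename extension, and
// falls back to content sniffing only when the extension is unknown.
func contentType(name string, data []byte, lookup func(ext string) string) string {
	if t := lookup(filepath.Ext(name)); t != "" {
		return t
	}
	return http.DetectContentType(data)
}

func main() {
	data := []byte("BMX bikes are awesome.\n")

	// Simulate a host without /etc/mime.types: no extension is known,
	// so the sniffer sees the leading "BM" and reports a BMP image.
	noTypes := func(string) string { return "" }
	fmt.Println(contentType("bmx.txt", data, noTypes))

	// With the real lookup, the result depends on whether the system
	// MIME table is installed on the machine running this.
	fmt.Println(contentType("bmx.txt", data, mime.TypeByExtension))
}
```

The first line printed is image/bmp. The second depends on the host, which is exactly the problem: the same file gets a different Content-Type depending on an optional OS package.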
To avoid this problem:
- Keep-web OS packages should list the package providing /etc/mime.types -- "mailcap" on centos, "mime-support" on debian and ubuntu -- as a dependency.
- At startup, keep-web should check the mime type for a common extension like .txt that's not in the built-in list, and log a warning if it's missing.
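The startup check proposed above could look roughly like this. missingExtensions and the probe list are hypothetical names chosen for illustration; only mime.TypeByExtension is a real standard-library call, and the actual keep-web change may differ.

```go
package main

import (
	"log"
	"mime"
)

// missingExtensions is a hypothetical helper. It returns the probe
// extensions for which lookup reports no MIME type.
func missingExtensions(lookup func(ext string) string, probes []string) []string {
	var missing []string
	for _, ext := range probes {
		if lookup(ext) == "" {
			missing = append(missing, ext)
		}
	}
	return missing
}

func main() {
	// .txt is not in Go's small built-in table, so an empty result here
	// suggests the system MIME table (e.g. /etc/mime.types) was not loaded.
	if missing := missingExtensions(mime.TypeByExtension, []string{".txt"}); len(missing) > 0 {
		log.Printf("warning: no MIME type known for %v; is /etc/mime.types installed?", missing)
	}
}
```

Passing the lookup function as a parameter keeps the check testable on hosts where the MIME table is present.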
Updated by Michael Crusoe over 4 years ago
Tom Clegg wrote:
Observed behavior: A file called "bmx.txt" containing the text "BMX bikes are awesome.\n" is currently served with Content-Type: image/bmp because the first two bytes "BM" satisfy the signature for a BMP image file, and this causes it to render incorrectly in the browser.
FYI, the unix "file" command correctly identifies said file:
$ echo "BMX bikes are awesome.\n" > bmx.txt
$ file --mime bmx.txt
bmx.txt: text/plain; charset=us-ascii
$ file --version
file-5.37
Updated by Tom Clegg over 4 years ago
- Status changed from New to In Progress
- Description updated (diff)
Updated by Tom Clegg over 4 years ago
16100-mime-types @ 46d2ed57248419200d5716cfef8de9a1bb911240 -- developer-run-tests: #1733
Updated by Lucas Di Pentima over 4 years ago
Although Jenkins says it's all fine, I ran the services/keep-web tests on my dev VMs (debian9 & debian10) and I'm getting a failure like this:
[...]
{"health":"OK"}
arv-git-httpd pid 11288 ok
{"health":"OK"}
{"health":"OK"}
ws pid 11304 ok
ARVADOS_TEST_PROXY_SERVICES=1
ARVADOS_API_TOKEN=4axaw8zxe0qm22wa6urpp5nskcne8z88cvbupv653y1njyi05h
ARVADOS_CONFIG=/media/psf/arvados/tmp/arvados.yml
ARVADOS_API_HOST=0.0.0.0:45751
ARVADOS_TEST_API_INSTALLED=10501
ARVADOS_TEST_API_HOST=0.0.0.0:54431
ARVADOS_API_HOST_INSECURE=true
======= test services/keep-web
time="2020-02-14T17:52:07-03:00" level=error msg="stat.Size()==3 but only wrote 0 bytes; read(1024) returns 0, GET acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77 failed: [http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:39073: connect: connection refused http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:33737: connect: connection refused http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:39073: connect: connection refused http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:33737: connect: connection refused http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:39073/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:39073: connect: connection refused http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: Get http://localhost:33737/acbd18db4cc2f85cedef654fccc4a4d8+3+A2b8e58eafe2fb5db03583e062bd1aa7871103fa8@5e597d77: dial tcp [::1]:33737: connect: connection refused]"
2020/02/14 17:52:09 authSettings: map[ARVADOS_API_HOST:0.0.0.0:54431 ARVADOS_API_HOST_INSECURE:true ARVADOS_API_TOKEN:4axaw8zxe0qm22wa6urpp5nskcne8z88cvbupv653y1njyi05h]
child.pid is 11450
child.pid is 11462
{"RequestID":"req-1rjf1w6yc4zgsu5kds3s","level":"info","msg":"request","remoteAddr":"127.0.0.1:50260","reqBytes":0,"reqForwardedFor":"","reqHost":"zzzzz-4zz18-znoi8dpi452e9qi.collections.example.com:36683","reqMethod":"GET","reqPath":"testdata.bin","reqQuery":"","time":"2020-02-14T17:52:12.078006301-03:00"}
[...]
Thought that I might have some local problem, but current master runs OK.
Updated by Tom Clegg over 4 years ago
Here, I see that error in the logs but it doesn't cause a test failure. The test does a GET request for a file whose content can't be retrieved. The handler only fails after it's returned 200, and the test doesn't check the response body. Changing it to an integration test makes the logged error go away:
16100-mime-types @ 3836d53ef13841dad652e3faeb20660576279afd -- developer-run-tests: #1735
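To illustrate why the logged error did not fail the test, here is a minimal sketch using net/http/httptest. brokenHandler is a hypothetical stand-in for a handler whose content read fails after the 200 status has already been committed; it is not the keep-web handler.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// brokenHandler is hypothetical. It commits a 200 response that promises
// 3 bytes, then writes nothing, as if the backend read had failed.
func brokenHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Length", "3")
	w.WriteHeader(http.StatusOK)
	// The simulated read failure happens here; the status is already sent.
}

// probe runs the handler against a recorder and returns the status code
// and the number of body bytes actually written.
func probe() (int, int) {
	rec := httptest.NewRecorder()
	brokenHandler(rec, httptest.NewRequest("GET", "/foo", nil))
	return rec.Code, rec.Body.Len()
}

func main() {
	code, n := probe()
	// A test that only checks the status code sees success here;
	// only a check on the body length exposes the failed read.
	fmt.Println(code, n)
}
```

A test asserting only on the status code passes against this handler, which matches the observation that the error showed up in the logs without causing a failure.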
Updated by Lucas Di Pentima over 4 years ago
Thanks. Locally I was getting a test failure with errorlevel=29 when running the tests like this:
~/arvados/build/run-tests.sh WORKSPACE=~/arvados CONFIGSRC=~/arvados-test-config --temp ~/.cache/arvados-build --only services/keep-web --skip-install
Now this last fix makes the test pass, and correctly fail if I move the file /etc/mime.types to some other place, with the following message:
----------------------------------------------------------------------
FAIL: handler_test.go:919: IntegrationSuite.TestFileContentType

time="2020-02-17T13:23:06.120200890-03:00" level=warning msg="SystemRootToken missing from cluster config, falling back to ARVADOS_API_TOKEN environment variable"
time="2020-02-17T13:23:06.120236123-03:00" level=warning msg="Services.Controller.ExternalURL missing from cluster config, falling back to ARVADOS_API_HOST(_INSECURE) environment variables"
handler_test.go:974:
    c.Check(resp.Header().Get("Content-Type"), check.Equals, trial.contentType)
... obtained string = "image/bmp"
... expected string = "text/plain; charset=utf-8"

handler_test.go:974:
    c.Check(resp.Header().Get("Content-Type"), check.Equals, trial.contentType)
... obtained string = "image/bmp"
... expected string = "image/x-ms-bmp"
So this LGTM, thanks!!
Updated by Anonymous over 4 years ago
- % Done changed from 0 to 100
- Status changed from In Progress to Resolved
Applied in changeset arvados|0a415b6c80c3bf39bb753274aae857eadde2f590.
Updated by Ward Vandewege about 4 years ago
- Related to Idea #15348: [pam] PAM module in Go added