Feature #17009

closed

[keep-web] S3 API should accept bucket name as first component of domain name

Added by Tom Clegg over 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Keep
Target version:
Story points: 1.0
Release relationship: Auto

Description

Currently the S3 API only accepts the bucket name in the path, but it should be easy enough to also accept the bucket name in the domain name, as we already do in keep-web for non-S3 requests.
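To make the two addressing styles concrete, here is a minimal Python sketch of resolving the bucket from either the Host header (virtual-host style) or the first path component (path style). It is an illustration only, not keep-web's actual Go implementation; the function name resolve_bucket, the base domain collections.example.com, and the bucket name in the usage lines are all hypothetical.

```python
def resolve_bucket(host, path, base_domain):
    """Return (bucket, key) for a path-style or virtual-host-style request.

    Hypothetical helper for illustration; not taken from keep-web.
    """
    host = host.split(":", 1)[0].lower()  # drop an optional port
    base = base_domain.lower()
    suffix = "." + base
    if host.endswith(suffix):
        # Virtual-host style: the bucket is the leading part of the domain name.
        bucket = host[: -len(suffix)]
        return bucket, path.lstrip("/")
    if host == base:
        # Path style: the bucket is the first component of the path.
        bucket, _, key = path.lstrip("/").partition("/")
        return bucket, key
    raise ValueError("host %r is not served under %r" % (host, base_domain))


# Both calls name the same (made-up) bucket and key.
path_style = resolve_bucket(
    "collections.example.com",
    "/zzzzz-4zz18-0123456789abcde/dir/file.txt",
    "collections.example.com",
)
vhost_style = resolve_bucket(
    "zzzzz-4zz18-0123456789abcde.collections.example.com",
    "/dir/file.txt",
    "collections.example.com",
)
```

With both styles accepted, the two requests above resolve to the same bucket and key.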


Subtasks 3 (0 open, 3 closed)

Task #17143: Review 17009-s3-bucket-vhost (Resolved, Peter Amstutz, 11/19/2020)
Task #17169: Get cyberduck to work (Resolved, 11/19/2020)
Task #17191: Review 17009-s3-vhost-list (Resolved, Tom Clegg, 11/19/2020)

Related issues

Related to Arvados Epics - Idea #16360: Keep-web supports S3 compatible interface (Resolved, 07/01/2020, 04/30/2021)
Blocked by Arvados - Feature #17011: Add keep-web wildcard DNS to salt (Resolved, Ward Vandewege, 11/25/2020)
Actions #1

Updated by Tom Clegg over 3 years ago

  • Related to Idea #16360: Keep-web supports S3 compatible interface added
Actions #2

Updated by Peter Amstutz over 3 years ago

I'm trying to use the command line version of cyberduck from https://duck.sh/

I'm trying to list the contents of a bucket:

duck -l s3://download.ce8i5.arvadosapi.com/ce8i5-j7d0g-g6r8w0853s32ged/

This doesn't work because it is connecting to

ce8i5-j7d0g-g6r8w0853s32ged.download.ce8i5.arvadosapi.com

From debugging, I see something about:

s3service.disable-dns-buckets=false

This seems to be a configuration option of the jets3t Java library used by Duck. I don't know how to set it, though; creating ~/.duck/jets3t.properties didn't seem to work.
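The rewrite described in this comment, where the client moves the bucket name out of the path and into the hostname, can be sketched roughly as follows. This illustrates DNS-style bucket addressing in general and is not jets3t's actual code; vhost_url is a hypothetical name, and its use_dns_buckets flag only loosely mirrors the s3service.disable-dns-buckets option seen in the debug output.

```python
from urllib.parse import urlsplit


def vhost_url(s3_url, use_dns_buckets=True):
    """Map s3://endpoint/bucket/key to the HTTPS URL a client would request.

    Hypothetical helper for illustration; the flag is an assumption.
    """
    parts = urlsplit(s3_url)
    endpoint = parts.netloc
    bucket, _, key = parts.path.lstrip("/").partition("/")
    if use_dns_buckets and bucket:
        # DNS-style: the bucket becomes the first component of the hostname.
        return "https://%s.%s/%s" % (bucket, endpoint, key)
    # Path-style: the bucket stays in the path.
    return "https://%s/%s" % (endpoint, parts.path.lstrip("/"))


dns_style = vhost_url("s3://download.ce8i5.arvadosapi.com/ce8i5-j7d0g-g6r8w0853s32ged/")
path_style = vhost_url(
    "s3://download.ce8i5.arvadosapi.com/ce8i5-j7d0g-g6r8w0853s32ged/",
    use_dns_buckets=False,
)
```

In the DNS-style case the resulting hostname is the one reported above, which only works once the server accepts the bucket name in the domain name.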

Actions #3

Updated by Peter Amstutz over 3 years ago

  • Target version set to 2020-12-02 Sprint
Actions #4

Updated by Peter Amstutz over 3 years ago

Actions #5

Updated by Peter Amstutz over 3 years ago

  • Story points set to 1.0
Actions #6

Updated by Peter Amstutz over 3 years ago

  • Assigned To set to Tom Clegg
Actions #8

Updated by Tom Clegg over 3 years ago

  • Status changed from New to In Progress
Actions #9

Updated by Tom Clegg over 3 years ago

Worth adding a note to that keep-web install page along these lines? "The *.collections.ClusterID.example.com option is preferred if you plan to access Keep using third-party S3 client software."

(Some clients can be configured to use a different pattern like {bucket}--collections.example.com but even for them it's probably less effort overall to use the default pattern.)

Actions #10

Updated by Peter Amstutz over 3 years ago

17009-s3-bucket-vhost @ baeef76a2b3b60fb3613d01b1df2916397e8c589

Well, that was easy.

We'll want to do some manual testing when the wildcard certificates get set up on one of the dev clusters.

Otherwise, this LGTM.

Tom Clegg wrote:

Worth adding a note to that keep-web install page along these lines? "The *.collections.ClusterID.example.com option is preferred if you plan to access Keep using third-party S3 client software."

(Some clients can be configured to use a different pattern like {bucket}--collections.example.com but even for them it's probably less effort overall to use the default pattern.)

Yes, it should be recommended. Also the introduction on that page should mention support for S3 API.

Actions #11

Updated by Tom Clegg over 3 years ago

Install doc updates:

17009-s3-bucket-vhost @ 2c3df643bc9effb76a26d56c6b4881856003c053

Actions #12

Updated by Anonymous over 3 years ago

  • Status changed from In Progress to Resolved
Actions #13

Updated by Peter Amstutz over 3 years ago

  • Status changed from Resolved to Feedback
Actions #14

Updated by Peter Amstutz over 3 years ago

Cyberduck still doesn't quite work. The server is supposed to return a list of the bucket contents, but instead it returns an empty application/x-directory object.

$ duck -v -l  s3://collections.ce8i5.arvadosapi.com/ce8i5-4zz18-ohp73xy8om7aipj
Listing directory ce8i5-4zz18-ohp73xy8om7aipj…
Login collections.ce8i5.arvadosapi.com. Login collections.ce8i5.arvadosapi.com – S3 with username and password. No login credentials could be found in the Keychain.
Access Key ID (peter): ce8i5-gj3su-02f1ov5mgblpf5b
Login as ce8i5-gj3su-02f1ov5mgblpf5b
Secret Access Key: 
WARNING! Passwords are stored in plain text in ~/.duck/credentials.
Save password (y/n): y
Authenticating as ce8i5-gj3su-02f1ov5mgblpf5b…
> GET / HTTP/1.1
> Date: Wed, 25 Nov 2020 16:14:18 GMT
> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> Host: collections.ce8i5.arvadosapi.com
> x-amz-date: 20201125T161418Z
> Authorization: ********
> Connection: Keep-Alive
> User-Agent: Cyberduck/7.7.0.33744 (Linux/4.19.0-10-amd64) (amd64)

< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 25 Nov 2020 16:14:18 GMT
< Content-Type: application/xml
< Content-Length: 271
< Connection: keep-alive
< Strict-Transport-Security: max-age=63072000

> GET /?encoding-type=url&max-keys=1000&prefix&delimiter=%2F HTTP/1.1
> Date: Wed, 25 Nov 2020 16:14:18 GMT
> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> Host: collections.ce8i5.arvadosapi.com
> x-amz-date: 20201125T161418Z
> Authorization: ********
> Connection: Keep-Alive
> User-Agent: Cyberduck/7.7.0.33744 (Linux/4.19.0-10-amd64) (amd64)

< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 25 Nov 2020 16:14:18 GMT
< Content-Type: application/xml
< Content-Length: 272
< Connection: keep-alive

Login successful…

> GET /?versioning HTTP/1.1
> Date: Wed, 25 Nov 2020 16:14:18 GMT
> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> Host: ce8i5-4zz18-ohp73xy8om7aipj.collections.ce8i5.arvadosapi.com
> x-amz-date: 20201125T161418Z
> Authorization: ********
> Connection: Keep-Alive
> User-Agent: Cyberduck/7.7.0.33744 (Linux/4.19.0-10-amd64) (amd64)

< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 25 Nov 2020 16:14:19 GMT
< Content-Type: application/x-directory
< Content-Length: 0
< Connection: keep-alive
< Strict-Transport-Security: max-age=63072000

> GET /?encoding-type=url&max-keys=1000&prefix&delimiter=%2F HTTP/1.1
> Date: Wed, 25 Nov 2020 16:14:19 GMT
> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> Host: ce8i5-4zz18-ohp73xy8om7aipj.collections.ce8i5.arvadosapi.com
> x-amz-date: 20201125T161419Z
> Authorization: ********
> Connection: Keep-Alive
> User-Agent: Cyberduck/7.7.0.33744 (Linux/4.19.0-10-amd64) (amd64)

< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 25 Nov 2020 16:14:19 GMT
< Content-Type: application/x-directory
< Content-Length: 0
< Connection: keep-alive
< Strict-Transport-Security: max-age=63072000
Listing directory ce8i5-4zz18-ohp73xy8om7aipj failed. Failed to parse XML document with handler class org.jets3t.service.impl.rest.XmlResponsesSaxParser$ListBucketHandler. Please contact your web hosting service provider for assistance.
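
The failure above is consistent with the client expecting an XML ListBucketResult document and receiving an empty application/x-directory body instead. A rough Python sketch of that client-side expectation follows. parse_list_response is a hypothetical helper, and the sample XML is made up for illustration rather than captured from keep-web; only the element names and namespace come from the S3 list-objects response format.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix such as '{http://...}' from a tag."""
    return tag.rsplit("}", 1)[-1]


def parse_list_response(content_type, body):
    """Return the object keys from an S3 list response, or raise ValueError.

    Hypothetical helper for illustration only.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/xml":
        raise ValueError("expected application/xml, got %r" % content_type)
    root = ET.fromstring(body)
    if local(root.tag) != "ListBucketResult":
        raise ValueError("unexpected root element %r" % root.tag)
    keys = []
    for child in root:
        if local(child.tag) == "Contents":
            for field in child:
                if local(field.tag) == "Key":
                    keys.append(field.text)
    return keys


# Made-up sample body; not a captured keep-web response.
SAMPLE = (
    b'<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
    b"<Name>example</Name>"
    b"<Contents><Key>cwl.output.json</Key></Contents>"
    b"<Contents><Key>output.txt</Key></Contents>"
    b"</ListBucketResult>"
)
keys = parse_list_response("application/xml", SAMPLE)
```

Calling parse_list_response("application/x-directory", b"") raises ValueError, which is roughly the situation the jets3t parser reported.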

Actions #15

Updated by Tom Clegg over 3 years ago

  • Target version changed from 2020-12-02 Sprint to 2020-12-16 Sprint
Actions #16

Updated by Tom Clegg over 3 years ago

  • Status changed from Feedback to In Progress
Actions #17

Updated by Tom Clegg over 3 years ago

The XML parsing failure in #17009#note-14 was caused by incorrect routing, fixed in master at 0c5e55d63. But the path handling was still broken for list operations, which is fixed in this branch. The branch also adds a test for list/get/put using vhost-style requests.

17009-s3-vhost-list @ f46eee810702b655737007bdfecf91201cdb27ca -- developer-run-tests: #2205
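
For readers unfamiliar with vhost-style requests, the following Python sketch shows roughly what list, get, and put look like on the wire when the bucket is carried in the hostname. It is not the test added in this branch; vhost_requests, the base domain, and the bucket and key names are hypothetical, and request signing is omitted.

```python
from urllib.parse import quote, urlencode


def vhost_requests(bucket, base_domain, key, prefix=""):
    """Return (method, host, request-target) tuples for vhost-style requests.

    Hypothetical helper for illustration; authentication is left out.
    """
    host = "%s.%s" % (bucket, base_domain)
    # A list request targets the bucket root, so the path is just "/".
    list_query = urlencode({"prefix": prefix, "delimiter": "/"})
    # Object requests carry only the key in the path, never the bucket.
    object_path = "/" + quote(key)
    return {
        "list": ("GET", host, "/?" + list_query),
        "get": ("GET", host, object_path),
        "put": ("PUT", host, object_path),
    }


reqs = vhost_requests(
    "zzzzz-4zz18-0123456789abcde", "collections.example.com", "dir/output.txt"
)
```

The point of the sketch is that the server has to treat "/" as the bucket root for list operations when the bucket comes from the Host header, instead of looking for a bucket name in the path.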

Actions #18

Updated by Lucas Di Pentima over 3 years ago

This LGTM. Was trying to manually test it on arvbox, but I think it would be quicker to merge and test against our dev clusters. Thanks!

Actions #19

Updated by Lucas Di Pentima over 3 years ago

Tried the duck command as described in #note-14, and it listed the collection correctly:

$ duck -v -l  s3://collections.ce8i5.arvadosapi.com/ce8i5-4zz18-ohp73xy8om7aipj
Listing directory ce8i5-4zz18-ohp73xy8om7aipj…
[...]
Authenticating as ce8i5-gj3su-ggs1g0lp3coa7bc…
> GET / HTTP/1.1
> Date: Wed, 09 Dec 2020 19:46:43 GMT
> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> Host: collections.ce8i5.arvadosapi.com
> x-amz-date: 20201209T194643Z
> Authorization: ********
> Connection: Keep-Alive
> User-Agent: Cyberduck/7.7.2.33862 (Mac OS X/10.15.7) (x86_64)
< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 09 Dec 2020 19:46:44 GMT
< Content-Type: application/xml
< Content-Length: 271
< Connection: keep-alive
< Strict-Transport-Security: max-age=63072000
> GET /?encoding-type=url&max-keys=1000&prefix&delimiter=%2F HTTP/1.1
> Date: Wed, 09 Dec 2020 19:46:44 GMT
> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> Host: collections.ce8i5.arvadosapi.com
> x-amz-date: 20201209T194644Z
> Authorization: ********
> Connection: Keep-Alive
> User-Agent: Cyberduck/7.7.2.33862 (Mac OS X/10.15.7) (x86_64)
< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 09 Dec 2020 19:46:44 GMT
< Content-Type: application/xml
< Content-Length: 272
< Connection: keep-alive
Login successful…
> GET /?versioning HTTP/1.1
> Date: Wed, 09 Dec 2020 19:46:45 GMT
> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> Host: ce8i5-4zz18-ohp73xy8om7aipj.collections.ce8i5.arvadosapi.com
> x-amz-date: 20201209T194645Z
> Authorization: ********
> Connection: Keep-Alive
> User-Agent: Cyberduck/7.7.2.33862 (Mac OS X/10.15.7) (x86_64)
< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 09 Dec 2020 19:46:45 GMT
< Content-Type: application/xml
< Content-Length: 114
< Connection: keep-alive
< Strict-Transport-Security: max-age=63072000
> GET /?encoding-type=url&max-keys=1000&prefix&delimiter=%2F HTTP/1.1
> Date: Wed, 09 Dec 2020 19:46:45 GMT
> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> Host: ce8i5-4zz18-ohp73xy8om7aipj.collections.ce8i5.arvadosapi.com
> x-amz-date: 20201209T194645Z
> Authorization: ********
> Connection: Keep-Alive
> User-Agent: Cyberduck/7.7.2.33862 (Mac OS X/10.15.7) (x86_64)
< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 09 Dec 2020 19:46:46 GMT
< Content-Type: application/xml
< Content-Length: 710
< Connection: keep-alive
< Strict-Transport-Security: max-age=63072000
cwl.output.json
output.txt
> GET /?prefix&delimiter=%2F&uploads HTTP/1.1
> Date: Wed, 09 Dec 2020 19:46:46 GMT
> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> Host: ce8i5-4zz18-ohp73xy8om7aipj.collections.ce8i5.arvadosapi.com
> x-amz-date: 20201209T194646Z
> Authorization: ********
> Connection: Keep-Alive
> User-Agent: Cyberduck/7.7.2.33862 (Mac OS X/10.15.7) (x86_64)
< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 09 Dec 2020 19:46:46 GMT
< Content-Type: application/xml
< Content-Length: 710
< Connection: keep-alive
< Strict-Transport-Security: max-age=63072000

I think this is ready to be marked as resolved.

Actions #20

Updated by Tom Clegg over 3 years ago

  • Status changed from In Progress to Resolved
Actions #21

Updated by Peter Amstutz about 3 years ago

  • Release set to 37