Feature #17009

[keep-web] S3 API should accept bucket name as first component of domain name

Added by Tom Clegg about 1 year ago. Updated 8 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Keep
Target version:
Start date: 11/19/2020
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 1.0
Release relationship: Auto

Description

Currently the keep-web S3 API only accepts the bucket name in the request path. It should be easy enough to also accept the bucket name as the first component of the domain name, as keep-web already does for non-S3 requests.
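
As a rough illustration of the two addressing styles described above, here is a minimal Python sketch (not keep-web's actual code, which is not shown in this ticket). The function name `split_bucket` and the `base_host` parameter are hypothetical; the sketch assumes the service knows its own base hostname and treats any longer hostname ending in it as a virtual-host-style request.

```python
from typing import Optional, Tuple


def split_bucket(host: str, path: str, base_host: str) -> Tuple[Optional[str], str]:
    """Return (bucket, key) for a path-style or virtual-host-style request.

    base_host is a hypothetical setting: the hostname the service answers on
    when no bucket is present in the domain name.
    """
    host = host.split(":", 1)[0].lower()  # drop any port; hostnames are case-insensitive
    suffix = "." + base_host.lower()
    if host.endswith(suffix):
        # Virtual-host style: the bucket is the leading part of the domain
        # name, and the whole request path is the object key.
        return host[: -len(suffix)], path.lstrip("/")
    # Path style: the bucket is the first path component.
    parts = path.lstrip("/").split("/", 1)
    bucket = parts[0] or None
    key = parts[1] if len(parts) > 1 else ""
    return bucket, key
```

With this shape, a request for "/" on a bucket hostname yields the bucket and an empty key (a bucket-level operation), while the same path on the base hostname yields no bucket at all (a service-level operation).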


Subtasks

Task #17143: Review 17009-s3-bucket-vhost (Resolved, Peter Amstutz)

Task #17169: Get cyberduck to work (Resolved)

Task #17191: Review 17009-s3-vhost-list (Resolved, Tom Clegg)


Related issues

Related to Arvados Epics - Story #16360: Keep-web supports S3 compatible interface (Resolved, 07/01/2020, 04/30/2021)

Blocked by Arvados - Feature #17011: Add keep-web wildcard DNS to salt (Resolved, 11/25/2020)

Associated revisions

Revision 40a4776f
Added by Tom Clegg 11 months ago

Merge branch '17009-s3-bucket-vhost'

closes #17009

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

Revision 0c5e55d6 (diff)
Added by Tom Clegg 11 months ago

17009: Fix bucket-level ops using virtual host-style requests.

refs #17009

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

Revision c4f5d2f5 (diff)
Added by Tom Clegg 11 months ago

Update error regexp in test case.

refs #17009

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

Revision 3ecc1fc9
Added by Tom Clegg 10 months ago

Merge branch '17009-s3-vhost-list'

refs #17009

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

Revision e8ad88b1 (diff)
Added by Tom Clegg 10 months ago

17009: Fix bucket-level ops using virtual host-style requests.

refs #17009

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

Revision 33c97ad5 (diff)
Added by Tom Clegg 10 months ago

Update error regexp in test case.

refs #17009

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

Revision 2dab7769 (diff)
Added by Tom Clegg 8 months ago

17009: Fix bucket-level ops using virtual host-style requests.

refs #17009

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

Revision ecd2ef60 (diff)
Added by Tom Clegg 8 months ago

Update error regexp in test case.

refs #17009

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

History

#1 Updated by Tom Clegg about 1 year ago

  • Related to Story #16360: Keep-web supports S3 compatible interface added

#2 Updated by Peter Amstutz 11 months ago

I'm trying to use the command line version of cyberduck from https://duck.sh/

I'm trying to list the contents of a bucket:

duck -l s3://download.ce8i5.arvadosapi.com/ce8i5-j7d0g-g6r8w0853s32ged/

This doesn't work because it is connecting to

ce8i5-j7d0g-g6r8w0853s32ged.download.ce8i5.arvadosapi.com

From debugging, I see something about:

s3service.disable-dns-buckets=false

This seems to be a configuration option of the jets3t Java library used by Duck. I don't know how to set it, though. Creating ~/.duck/jets3t.properties didn't seem to work.
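
For reference, the hostname Duck connected to is what DNS-style (virtual-host) bucket addressing produces from the s3:// URL above. The following Python sketch shows the two request targets a client could derive from such a URL. The function name `request_targets` is hypothetical, and this is not how Duck or jets3t is implemented; it only illustrates the mapping.

```python
from urllib.parse import urlsplit


def request_targets(s3_url: str) -> dict:
    """Map an s3://endpoint/bucket/key URL to (Host, path) pairs for both styles."""
    parts = urlsplit(s3_url)
    endpoint = parts.hostname
    # The first path component is the bucket; the remainder is the object key.
    bucket, _, key = parts.path.lstrip("/").partition("/")
    return {
        # Path style: the endpoint stays the host, the bucket stays in the path.
        "path_style": (endpoint, "/" + bucket + "/" + key),
        # Virtual-host style: the bucket is prepended to the endpoint hostname.
        "vhost_style": (bucket + "." + endpoint, "/" + key),
    }
```

Applied to the URL in this note, the "vhost_style" entry has the host ce8i5-j7d0g-g6r8w0853s32ged.download.ce8i5.arvadosapi.com and the path "/".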

#3 Updated by Peter Amstutz 11 months ago

  • Target version set to 2020-12-02 Sprint

#4 Updated by Peter Amstutz 11 months ago

#5 Updated by Peter Amstutz 11 months ago

  • Story points set to 1.0

#6 Updated by Peter Amstutz 11 months ago

  • Assigned To set to Tom Clegg

#8 Updated by Tom Clegg 11 months ago

  • Status changed from New to In Progress

#9 Updated by Tom Clegg 11 months ago

Worth adding a note to that keep-web install page along these lines? "The *.collections.ClusterID.example.com option is preferred if you plan to access Keep using third-party S3 client software."

(Some clients can be configured to use a different pattern like {bucket}--collections.example.com but even for them it's probably less effort overall to use the default pattern.)

#10 Updated by Peter Amstutz 11 months ago

17009-s3-bucket-vhost @ baeef76a2b3b60fb3613d01b1df2916397e8c589

Well, that was easy.

We'll want to do some manual testing when the wildcard certificates get set up on one of the dev clusters.

Otherwise, this LGTM.

Tom Clegg wrote:

Worth adding a note to that keep-web install page along these lines? "The *.collections.ClusterID.example.com option is preferred if you plan to access Keep using third-party S3 client software."

(Some clients can be configured to use a different pattern like {bucket}--collections.example.com but even for them it's probably less effort overall to use the default pattern.)

Yes, it should be recommended. Also the introduction on that page should mention support for S3 API.

#11 Updated by Tom Clegg 11 months ago

Install doc updates:

17009-s3-bucket-vhost @ 2c3df643bc9effb76a26d56c6b4881856003c053

#12 Updated by Anonymous 11 months ago

  • Status changed from In Progress to Resolved

#13 Updated by Peter Amstutz 11 months ago

  • Status changed from Resolved to Feedback

#14 Updated by Peter Amstutz 11 months ago

Cyberduck still doesn't quite work. The bucket-level request is supposed to return a list of bucket contents, but instead it returns an empty application/x-directory response.

$ duck -v -l  s3://collections.ce8i5.arvadosapi.com/ce8i5-4zz18-ohp73xy8om7aipj
Listing directory ce8i5-4zz18-ohp73xy8om7aipj…
Login collections.ce8i5.arvadosapi.com. Login collections.ce8i5.arvadosapi.com – S3 with username and password. No login credentials could be found in the Keychain.
Access Key ID (peter): ce8i5-gj3su-02f1ov5mgblpf5b
Login as ce8i5-gj3su-02f1ov5mgblpf5b
Secret Access Key: 
WARNING! Passwords are stored in plain text in ~/.duck/credentials.
Save password (y/n): y
Authenticating as ce8i5-gj3su-02f1ov5mgblpf5b…
> GET / HTTP/1.1
> Date: Wed, 25 Nov 2020 16:14:18 GMT
> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> Host: collections.ce8i5.arvadosapi.com
> x-amz-date: 20201125T161418Z
> Authorization: ********
> Connection: Keep-Alive
> User-Agent: Cyberduck/7.7.0.33744 (Linux/4.19.0-10-amd64) (amd64)

< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 25 Nov 2020 16:14:18 GMT
< Content-Type: application/xml
< Content-Length: 271
< Connection: keep-alive
< Strict-Transport-Security: max-age=63072000

> GET /?encoding-type=url&max-keys=1000&prefix&delimiter=%2F HTTP/1.1
> Date: Wed, 25 Nov 2020 16:14:18 GMT
> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> Host: collections.ce8i5.arvadosapi.com
> x-amz-date: 20201125T161418Z
> Authorization: ********
> Connection: Keep-Alive
> User-Agent: Cyberduck/7.7.0.33744 (Linux/4.19.0-10-amd64) (amd64)

< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 25 Nov 2020 16:14:18 GMT
< Content-Type: application/xml
< Content-Length: 272
< Connection: keep-alive

Login successful…

> GET /?versioning HTTP/1.1
> Date: Wed, 25 Nov 2020 16:14:18 GMT
> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> Host: ce8i5-4zz18-ohp73xy8om7aipj.collections.ce8i5.arvadosapi.com
> x-amz-date: 20201125T161418Z
> Authorization: ********
> Connection: Keep-Alive
> User-Agent: Cyberduck/7.7.0.33744 (Linux/4.19.0-10-amd64) (amd64)

< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 25 Nov 2020 16:14:19 GMT
< Content-Type: application/x-directory
< Content-Length: 0
< Connection: keep-alive
< Strict-Transport-Security: max-age=63072000

> GET /?encoding-type=url&max-keys=1000&prefix&delimiter=%2F HTTP/1.1
> Date: Wed, 25 Nov 2020 16:14:19 GMT
> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> Host: ce8i5-4zz18-ohp73xy8om7aipj.collections.ce8i5.arvadosapi.com
> x-amz-date: 20201125T161419Z
> Authorization: ********
> Connection: Keep-Alive
> User-Agent: Cyberduck/7.7.0.33744 (Linux/4.19.0-10-amd64) (amd64)

< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 25 Nov 2020 16:14:19 GMT
< Content-Type: application/x-directory
< Content-Length: 0
< Connection: keep-alive
< Strict-Transport-Security: max-age=63072000
Listing directory ce8i5-4zz18-ohp73xy8om7aipj failed. Failed to parse XML document with handler class org.jets3t.service.impl.rest.XmlResponsesSaxParser$ListBucketHandler. Please contact your web hosting service provider for assistance.
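
The failure above can be reproduced without Cyberduck by checking a list response the way an S3 client would. This is a minimal Python sketch, assuming the response headers and body have already been fetched; `list_keys` is a hypothetical helper, not part of any S3 library. It rejects a non-XML response such as the application/x-directory one in the trace, and otherwise extracts the object keys from a ListBucketResult document.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace such as {http://s3.amazonaws.com/doc/2006-03-01/}."""
    return tag.rsplit("}", 1)[-1]


def list_keys(content_type: str, body: bytes) -> list:
    """Return the object keys in an S3 list response, or raise ValueError."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/xml":
        raise ValueError("expected an XML listing, got Content-Type %r" % content_type)
    if not body:
        raise ValueError("empty response body")
    root = ET.fromstring(body)
    if local_name(root.tag) != "ListBucketResult":
        raise ValueError("unexpected root element %r" % root.tag)
    # Each object appears as <Contents><Key>...</Key></Contents>.
    return [el.text or "" for el in root.iter() if local_name(el.tag) == "Key"]
```

Run against the trace above, the last two responses would raise ValueError because of their Content-Type, which matches the XML parsing failure Cyberduck reports.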

#15 Updated by Tom Clegg 11 months ago

  • Target version changed from 2020-12-02 Sprint to 2020-12-16 Sprint

#16 Updated by Tom Clegg 11 months ago

  • Status changed from Feedback to In Progress

#17 Updated by Tom Clegg 11 months ago

The XML parsing failure in #17009#note-14 was caused by incorrect routing, which is fixed in master at 0c5e55d63. Path handling was still broken for list operations; that is fixed in this branch, which also adds a test for list/get/put using vhost-style requests.
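
To illustrate the kind of routing decision involved, here is a hedged Python sketch. It is not the keep-web code and does not claim to show the actual bug. It assumes that S3 requests can be recognized by an AWS Signature Version 4 Authorization header (the trace in note-14 carries the x-amz-date and x-amz-content-sha256 headers used with that scheme) and that this check has to come before the generic per-collection hostname handling. The function name `route`, the `base_host` parameter, and the returned labels are all hypothetical.

```python
def route(headers: dict, host: str, base_host: str) -> str:
    """Pick a handler label for a request; the labels are illustrative only."""
    auth = headers.get("Authorization", "")
    is_s3 = auth.startswith("AWS4-HMAC-SHA256 ")  # assumption: SigV4 marks S3 requests
    hostname = host.split(":", 1)[0].lower()
    is_vhost = hostname.endswith("." + base_host.lower())
    if is_s3:
        # Decide on the S3 signature first, so that a signed GET / on a
        # bucket hostname is answered as a bucket listing (XML) rather than
        # as an ordinary web request for the collection's root directory.
        return "s3-bucket-vhost" if is_vhost else "s3-path-style"
    return "web-collection" if is_vhost else "web-other"
```

If the hostname check came first, a signed bucket-level request could fall through to the ordinary web handler, which is one way to end up with a directory response where the client expects XML.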

17009-s3-vhost-list @ f46eee810702b655737007bdfecf91201cdb27ca -- https://ci.arvados.org/view/Developer/job/developer-run-tests/2205/

#18 Updated by Lucas Di Pentima 10 months ago

This LGTM. I was trying to manually test it on arvbox, but I think it would be quicker to merge and test against our dev clusters. Thanks!

#19 Updated by Lucas Di Pentima 10 months ago

Tried with the duck command as described in #note-14 and it listed the collection correctly:

$ duck -v -l  s3://collections.ce8i5.arvadosapi.com/ce8i5-4zz18-ohp73xy8om7aipj
Listing directory ce8i5-4zz18-ohp73xy8om7aipj…
[...]
Authenticating as ce8i5-gj3su-ggs1g0lp3coa7bc…
> GET / HTTP/1.1
> Date: Wed, 09 Dec 2020 19:46:43 GMT
> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> Host: collections.ce8i5.arvadosapi.com
> x-amz-date: 20201209T194643Z
> Authorization: ********
> Connection: Keep-Alive
> User-Agent: Cyberduck/7.7.2.33862 (Mac OS X/10.15.7) (x86_64)
< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 09 Dec 2020 19:46:44 GMT
< Content-Type: application/xml
< Content-Length: 271
< Connection: keep-alive
< Strict-Transport-Security: max-age=63072000
> GET /?encoding-type=url&max-keys=1000&prefix&delimiter=%2F HTTP/1.1
> Date: Wed, 09 Dec 2020 19:46:44 GMT
> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> Host: collections.ce8i5.arvadosapi.com
> x-amz-date: 20201209T194644Z
> Authorization: ********
> Connection: Keep-Alive
> User-Agent: Cyberduck/7.7.2.33862 (Mac OS X/10.15.7) (x86_64)
< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 09 Dec 2020 19:46:44 GMT
< Content-Type: application/xml
< Content-Length: 272
< Connection: keep-alive
Login successful…
> GET /?versioning HTTP/1.1
> Date: Wed, 09 Dec 2020 19:46:45 GMT
> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> Host: ce8i5-4zz18-ohp73xy8om7aipj.collections.ce8i5.arvadosapi.com
> x-amz-date: 20201209T194645Z
> Authorization: ********
> Connection: Keep-Alive
> User-Agent: Cyberduck/7.7.2.33862 (Mac OS X/10.15.7) (x86_64)
< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 09 Dec 2020 19:46:45 GMT
< Content-Type: application/xml
< Content-Length: 114
< Connection: keep-alive
< Strict-Transport-Security: max-age=63072000
> GET /?encoding-type=url&max-keys=1000&prefix&delimiter=%2F HTTP/1.1
> Date: Wed, 09 Dec 2020 19:46:45 GMT
> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> Host: ce8i5-4zz18-ohp73xy8om7aipj.collections.ce8i5.arvadosapi.com
> x-amz-date: 20201209T194645Z
> Authorization: ********
> Connection: Keep-Alive
> User-Agent: Cyberduck/7.7.2.33862 (Mac OS X/10.15.7) (x86_64)
< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 09 Dec 2020 19:46:46 GMT
< Content-Type: application/xml
< Content-Length: 710
< Connection: keep-alive
< Strict-Transport-Security: max-age=63072000
cwl.output.json
output.txt
> GET /?prefix&delimiter=%2F&uploads HTTP/1.1
> Date: Wed, 09 Dec 2020 19:46:46 GMT
> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> Host: ce8i5-4zz18-ohp73xy8om7aipj.collections.ce8i5.arvadosapi.com
> x-amz-date: 20201209T194646Z
> Authorization: ********
> Connection: Keep-Alive
> User-Agent: Cyberduck/7.7.2.33862 (Mac OS X/10.15.7) (x86_64)
< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 09 Dec 2020 19:46:46 GMT
< Content-Type: application/xml
< Content-Length: 710
< Connection: keep-alive
< Strict-Transport-Security: max-age=63072000

I think this is ready to be marked as resolved.

#20 Updated by Tom Clegg 10 months ago

  • Status changed from In Progress to Resolved

#21 Updated by Peter Amstutz 8 months ago

  • Release set to 37
