Bug #6918

[Documentation] keepproxy CORS headers get doubled, don't work

Added by Brett Smith about 4 years ago. Updated almost 4 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Documentation
Target version:
Start date:
08/06/2015
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
0.5

Description

Our current install guide for keepproxy suggests that you use a few add_header lines to add CORS headers to error responses. However, with the current keepproxy code, this causes the headers to get doubled. If you deploy according to our instructions, Firefox misinterprets the instructions and refuses to make requests that are supposed to be allowed. I'm guessing this is because '*, *' is not a meaningful Origin.

I note we're not setting these headers in our own production clusters.

Make this part of the documentation functional. If there's a way to make sure nginx only adds headers when they don't already exist, or only adds them to internal error responses, documenting that is good. Otherwise, removing the suggestion from the documentation seems like a reasonable solution to me too, but engineering can overrule me on that.


Subtasks

Task #7040: Review 6918-remove-nginx-cors-headersResolvedPeter Amstutz

Task #7039: Fix docsResolvedPeter Amstutz

Associated revisions

Revision b5a7183d
Added by Peter Amstutz about 4 years ago

Merge branch '6918-remove-nginx-cors-headers' closes #6918

Revision 2c9b40bd (diff)
Added by Peter Amstutz about 4 years ago

Fix up additional places in keepproxy install docs to use example
"keep$uuid_prefix.your.domain" consistently. refs #6918

History

#1 Updated by Brett Smith about 4 years ago

Discussion:

  • It's probably better to just remove the lines from the configuration.
  • The docs can also tell you to check the nginx logs if your Web uploader is having trouble, because that could mean the client isn't getting the necessary CORS headers.
  • If we want to make the Web uploader more robust against these problems, that should be done in the Web uploader itself.

#2 Updated by Peter Amstutz about 4 years ago

  • Assigned To set to Peter Amstutz

#3 Updated by Peter Amstutz about 4 years ago

Another option is to teach keepproxy to look for a proxy header and suppress the CORS headers when a request comes through with that header present.

#4 Updated by Peter Amstutz about 4 years ago

Also for completeness http://wiki.nginx.org/HttpHeadersMoreModule provides "more_set_header" which actually replaces the outgoing header like we want it to. But will just fix the documentation for now.

#5 Updated by Tom Clegg about 4 years ago

AFAIK all of these are useless for keepproxy. "proxy_redirect off" is especially worth removing since the only thing it could do is to prevent a redirect-to-self response from working.

    proxy_redirect        off;
    proxy_set_header      X-Forwarded-Proto https;
    proxy_set_header      Host $http_host;
    proxy_set_header      X-External-Client $external_client;

keep.example.com should be changed to keep.uuid_prefix.your.domain in the server_name and ssl_* lines. (I'd rather use *.example, see #6933, but in the meantime the nginx vhost name should match the name we use elsewhere on the page, especially arv keep_service create.)

With those, LGTM.

Aside: this 64M body buffer seems to mean every 64M PUT requires: 64M on the client, 64M in nginx, 64M in keepproxy, and 64M (sometimes 128M, see #6997) in each of two keepstores. I suppose we could improve this situation by adding TLS support to keepproxy or (less so) by giving nginx a tmpfs for body buffers instead of telling it to hold everything in RAM...? Until then, this is what we've been using so I'm on board with adding it to the docs.

Another aside: the way keepproxy's GetRemoteAddress func uses the "real IP" headers seems slightly sketchy: it should either know a list of trusted networks from which proxy headers are to be believed, or just always log the real RemoteAddr (in addition to the "real" IP given in headers). With the current setup, if the upstream proxy doesn't replace these headers with its own, arbitrary clients can prevent keepproxy from logging their real addresses. Reading the nginx docs it seems like X-Forwarded-For always includes the X-Real-IP information, so we could just log like this and get all of the same information in a way that's safe for any upstream proxy setup:

if xff := req.Header.Get("X-Forwarded-For"); xff != "" {
  return xff + "," + req.RemoteAddr
} else {
  return req.RemoteAddr
}

(Separate bug? Or am I missing something and this is already notabug?)

#6 Updated by Peter Amstutz about 4 years ago

Tom Clegg wrote:

AFAIK all of these are useless for keepproxy. "proxy_redirect off" is especially worth removing since the only thing it could do is to prevent a redirect-to-self response from working.

I'm not sure if proxy_redirect does what you think it does? It rewrites the Location header of the response: http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_redirect but I don't think keepproxy returns Location (or it returns a relative Location not a fully qualified URL.)

On the other hand, proxy_set_header Host $http_host; is needed to adjust the Host header to be the correct external hostname keep.uuid_prefix.your.domain

X-Forwarded-Proto and X-External-Client are both not strictly necessary, but the 2nd one is how we distinguish between clients that are internal or external to the cluster (this is not used by keepproxy but presumably there is some benefit to configuring the proxy servers consistently.)

I copied these from the settings we use in production, so if you think they are wrong or useless then we should ask Ward.

[...]

keep.example.com should be changed to keep.uuid_prefix.your.domain in the server_name and ssl_* lines. (I'd rather use *.example, see #6933, but in the meantime the nginx vhost name should match the name we use elsewhere on the page, especially arv keep_service create.)

Fixed.

With those, LGTM.

Aside: this 64M body buffer seems to mean every 64M PUT requires: 64M on the client, 64M in nginx, 64M in keepproxy, and 64M (sometimes 128M, see #6997) in each of two keepstores. I suppose we could improve this situation by adding TLS support to keepproxy or (less so) by giving nginx a tmpfs for body buffers instead of telling it to hold everything in RAM...? Until then, this is what we've been using so I'm on board with adding it to the docs.

I would very much like to see end-to-end-streaming in Keep, but this is not the ticket to take that up that epic.

That said, setting client_body_buffer_size seems redundant because it should be streaming directly to keepproxy's buffers. I took it out.

Another aside: the way keepproxy's GetRemoteAddress func uses the "real IP" headers seems slightly sketchy: it should either know a list of trusted networks from which proxy headers are to be believed, or just always log the real RemoteAddr (in addition to the "real" IP given in headers). With the current setup, if the upstream proxy doesn't replace these headers with its own, arbitrary clients can prevent keepproxy from logging their real addresses. Reading the nginx docs it seems like X-Forwarded-For always includes the X-Real-IP information, so we could just log like this and get all of the same information in a way that's safe for any upstream proxy setup:

[...]

(Separate bug? Or am I missing something and this is already notabug?)

Well, it's written assuming that keepproxy is deployed behind a SSL proxy that will feed it the correct information in X-Real-IP. We don't support any other configuration, if we did it then it would probably have to be tweaked.

#7 Updated by Tom Clegg about 4 years ago

Peter Amstutz wrote:

Tom Clegg wrote:

AFAIK all of these are useless for keepproxy. "proxy_redirect off" is especially worth removing since the only thing it could do is to prevent a redirect-to-self response from working.

I'm not sure if proxy_redirect does what you think it does? It rewrites the Location header of the response: http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_redirect but I don't think keepproxy returns Location (or it returns a relative Location not a fully qualified URL.)
On the other hand, proxy_set_header Host $http_host; is needed to adjust the Host header to be the correct external hostname keep.uuid_prefix.your.domain

I read the docs again, but I still don't see why not use nginx's defaults for both, which just automatically make these things work. (Even/especially if we don't currently need them to work.)

X-Forwarded-Proto and X-External-Client are both not strictly necessary, but the 2nd one is how we distinguish between clients that are internal or external to the cluster (this is not used by keepproxy but presumably there is some benefit to configuring the proxy servers consistently.)

Seems to me we should configure each nginx stanza appropriately rather than add a bunch of extra "why is this here?" config just so they all look similar. We went through this recently and ended up with http://doc.arvados.org/install/install-arv-git-httpd.html

I copied these from the settings we use in production, so if you think they are wrong or useless then we should ask Ward.

OK

That said, setting client_body_buffer_size seems redundant because it should be streaming directly to keepproxy's buffers. I took it out.

According to the nginx docs this makes it write temp files to disk, not stream -- but OK.

Well, it's written assuming that keepproxy is deployed behind a SSL proxy that will feed it the correct information in X-Real-IP. We don't support any other configuration, if we did it then it would probably have to be tweaked.

OK, I'll note that in the bug.

#8 Updated by Peter Amstutz about 4 years ago

  • Status changed from New to Resolved
  • % Done changed from 50 to 100

Applied in changeset arvados|commit:b5a7183d43ca4607fdc259267087e7f795d14de2.

#9 Updated by Tom Clegg about 4 years ago

  • Status changed from Resolved to In Progress

Peter Amstutz wrote:

Tom Clegg wrote:

keep.example.com should be changed to keep.uuid_prefix.your.domain in the server_name and ssl_* lines. (I'd rather use *.example, see #6933, but in the meantime the nginx vhost name should match the name we use elsewhere on the page, especially arv keep_service create.)

Fixed.

Looks like you missed the ssl_* part?

https://arvados.org/projects/arvados/repository/revisions/b5a7183d/entry/doc/install/install-keepproxy.html.textile.liquid#L88

#10 Updated by Peter Amstutz about 4 years ago

Tom Clegg wrote:

Peter Amstutz wrote:

Tom Clegg wrote:

keep.example.com should be changed to keep.uuid_prefix.your.domain in the server_name and ssl_* lines. (I'd rather use *.example, see #6933, but in the meantime the nginx vhost name should match the name we use elsewhere on the page, especially arv keep_service create.)

Fixed.

Looks like you missed the ssl_* part?

https://arvados.org/projects/arvados/repository/revisions/b5a7183d/entry/doc/install/install-keepproxy.html.textile.liquid#L88

Yep, you're right. Fixed.

#11 Updated by Peter Amstutz almost 4 years ago

  • Status changed from In Progress to Resolved

Also available in: Atom PDF