Bug #6918
closed[Documentation] keepproxy CORS headers get doubled, don't work
Added by Brett Smith over 9 years ago. Updated over 9 years ago.
Description
Our current install guide for keepproxy suggests that you use a few add_header
lines to add CORS headers to error responses. However, with the current keepproxy code, this causes the headers to get doubled. If you deploy according to our instructions, Firefox misinterprets the instructions and refuses to make requests that are supposed to be allowed. I'm guessing this is because '*, *' is not a meaningful Origin.
I note we're not setting these headers in our own production clusters.
Make this part of the documentation functional. If there's a way to make sure nginx only adds headers when they don't already exist, or only adds them to internal error responses, documenting that is good. Otherwise, removing the suggestion from the documentation seems like a reasonable solution to me too, but engineering can overrule me on that.
Updated by Brett Smith over 9 years ago
Discussion:
- It's probably better to just remove the lines from the configuration.
- The docs can also tell you to check the nginx logs if your Web uploader is having trouble, because that could mean the client isn't getting the necessary CORS headers.
- If we want to make the Web uploader more robust against these problems, that should be done in the Web uploader itself.
Updated by Peter Amstutz over 9 years ago
Another option is to teach keepproxy to look for a proxy header and suppress the CORS headers when a request comes through with that header present.
Updated by Peter Amstutz over 9 years ago
Also for completeness http://wiki.nginx.org/HttpHeadersMoreModule provides "more_set_header" which actually replaces the outgoing header like we want it to. But will just fix the documentation for now.
Updated by Tom Clegg over 9 years ago
AFAIK all of these are useless for keepproxy. "proxy_redirect off" is especially worth removing since the only thing it could do is to prevent a redirect-to-self response from working.
proxy_redirect off; proxy_set_header X-Forwarded-Proto https; proxy_set_header Host $http_host; proxy_set_header X-External-Client $external_client;
keep.example.com
should be changed to keep.uuid_prefix.your.domain
in the server_name and ssl_* lines. (I'd rather use *.example, see #6933, but in the meantime the nginx vhost name should match the name we use elsewhere on the page, especially arv keep_service create
.)
With those, LGTM.
Aside: this 64M body buffer seems to mean every 64M PUT requires: 64M on the client, 64M in nginx, 64M in keepproxy, and 64M (sometimes 128M, see #6997) in each of two keepstores. I suppose we could improve this situation by adding TLS support to keepproxy or (less so) by giving nginx a tmpfs for body buffers instead of telling it to hold everything in RAM...? Until then, this is what we've been using so I'm on board with adding it to the docs.
Another aside: the way keepproxy's GetRemoteAddress
func uses the "real IP" headers seems slightly sketchy: it should either know a list of trusted networks from which proxy headers are to be believed, or just always log the real RemoteAddr (in addition to the "real" IP given in headers). With the current setup, if the upstream proxy doesn't replace these headers with its own, arbitrary clients can prevent keepproxy from logging their real addresses. Reading the nginx docs it seems like X-Forwarded-For always includes the X-Real-IP information, so we could just log like this and get all of the same information in a way that's safe for any upstream proxy setup:
if xff := req.Header.Get("X-Forwarded-For"); xff != "" { return xff + "," + req.RemoteAddr } else { return req.RemoteAddr }
(Separate bug? Or am I missing something and this is already notabug?)
Updated by Peter Amstutz over 9 years ago
Tom Clegg wrote:
AFAIK all of these are useless for keepproxy. "proxy_redirect off" is especially worth removing since the only thing it could do is to prevent a redirect-to-self response from working.
I'm not sure if proxy_redirect
does what you think it does? It rewrites the Location
header of the response: http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_redirect but I don't think keepproxy returns Location
(or it returns a relative Location
not a fully qualified URL.)
On the other hand, proxy_set_header Host $http_host;
is needed to adjust the Host
header to be the correct external hostname keep.uuid_prefix.your.domain
X-Forwarded-Proto
and X-External-Client
are both not strictly necessary, but the 2nd one is how we distinguish between clients that are internal or external to the cluster (this is not used by keepproxy but presumably there is some benefit to configuring the proxy servers consistently.)
I copied these from the settings we use in production, so if you think they are wrong or useless then we should ask Ward.
[...]
keep.example.com
should be changed tokeep.uuid_prefix.your.domain
in the server_name and ssl_* lines. (I'd rather use *.example, see #6933, but in the meantime the nginx vhost name should match the name we use elsewhere on the page, especiallyarv keep_service create
.)
Fixed.
With those, LGTM.
Aside: this 64M body buffer seems to mean every 64M PUT requires: 64M on the client, 64M in nginx, 64M in keepproxy, and 64M (sometimes 128M, see #6997) in each of two keepstores. I suppose we could improve this situation by adding TLS support to keepproxy or (less so) by giving nginx a tmpfs for body buffers instead of telling it to hold everything in RAM...? Until then, this is what we've been using so I'm on board with adding it to the docs.
I would very much like to see end-to-end-streaming in Keep, but this is not the ticket to take that up that epic.
That said, setting client_body_buffer_size
seems redundant because it should be streaming directly to keepproxy's buffers. I took it out.
Another aside: the way keepproxy's
GetRemoteAddress
func uses the "real IP" headers seems slightly sketchy: it should either know a list of trusted networks from which proxy headers are to be believed, or just always log the real RemoteAddr (in addition to the "real" IP given in headers). With the current setup, if the upstream proxy doesn't replace these headers with its own, arbitrary clients can prevent keepproxy from logging their real addresses. Reading the nginx docs it seems like X-Forwarded-For always includes the X-Real-IP information, so we could just log like this and get all of the same information in a way that's safe for any upstream proxy setup:[...]
(Separate bug? Or am I missing something and this is already notabug?)
Well, it's written assuming that keepproxy is deployed behind a SSL proxy that will feed it the correct information in X-Real-IP. We don't support any other configuration, if we did it then it would probably have to be tweaked.
Updated by Tom Clegg over 9 years ago
Peter Amstutz wrote:
Tom Clegg wrote:
AFAIK all of these are useless for keepproxy. "proxy_redirect off" is especially worth removing since the only thing it could do is to prevent a redirect-to-self response from working.
I'm not sure if
proxy_redirect
does what you think it does? It rewrites theLocation
header of the response: http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_redirect but I don't think keepproxy returnsLocation
(or it returns a relativeLocation
not a fully qualified URL.)
On the other hand,proxy_set_header Host $http_host;
is needed to adjust theHost
header to be the correct external hostnamekeep.uuid_prefix.your.domain
I read the docs again, but I still don't see why not use nginx's defaults for both, which just automatically make these things work. (Even/especially if we don't currently need them to work.)
X-Forwarded-Proto
andX-External-Client
are both not strictly necessary, but the 2nd one is how we distinguish between clients that are internal or external to the cluster (this is not used by keepproxy but presumably there is some benefit to configuring the proxy servers consistently.)
Seems to me we should configure each nginx stanza appropriately rather than add a bunch of extra "why is this here?" config just so they all look similar. We went through this recently and ended up with http://doc.arvados.org/install/install-arv-git-httpd.html
I copied these from the settings we use in production, so if you think they are wrong or useless then we should ask Ward.
OK
That said, setting
client_body_buffer_size
seems redundant because it should be streaming directly to keepproxy's buffers. I took it out.
According to the nginx docs this makes it write temp files to disk, not stream -- but OK.
Well, it's written assuming that keepproxy is deployed behind a SSL proxy that will feed it the correct information in X-Real-IP. We don't support any other configuration, if we did it then it would probably have to be tweaked.
OK, I'll note that in the bug.
Updated by Peter Amstutz over 9 years ago
- Status changed from New to Resolved
- % Done changed from 50 to 100
Applied in changeset arvados|commit:b5a7183d43ca4607fdc259267087e7f795d14de2.
Updated by Tom Clegg over 9 years ago
- Status changed from Resolved to In Progress
Peter Amstutz wrote:
Tom Clegg wrote:
keep.example.com
should be changed tokeep.uuid_prefix.your.domain
in the server_name and ssl_* lines. (I'd rather use *.example, see #6933, but in the meantime the nginx vhost name should match the name we use elsewhere on the page, especiallyarv keep_service create
.)Fixed.
Looks like you missed the ssl_* part?
Updated by Peter Amstutz over 9 years ago
Tom Clegg wrote:
Peter Amstutz wrote:
Tom Clegg wrote:
keep.example.com
should be changed tokeep.uuid_prefix.your.domain
in the server_name and ssl_* lines. (I'd rather use *.example, see #6933, but in the meantime the nginx vhost name should match the name we use elsewhere on the page, especiallyarv keep_service create
.)Fixed.
Looks like you missed the ssl_* part?
Yep, you're right. Fixed.
Updated by Peter Amstutz over 9 years ago
- Status changed from In Progress to Resolved