Bug #22003
closedClient-side redirect in WB2 may target incorrect URL if path contains special characters
Description
Symptom¶
The following URL https://workbench.pirca.arvadosapi.com/?redirectToPreview=/c=pirca-4zz18-w0rawdttbs8qz38/test%20-%20%2527/
is generated by the "Copy link to clipboard" action in the file listing of a collection in Wb2 (pirca-4zz18-w0rawdttbs8qz38
) which contains the path segment test - %27
. This redirect is handled by navigating to [keep-web base url]/test - %27/
, and the path element will be interpreted as test - '
, mapped to a non-existent directory in the collection, which incurs a Not Found.
Cause¶
In Workbench2, when the URL has query parameters redirectToPreview=...
or redirectToDownload=...
, the client-side app will perform a "redirect" by setting the window.location.href
to the path passed as value to one of those query parameters.
There are mainly two ways how wb2 may receive such an URL with redirect parameter.
- Generated by Wb2 internally: when copying the URL of a file/directory in a collection to the clipboard.
- Served by keep-web when the client tries to access a collection without right credentials: if keep-web thinks the UA is a browser, it serves a
303 See Other
withLocation:
header set to a URL at the Wb2 origin with the redirect parameter.
In the 1st case, Wb2 generates the parameter value by taking the <D:href>
element's value (a URL) from WebDAV responses and extracting the path element. The <D:href>
value is already percent-encoded (RFC 4918), and this is indeed the case in the WebDAV response the server generates. It's this encoded path value that Wb2 takes and uses to construct the redirect target URL.
Suppose the underlying path is R
that refers to a file in a collection. It undergoes the transformations
1a R (raw, FS path from manifest)
1b <D:href>ENC(R)</D:href> (WebDAV response body, up to XML entity transformation which is transparent to this discussion)
----------------
1c [wb2_origin]/?redirectToDownload=ENC(R) (URL copied to clipboard by Wb2, other query params not shown)
1d GET decodeURIComponent(ENC(R)) (URL is pasted in browser; Wb2 redirect handler runs)
Here, the function
ENC(R)
is essentially Go's (*url.URL) EscapedPath()
method and decodeURIComponent
is the standard JS function. In general we have R = decodeURIComponent(ENC(R))
so Wb2 is seen redirecting to the raw path R
as it is, which in general will not land us at the resource identified by R
because the web server locates the resource (dereferencing) by decoding the argument of GET. This is why we get the error in with the constructed test case.
In the 2nd case, the server tries to second-guess (some older version of?) Wb2 (source:services/keep-web/handler.go@7f01a21e#L514), and intentionally sends the URL (in Location:
header), without percent-encoding the query parameter.
2a R (raw, FS path from the manifest)
----------------
2b Location: [wb2_origin]/?redirecToDownload=R (303 See Other response)
2c GET decodeURIComponent(R) (Wb2 redirect handler)
So again the server will not get the client to understand what path is to be requested (assuming that the client even gets the full raw path
R
and pass it to the decoding routine).
(For the behavior of Wb2 client-side redirect handler, see source:services/workbench2/src/common/redirect-to.ts@7f01a21e#L42)
This shows that the reason why it may work for many paths R
(including test cases) is that none of the various rounds of encoding and decoding of R
actually change it. But in general the web server always gets back some unexpected input.
Proposed solution¶
In the above boxes the steps below the horizontal line are within our control. In particular, in the first box Wb2 is in control of everything below the line. The objective is to make Wb2 arrive at "Wb2 send GET [wb2_origin]/ENC(R)
at the end.
There are a few moving pieces and I suggest a solution like this.
- The keep-web server writes the redirect URL's query part with proper percent-encoding for the queries (i.e. the usual way), even if it is meant for just Wb2 (it's easier to trust that the client does the right thing than to rely on a private agreement).
- Without encoding it, the response header
Location: [wb2_origin]/?redirectToDownload=R
may become malformed, because the 'raw' pathR
is basically arbitrary bytes (may contain delimiters&
and#
, unencoded space and non-ASCII characters, etc.) - Let's call this function that takes
R
to the correctly encoded query value asQENC()
andQ = QENC(R)
. We are saying that the server should sendLocation: [wb2_origin]/?redirectToDownload=QENC(R)
- Without encoding it, the response header
- On the Wb2 side there are two components
- The redirect handler should take the query parameter input (
[...]?redirectToDownload=Q
), decode to recoverR = QDEC(Q)
, encodeR
using a suitable percent-encoding for pathENC(R)
, and send the client to the target at[base url]/ENC(R)
.- The re-encoding is necessary because in general
ENC(R) != Q
;Q
(the path encoded for query part) may legally contain?
which would prematurely terminate a path.
- The re-encoding is necessary because in general
- The clipboard link generator starts with the input
P = ENC(R)
and constructs[...]?redirectToDownload=QENC(DEC(P))
- Again the re-encoding is necessary; we can't use
ENC(R)
directly in the query part because a validly-encoded path may contain unencoded&
that prematurely terminates a query parameter, among other failure modes
- Again the re-encoding is necessary; we can't use
- The redirect handler should take the query parameter input (
(A GitHub PR will be submitted soon)
Related issues
Updated by Zoë Ma 4 months ago
The GitHub PR is at https://github.com/arvados/arvados/pull/253
Updated by Peter Amstutz 4 months ago
- Target version set to Development 2024-08-07 sprint
Updated by Peter Amstutz 4 months ago
- Related to Feature #21941: Context menu option on collection files to get a link with the PDH and link directly to keep-web added
Updated by Peter Amstutz 4 months ago
I used this filename as a stress test:
arv-mount ! - @ - % - $ - 25 - ^ - x%x - * - ( - ) - [ - ]=|~'`" ?.txt
Here's how workbench "copy to clipboard" encoded it:
Here's the keep-web URL I landed at:
I confirmed that, in a private browsing window, I can start from either of the above URLs and arrive at a login screen and then be redirected to the file correctly.
So I think this is good to go.