Project

General

Profile

Actions

Bug #22003

closed

Client-side redirect in WB2 may target incorrect URL if path contains special characters

Added by Zoë Ma about 1 month ago. Updated 4 days ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Keep
Story points:
-
Release:
Release relationship:
Auto

Description

Symptom

The following URL
https://workbench.pirca.arvadosapi.com/?redirectToPreview=/c=pirca-4zz18-w0rawdttbs8qz38/test%20-%20%2527/
is generated by the "Copy link to clipboard" action in the file listing of a collection in Wb2 (pirca-4zz18-w0rawdttbs8qz38) which contains the path segment test - %27. This redirect is handled by navigating to [keep-web base url]/test - %27/, and the path element will be interpreted as test - ', mapped to a non-existent directory in the collection, which incurs a Not Found.

Cause

In Workbench2, when the URL has query parameters redirectToPreview=... or redirectToDownload=..., the client-side app will perform a "redirect" by setting the window.location.href to the path passed as value to one of those query parameters.

There are mainly two ways how wb2 may receive such an URL with redirect parameter.

  1. Generated by Wb2 internally: when copying the URL of a file/directory in a collection to the clipboard.
  2. Served by keep-web when the client tries to access a collection without right credentials: if keep-web thinks the UA is a browser, it serves a 303 See Other with Location: header set to a URL at the Wb2 origin with the redirect parameter.

In the 1st case, Wb2 generates the parameter value by taking the <D:href> element's value (a URL) from WebDAV responses and extracting the path element. The <D:href> value is already percent-encoded (RFC 4918), and this is indeed the case in the WebDAV response the server generates. It's this encoded path value that Wb2 takes and uses to construct the redirect target URL.

Suppose the underlying path is R that refers to a file in a collection. It undergoes the transformations

1a R                                       (raw, FS path from manifest)
1b <D:href>ENC(R)</D:href>                 (WebDAV response body, up to XML entity transformation which is transparent to this discussion)
   ----------------
1c [wb2_origin]/?redirectToDownload=ENC(R) (URL copied to clipboard by Wb2, other query params not shown)
1d GET decodeURIComponent(ENC(R))          (URL is pasted in browser; Wb2 redirect handler runs)

Here, the function ENC(R) is essentially Go's (*url.URL) EscapedPath() method and decodeURIComponent is the standard JS function. In general we have R = decodeURIComponent(ENC(R)) so Wb2 is seen redirecting to the raw path R as it is, which in general will not land us at the resource identified by R because the web server locates the resource (dereferencing) by decoding the argument of GET. This is why we get the error in with the constructed test case.

In the 2nd case, the server tries to second-guess (some older version of?) Wb2 (source:services/keep-web/handler.go@7f01a21e#L514), and intentionally sends the URL (in Location: header), without percent-encoding the query parameter.

2a R                                           (raw, FS path from the manifest)
----------------
2b Location: [wb2_origin]/?redirecToDownload=R (303 See Other response)
2c GET decodeURIComponent(R) (Wb2 redirect handler)

So again the server will not get the client to understand what path is to be requested (assuming that the client even gets the full raw path R and pass it to the decoding routine).

(For the behavior of Wb2 client-side redirect handler, see source:services/workbench2/src/common/redirect-to.ts@7f01a21e#L42)

This shows that the reason why it may work for many paths R (including test cases) is that none of the various rounds of encoding and decoding of R actually change it. But in general the web server always gets back some unexpected input.

Proposed solution

In the above boxes the steps below the horizontal line are within our control. In particular, in the first box Wb2 is in control of everything below the line. The objective is to make Wb2 arrive at "Wb2 send GET [wb2_origin]/ENC(R) at the end.

There are a few moving pieces and I suggest a solution like this.

  1. The keep-web server writes the redirect URL's query part with proper percent-encoding for the queries (i.e. the usual way), even if it is meant for just Wb2 (it's easier to trust that the client does the right thing than to rely on a private agreement).
    • Without encoding it, the response header Location: [wb2_origin]/?redirectToDownload=R may become malformed, because the 'raw' path R is basically arbitrary bytes (may contain delimiters & and #, unencoded space and non-ASCII characters, etc.)
    • Let's call this function that takes R to the correctly encoded query value as QENC() and Q = QENC(R). We are saying that the server should send Location: [wb2_origin]/?redirectToDownload=QENC(R)
  2. On the Wb2 side there are two components
    • The redirect handler should take the query parameter input ([...]?redirectToDownload=Q), decode to recover R = QDEC(Q), encode R using a suitable percent-encoding for path ENC(R), and send the client to the target at [base url]/ENC(R).
      • The re-encoding is necessary because in general ENC(R) != Q; Q (the path encoded for query part) may legally contain ? which would prematurely terminate a path.
    • The clipboard link generator starts with the input P = ENC(R) and constructs [...]?redirectToDownload=QENC(DEC(P))
      • Again the re-encoding is necessary; we can't use ENC(R) directly in the query part because a validly-encoded path may contain unencoded & that prematurely terminates a query parameter, among other failure modes

(A GitHub PR will be submitted soon)


Related issues

Related to Arvados - Feature #21941: Context menu option on collection files to get a link with the PDH and link directly to keep-webResolvedPeter AmstutzActions
Actions #2

Updated by Peter Amstutz about 1 month ago

  • Target version set to Development 2024-08-07 sprint
Actions #3

Updated by Peter Amstutz about 1 month ago

  • Assigned To set to Peter Amstutz
Actions #5

Updated by Peter Amstutz 17 days ago

  • Related to Feature #21941: Context menu option on collection files to get a link with the PDH and link directly to keep-web added
Actions #6

Updated by Peter Amstutz 17 days ago

I used this filename as a stress test:

arv-mount ! - @ - % - $ - 25 - ^ - x%x - * - ( - ) - [ - ]=|~'`" ?.txt

Here's how workbench "copy to clipboard" encoded it:

https://172.17.0.2/?redirectToPreview=/c=x94bf-4zz18-aluuqxi4s8o993z/arv-mount%20!%20-%20%40%20-%20%23%20-%20%24%20-%20%25%20-%20%5E%20-%20%26%20-%20*%20-%20(%20-%20)%20-%20%5B%20-%20%5D=%7C~'%60%22%20%3F.txt

Here's the keep-web URL I landed at:

https://172.17.0.2:9002/c=x94bf-4zz18-aluuqxi4s8o993z/arv-mount%20%21%20-%20@%20-%20%23%20-%20$%20-%20%25%20-%20%5E%20-%20&%20-%20%2A%20-%20%28%20-%20%29%20-%20%5B%20-%20%5D=%7C~%27%60%22%20%3F.txt

I confirmed that, in a private browsing window, I can start from either of the above URLs and arrive at a login screen and then be redirected to the file correctly.

So I think this is good to go.

Actions #7

Updated by Peter Amstutz 15 days ago

  • Status changed from New to Resolved
Actions #8

Updated by Peter Amstutz 4 days ago

  • Release set to 70
Actions

Also available in: Atom PDF