Project

General

Profile

Actions

Bug #22660

closed

a-c-r can leak credentials if present in the git repo URL

Added by Peter Amstutz 3 days ago. Updated 1 day ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
CWL
Target version:
Story points:
-
Release relationship:
Auto

Description

Reported by user: some git services use HTTP basic auth with the username and password or API token embedded in the repo URL. The arvados-cwl-runner feature that records git metadata will leak this by accident. If the URL starts with http or https it should filter out the username/password portion of the URL.


Subtasks 1 (0 open1 closed)

Task #22664: Review 22660-acr-cred-leakResolvedPeter Amstutz03/13/2025Actions
Actions #1

Updated by Peter Amstutz 3 days ago

  • Position changed from -944463 to -944451
Actions #2

Updated by Peter Amstutz 3 days ago

  • Description updated (diff)
Actions #3

Updated by Peter Amstutz 3 days ago

  • Release set to 75
Actions #4

Updated by Peter Amstutz 3 days ago

22660-acr-cred-leak @ cd0639a7128e46709af34c7e56f24d082195dc25

developer-run-tests: #4694

  • All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
    • Sanitizes the git remote URL so it doesn't contain a username or password.
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • n/a
  • Code is tested and passing, both automated and manual, what manual testing was done is described.
    • Added a unit test for the sanitize_url function. Testing the get_git_info method is a bit more annoying since it run git in a subprocess and would require more mocking.
  • New or changed UX/UX and has gotten feedback from stakeholders.
    • n/a
  • Documentation has been updated.
    • n/a
  • Behaves appropriately at the intended scale (describe intended scale).
    • no change in scale
  • Considered backwards and forwards compatibility issues between client and server.
    • no compatibility issues, the arv:gitOrigin field is purely informational
  • Follows our coding standards and GUI style guidelines.
    • yes (I even went and cleaned up util.py a little bit to fit PEP 8 better)

It was privately reported by a user a-c-r was leaking github tokens by copying the remote-url directly into the workflow metadata, as one way of providing a github token is write it as:

https://x-access-token:blahblahblah@github.com/foo/bar.git

This branch adds a sanitize step to remove the username/password section of the netloc part of the URL, if present.

Actions #5

Updated by Peter Amstutz 3 days ago

  • Status changed from New to In Progress
Actions #6

Updated by Peter Amstutz 3 days ago

  • Subtask #22664 added
Actions #7

Updated by Brett Smith 2 days ago

Peter Amstutz wrote in #note-4:

22660-acr-cred-leak @ cd0639a7128e46709af34c7e56f24d082195dc25

I just don't understand why we care about the URL scheme. Can you posit a scenario where logging the credentials is ever the right thing to do? Is there any reason the function couldn't just be this?

def sanitize_url(url):
    parts = urllib.parse.urlparse(url)
    if parts.port is None:
        netloc = parts.hostname
    else:
        netloc = f'{parts.hostname}:{parts.port}'
    return urllib.parse.urlunparse(parts._replace(netloc=netloc))

Thanks.

Actions #8

Updated by Peter Amstutz 2 days ago

Brett Smith wrote in #note-7:

Peter Amstutz wrote in #note-4:

22660-acr-cred-leak @ cd0639a7128e46709af34c7e56f24d082195dc25

I just don't understand why we care about the URL scheme. Can you posit a scenario where logging the credentials is ever the right thing to do? Is there any reason the function couldn't just be this? [...]

Mainly I wasn't sure how urlparse would behave with SSH connection strings. The answer seems to be that they are not valid URLs but it doesn't throw an error either.

>>> urllib.parse.urlparse("git@github.com:arvados/arvados.git")
ParseResult(scheme='', netloc='', path='git@github.com:arvados/arvados.git', params='', query='', fragment='')

That's fine because I'm pretty sure you can't hide credentials in a SSH connection string. I also added a test case to make sure ssh connection strings don't get mangled.

22660-acr-cred-leak @ 0bf6de85d8751c5ccc0a26b855bc588f5d68e258

developer-run-tests: #4697

Actions #9

Updated by Brett Smith 2 days ago

Peter Amstutz wrote in #note-8:

22660-acr-cred-leak @ 0bf6de85d8751c5ccc0a26b855bc588f5d68e258

LGTM, thanks.

Actions #10

Updated by Peter Amstutz 1 day ago

  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF