
Peter Amstutz, 06/14/2024 08:09 PM

Credential storage


In order to implement Objects as pseudo-blocks in Keep, we need a way to store credentials so that Arvados can authenticate to other systems, e.g. AWS S3.

The current system for managing secrets is specific to workflows and deletes the secret as soon as the workflow is finished. We instead need a credential store whose entries persist independently of any workflow and can be accessed by keepstore.

User perspective:

Users want to be able to manage credentials in Workbench, so that Arvados services that need them can look them up. The motivating use case is AWS credentials, which consist of a key ID/secret key pair (much like an Arvados API token's UUID/secret), so that we can easily access objects in external S3 buckets.


  • Secrets should have an identifier for what type of thing they are, e.g. AWS credentials.
  • Secrets should have an optional scope, e.g. to provide different credentials for different resources, buckets, etc.
  • Open question: should the secret material itself be a simple text column or a JSON object? For example, an AWS credential is an ID/secret pair.
  • Different users should have different views of which secrets are available, based on Arvados permissions. Users should be able to share secrets at different levels of access, e.g.
    • can_read -- system services can fetch the credential on the user's behalf, but the user cannot fetch it directly through the API
    • can_write -- user can update the credential, but still not read it back
    • can_manage -- user can grant permissions to the credential, but still not read it back
  • Secrets should be write-only as much as possible: system services can retrieve secrets, but users cannot, except in special circumstances.
    • We want a way to use secrets in workflows, which means they can be exposed if workflow developers are careless. This is true of our current secrets support as well. It's inherently impossible to prevent a secret from being leaked by user-provided code if someone is really trying, but we'll at least be able to keep a record of which workflows accessed it.
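
The requirements above can be sketched as a small data model. This is a minimal illustration under stated assumptions, not a proposed schema: the names Credential, Perm, api_view, update_secret, grant and service_fetch are all hypothetical; the secret is stored as a JSON-style dict (one possible answer to the open question above); and scopes are assumed to be resource prefixes such as "s3://bucket-a/". The key property it shows is that no user-facing call ever returns the secret, while a trusted service can fetch it on behalf of a user holding can_read.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class Perm(IntEnum):
    # Permission levels from the list above; a higher level implies the lower ones.
    CAN_READ = 1    # services may use the secret on the user's behalf
    CAN_WRITE = 2   # user may also replace the secret (but not read it back)
    CAN_MANAGE = 3  # user may also grant permissions to others


@dataclass
class Credential:
    # Hypothetical record layout; field names are illustrative, not an Arvados schema.
    uuid: str
    credential_class: str  # what kind of secret this is, e.g. "aws_access_key"
    secret: dict           # JSON object, so a key ID/secret pair fits naturally
    scopes: list[str] = field(default_factory=list)        # empty = unrestricted
    grants: dict[str, Perm] = field(default_factory=dict)  # user -> level


def level(cred: Credential, user: str) -> int:
    # Users with no grant get level 0, below every real permission.
    return cred.grants.get(user, 0)


def api_view(cred: Credential, user: str) -> Optional[dict]:
    """What an ordinary API read returns: metadata only, never the secret."""
    if level(cred, user) < Perm.CAN_READ:
        return None
    return {"uuid": cred.uuid, "credential_class": cred.credential_class,
            "scopes": list(cred.scopes)}


def update_secret(cred: Credential, user: str, new_secret: dict) -> bool:
    """Write-only update: allowed with can_write, returns nothing secret."""
    if level(cred, user) < Perm.CAN_WRITE:
        return False
    cred.secret = dict(new_secret)
    return True


def grant(cred: Credential, user: str, other: str, perm: Perm) -> bool:
    """Sharing requires can_manage."""
    if level(cred, user) < Perm.CAN_MANAGE:
        return False
    cred.grants[other] = perm
    return True


def service_fetch(cred: Credential, on_behalf_of: str, resource: str) -> Optional[dict]:
    """Called only by trusted services (e.g. keepstore), never by end users.

    Scopes are treated as resource prefixes (an assumption); a trailing slash
    such as "s3://bucket-a/" keeps "bucket-a" from also matching "bucket-ab".
    """
    if level(cred, on_behalf_of) < Perm.CAN_READ:
        return None
    if cred.scopes and not any(resource.startswith(s) for s in cred.scopes):
        return None
    return dict(cred.secret)


# Example: alice owns an S3 credential scoped to one bucket and shares it with bob.
cred = Credential(
    uuid="example-credential-uuid",
    credential_class="aws_access_key",
    secret={"access_key_id": "example-id", "secret_access_key": "example-secret"},
    scopes=["s3://bucket-a/"],
    grants={"alice": Perm.CAN_MANAGE},
)
grant(cred, "alice", "bob", Perm.CAN_READ)
```

Here bob can see that the credential exists and a service can use it for objects under s3://bucket-a/, but neither bob nor alice can read the secret back through api_view.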


Start with our threat model.

These are not passwords. They are credentials that will be provided to other services on behalf of the user, which means we have to be able to retrieve them in the clear; we can't hash them. Unfortunately, a Google search for "how to store secrets in a database" turns up dozens of pages telling you not to store cleartext passwords and how to hash passwords, and not much advice on how to do what we need to do.
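
To make the password/credential distinction concrete, here is a minimal sketch using only the Python standard library. Hashing works for inbound secrets (passwords) because verification just recomputes the hash. An outbound credential, such as an S3 secret access key, is itself an input to request signing, so a stored hash is useless for that purpose. The sign function is a simplified stand-in, not real AWS Signature Version 4 (which chains several HMAC steps, but likewise starts from the raw secret), and the iteration count is illustrative.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # illustrative work factor for PBKDF2


def hash_password(password: str) -> tuple[bytes, bytes]:
    # Inbound secret (a password): keep only a random salt and a slow hash.
    salt = os.urandom(16)
    return salt, hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    # Verification recomputes the hash; the plaintext is never needed again.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)


def sign(secret: str, message: bytes) -> str:
    # Outbound secret: the raw value is the HMAC key used to sign a request.
    # Simplified stand-in for request signing, not actual AWS SigV4.
    return hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()


secret = "example-secret-access-key"
salt, stored = hash_password(secret)
request = b"GET /bucket-a/object"

# The hash can check the secret...
print(verify_password(secret, salt, stored))
# ...but signing with the hash gives a signature the remote service would reject.
print(sign(stored.hex(), request) == sign(secret, request))
```

This is why the stored form of these credentials must be reversible, and why the threat model below focuses on who can reach the plaintext.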

Ways credentials could leak

  • Attacker uses Arvados API as a normal user
    • Should be prevented from accessing credentials by normal access controls.
    • As previously noted, if we want to provide credentials to user-supplied workflows, misuse by those workflows is impossible to defend against. We therefore have to exclude from the threat model users who are authorized to use the credentials doing whatever they want with them.
  • Attacker uses Arvados API as a superuser
    • Admins can already mostly access anything
    • The existing secret_mounts mechanism only makes it inconvenient for admins: if they can access a container's runtime token, they can fetch its secret mounts.
    • Boxing out admins via the API is probably possible, but may require sealing additional holes (e.g. placing stricter limits on admins accessing other users' API tokens).
  • Attacker gains access to the database
    • Would be able to use SQL to read any column. E.g. currently secret_mounts is not encrypted, so it would be vulnerable.
    • To block this, columns need to be encrypted.
  • Attacker gains access to the node the database is running on
    • Same as remote database access, except the attacker additionally has access to /etc/arvados/config.yml and any credentials kept in it.
  • Attacker can intercept communications with the database and/or API server
    • This is probably game over for our entire security model, not just secrets handling. We rely on TLS to prevent this.
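
The two database-related cases above can be summarized as a small model: an attacker recovers plaintext only if they can read the stored value and, for encrypted columns, also reach the key. This is a sketch of the reasoning, not a design. Where the key lives is an assumption: "key in config.yml" stands for keeping it alongside other cluster credentials, and "key held elsewhere" stands for a hypothetical location out of the attacker's reach, such as an external secret store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attacker:
    name: str
    can_query_db: bool     # can run arbitrary SQL against the database
    can_read_config: bool  # can read /etc/arvados/config.yml


@dataclass(frozen=True)
class Storage:
    name: str
    encrypted: bool
    key_in_config: bool    # hypothetical: the column encryption key is in config.yml


def secret_exposed(attacker: Attacker, storage: Storage) -> bool:
    """True if this attacker can recover plaintext secrets from this storage."""
    if not attacker.can_query_db:
        return False
    if not storage.encrypted:
        return True  # e.g. secret_mounts today: SQL access alone is enough
    # Ciphertext is useless without the key, so it comes down to where the key lives.
    return storage.key_in_config and attacker.can_read_config


db_access = Attacker("database access", can_query_db=True, can_read_config=False)
node_access = Attacker("database node access", can_query_db=True, can_read_config=True)

plaintext = Storage("plaintext column", encrypted=False, key_in_config=False)
key_in_cfg = Storage("encrypted, key in config.yml", encrypted=True, key_in_config=True)
key_elsewhere = Storage("encrypted, key held elsewhere", encrypted=True, key_in_config=False)

for a in (db_access, node_access):
    for s in (plaintext, key_in_cfg, key_elsewhere):
        print(f"{a.name:22} | {s.name:30} | exposed={secret_exposed(a, s)}")
```

Under these assumptions, encrypting columns stops an attacker with database access only, but if the key sits in config.yml, an attacker on the database node can still decrypt everything.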


The first rule of security software is: don't build it yourself. We need to research whether there is an existing system we could plug into and make part of our stack.

HashiCorp Vault would have been worth considering, but its license has changed, which would require us to use an older version and/or a fork.

There are also cloud-specific secrets APIs. These are probably inappropriate unless it's feasible for Arvados to act as a frontend with pluggable backends.
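
A minimal sketch of the frontend/pluggable-backend split, assuming Arvados keeps credential metadata and access checks while the secret material lives in a backend addressed by reference. All class and method names here are hypothetical; the in-memory backend is a test stand-in, and a real backend might wrap Vault (or a fork) or a cloud provider's secrets API.

```python
from abc import ABC, abstractmethod
from typing import Optional


class SecretBackend(ABC):
    """Hypothetical plug-in interface: stores secret material by reference."""

    @abstractmethod
    def put(self, ref: str, secret: dict) -> None: ...

    @abstractmethod
    def get(self, ref: str) -> Optional[dict]: ...

    @abstractmethod
    def delete(self, ref: str) -> None: ...


class InMemoryBackend(SecretBackend):
    """Toy backend for testing only; keeps secrets in process memory."""

    def __init__(self) -> None:
        self._data: dict[str, dict] = {}

    def put(self, ref: str, secret: dict) -> None:
        self._data[ref] = dict(secret)

    def get(self, ref: str) -> Optional[dict]:
        value = self._data.get(ref)
        return None if value is None else dict(value)

    def delete(self, ref: str) -> None:
        self._data.pop(ref, None)


class CredentialFrontend:
    """Arvados-side frontend: owns metadata and access checks, delegates storage."""

    def __init__(self, backend: SecretBackend) -> None:
        self.backend = backend
        self.metadata: dict[str, str] = {}  # credential uuid -> credential class

    def create(self, uuid: str, credential_class: str, secret: dict) -> None:
        self.metadata[uuid] = credential_class
        self.backend.put(uuid, secret)

    def fetch_for_service(self, uuid: str) -> Optional[dict]:
        # Permission checks (omitted in this sketch) would run before the backend call.
        if uuid not in self.metadata:
            return None
        return self.backend.get(uuid)

    def remove(self, uuid: str) -> None:
        self.metadata.pop(uuid, None)
        self.backend.delete(uuid)
```

The appeal of this split is that the database never holds secret material at all, only references; the trade-off is an extra network dependency on every credential lookup.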
