Bug #11168

closed

[API] Use JSON instead of YAML for serialized fields in database

Added by Tom Clegg about 7 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: API
Target version:
Story points: 2.0

Description

Currently hashes (like log properties) and arrays (like api_client_authorization scopes) are encoded in YAML, which is much slower to encode and decode than JSON with Oj.

YAML has some features that JSON is missing, but we don't want them; in fact, they get in our way (like in #6347).

If we store JSON, and tell PostgreSQL ≥9.3 that we are doing so, we can do queries on serialized fields. https://www.postgresql.org/docs/9.6/static/datatype-json.html

Implementation notes

Migration:
  • It's possible to do the up-migration in the background while the new server is running. We can detect the format when loading and deserialize accordingly: JSON starts with "{" or "[", YAML starts with "---".
  • However, for a downgrade, a full down-migration would need to finish before the old version could work.
  • In this version we won't bother migrating existing records -- we'll just use JSON in new/updated rows.
Changing column types:
  • PostgreSQL can help us more if we use a json or jsonb column type for serialized fields -- but this will be deferred to a separate story.
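
The format detection described in the migration notes could look roughly like the sketch below. It is only an illustration: the SerializedField module and its load/dump methods are hypothetical names, not taken from the Arvados code, and it uses the standard-library JSON module rather than Oj so that it runs without extra gems. It also assumes legacy YAML values contain only plain hashes, arrays, and scalars; YAML.safe_load will reject anything else (symbols or custom object tags, for example).

```ruby
require 'json'
require 'yaml'

# Hypothetical helper (not from the Arvados codebase): reads a serialized
# column that may hold legacy YAML or new-style JSON, and always writes JSON.
module SerializedField
  # Decode a raw column value, choosing the parser from its leading characters.
  def self.load(raw)
    return nil if raw.nil? || raw.empty?
    case raw
    when /\A\s*[\{\[]/
      # JSON documents for hashes and arrays start with "{" or "[".
      JSON.parse(raw)
    when /\A---/
      # Legacy rows written by the YAML serializer start with "---".
      YAML.safe_load(raw)
    else
      raise ArgumentError, "unrecognized serialization format"
    end
  end

  # New and updated rows are always stored as JSON.
  def self.dump(value)
    JSON.generate(value)
  end
end
```

With a loader like this, existing YAML rows stay readable and are converted to JSON the next time they are saved, which matches the plan of not migrating existing records in this version.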

Evaluation

After this change is deployed, we should collect some statistics/graphs about real-world performance impact. The biggest impact will probably be on API response times for "list" actions on container_requests, containers, jobs, pipeline instances, and logs.


Subtasks 1 (0 open, 1 closed)

Task #11195: Review 11168-serialize-json | Resolved | Peter Amstutz | 02/24/2017

Related issues

Related to Arvados - Idea #4019: [API] Support query of "properties" field on objects | Resolved | Peter Amstutz | 12/12/2017
Related to Arvados - Idea #11807: [API] Migrate old serialized database content from YAML to JSON | Resolved | Tom Clegg | 06/05/2017