Project

General

Profile

Actions

Idea #12960

closed

[API] Migrate remaining "properties" fields to jsonb

Added by Peter Amstutz almost 7 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
06/07/2018
Due date:
Story points:
1.0
Release:
Release relationship:
Auto

Description

Migrate remaining "properties" fields to jsonb in order to support subproperty filters.

Tables:

  • container_requests
  • links
  • nodes

We should also consider adding properties to these records types that don't currently have it:

  • groups
  • users

Don't migrate legacy tables:

  • logs
  • pipeline_instances
  • humans
  • specimens
  • traits

Subtasks 1 (0 open1 closed)

Task #13582: Review 12960-migrate-propertiesResolvedPeter Amstutz06/07/2018Actions

Related issues 1 (0 open1 closed)

Related to Arvados - Feature #13562: [API] Add JSONB properties field on groupsResolvedPeter Amstutz06/08/2018Actions
Actions #1

Updated by Peter Amstutz almost 7 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Peter Amstutz almost 7 years ago

  • Status changed from In Progress to New
Actions #3

Updated by Peter Amstutz almost 7 years ago

  • Subject changed from Migrate remaining "properties" fields to jsonb to [API] Migrate remaining "properties" fields to jsonb
Actions #4

Updated by Peter Amstutz almost 7 years ago

  • Description updated (diff)
Actions #5

Updated by Tom Morris over 6 years ago

  • Tracker changed from Bug to Idea
Actions #6

Updated by Peter Amstutz over 6 years ago

  • Description updated (diff)
Actions #8

Updated by Peter Amstutz over 6 years ago

  • Description updated (diff)
Actions #9

Updated by Peter Amstutz over 6 years ago

Question about migrating other serialized fields such as mounts, etc → we don't plan on querying on it, the subproperty query API exposed by Arvados doesn't support deep queries. Are there other benefits to migrating serialized columns to jsonb, for example, is jsonb more compact or faster to retrieve in the database?

Actions #10

Updated by Lucas Di Pentima over 6 years ago

JSON pros over JSONB
  • Faster writes (no conversion process involved)
  • Input text is saved as is, so things like key ordering and indentation are preserved
JSONB pros over JSON
  • Duplicated attributes get deduplicated & whitespaces removed (potential storage space savings)
  • Significantly faster processing
  • Indexable / searchable

From what I could read on docs and forum comments, the general rule for deciding whether to use JSON over JSONB is that if data will be processed on the application itself, JSON would be fine, otherwise it is probably more convenient to use JSONB.

I’ve found mixed references about which type is more compact/faster to retrieve, no clear answer to that.

Actions #11

Updated by Lucas Di Pentima over 6 years ago

  • Related to Feature #13562: [API] Add JSONB properties field on groups added
Actions #12

Updated by Tom Morris over 6 years ago

  • Target version changed from To Be Groomed to Arvados Future Sprints
  • Story points set to 1.0
Actions #13

Updated by Peter Amstutz over 6 years ago

I should mention that indexing properties on container_requests is necessary to fully implement the Workflow Execution Service (WES) spec, which includes filtering on key-value tags.

Actions #14

Updated by Tom Morris over 6 years ago

  • Target version changed from Arvados Future Sprints to 2018-06-20 Sprint
Actions #15

Updated by Peter Amstutz over 6 years ago

  • Assigned To set to Peter Amstutz
Actions #16

Updated by Peter Amstutz over 6 years ago

12960-migrate-properties @ a518eb2b3b743259fe51b1e61d2648677e583cac

Convert links.properties and container_requests.properties, nodes.properties, and nodes.info to jsonb and add gin index.

Actions #17

Updated by Peter Amstutz over 6 years ago

  • Status changed from New to In Progress
Actions #18

Updated by Tom Clegg over 6 years ago

LGTM.

Strangely, the db:migrate output is verbose about the container_requests.properties iteration but silent about the others.

== 20180607175050 PropertiesToJsonb: migrating ================================
-- execute("DROP INDEX IF EXISTS container_requests_full_text_search_idx")
   -> 0.0004s
-- execute("CREATE INDEX container_requests_full_text_search_idx ON container_requests USING gin(to_tsvector('english', coalesce(uuid,'') || ' ' || coalesce(owner_uuid,'') || ' ' || coalesce(modified_by_client_uuid,'') || ' ' || coalesce(modified_by_user_uuid,'') || ' ' || coalesce(name,'') || ' ' || coalesce(description,'') || ' ' || coalesce(properties::text,'') || ' ' || coalesce(state,'') || ' ' || coalesce(requesting_container_uuid,'') || ' ' || coalesce(container_uuid,'') || ' ' || coalesce(runtime_constraints::text,'') || ' ' || coalesce(container_image,'') || ' ' || coalesce(environment::text,'') || ' ' || coalesce(cwd,'') || ' ' || coalesce(command::text,'') || ' ' || coalesce(output_path,'') || ' ' || coalesce(filters,'') || ' ' || coalesce(scheduling_parameters::text,'') || ' ' || coalesce(output_uuid,'') || ' ' || coalesce(log_uuid,'') || ' ' || coalesce(output_name,'')))")
   -> 0.0083s
== 20180607175050 PropertiesToJsonb: migrated (0.3391s) =======================

...but test.log shows all of the changes, so I guess it's ok.

Migrating to PropertiesToJsonb (20180607175050)
   (0.2ms)  BEGIN
   (0.3ms)  DROP INDEX IF EXISTS nodes_full_text_search_idx
   (55.3ms)  ALTER TABLE nodes ALTER COLUMN properties TYPE jsonb USING properties::jsonb
   (0.5ms)  CREATE INDEX nodes_index_on_properties ON nodes USING gin (properties)
   (0.1ms)  DROP INDEX IF EXISTS nodes_full_text_search_idx
   (390.1ms)  ALTER TABLE nodes ALTER COLUMN info TYPE jsonb USING info::jsonb
   (0.4ms)  CREATE INDEX nodes_index_on_info ON nodes USING gin (info)
   (0.3ms)  DROP INDEX IF EXISTS container_requests_full_text_search_idx
   (61.6ms)  ALTER TABLE container_requests ALTER COLUMN properties TYPE jsonb USING properties::jsonb
   (0.4ms)  CREATE INDEX container_requests_index_on_properties ON container_requests USING gin (properties)
   (0.1ms)  DROP INDEX IF EXISTS links_full_text_search_idx
   (91.3ms)  ALTER TABLE links ALTER COLUMN properties TYPE jsonb USING properties::jsonb
   (0.4ms)  CREATE INDEX links_index_on_properties ON links USING gin (properties)
   (0.2ms)  DROP INDEX IF EXISTS container_requests_full_text_search_idx
   (1.5ms)  CREATE INDEX container_requests_full_text_search_idx ON container_requests USING gin(to_tsvector('english', coalesce(uuid,'') || ' ' || coalesce(owner_uuid,'') || ' ' || coalesce(modified_by_client_uuid,'') || ' ' || coalesce(modified_by_user_uuid,'') || ' ' || coalesce(name,'') || ' ' || coalesce(description,'') || ' ' || coalesce(properties::text,'') || ' ' || coalesce(state,'') || ' ' || coalesce(requesting_container_uuid,'') || ' ' || coalesce(container_uuid,'') || ' ' || coalesce(runtime_constraints::text,'') || ' ' || coalesce(container_image,'') || ' ' || coalesce(environment::text,'') || ' ' || coalesce(cwd,'') || ' ' || coalesce(command::text,'') || ' ' || coalesce(output_path,'') || ' ' || coalesce(filters,'') || ' ' || coalesce(scheduling_parameters::text,'') || ' ' || coalesce(output_uuid,'') || ' ' || coalesce(log_uuid,'') || ' ' || coalesce(output_name,'')))
  SQL (0.2ms)  INSERT INTO "schema_migrations" ("version") VALUES ($1)  [["version", "20180607175050"]]
   (3.4ms)  COMMIT

Also noticed it's impossible to db:migrate a test db, because test fixtures are still being inserted as YAML. But db:drop + db:setup + db:fixtures:load works ok.

Actions #20

Updated by Peter Amstutz over 6 years ago

It is also not logging ALTER TABLE. But I agree it seems to be an ActiveRecord quirk rather than an actual problem. I will merge if Jenkins is happy.

Actions #21

Updated by Peter Amstutz over 6 years ago

Oops, jenkins is not happy. Trivial different in serialization from jsonb failed a test. Running again @ cefddd8efdc88c3d33e5aad4c02e96c64cbe8242

https://ci.curoverse.com/job/developer-run-tests/746/

Actions #22

Updated by Peter Amstutz over 6 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100
Actions #24

Updated by Tom Morris over 6 years ago

  • Release set to 13
Actions

Also available in: Atom PDF