Story #12960

[API] Migrate remaining "properties" fields to jsonb

Added by Peter Amstutz about 1 year ago. Updated 7 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
06/07/2018
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
1.0
Release:
Release relationship:
Auto

Description

Migrate remaining "properties" fields to jsonb in order to support subproperty filters.

Tables:

  • container_requests
  • links
  • nodes

We should also consider adding properties to these records types that don't currently have it:

  • groups
  • users

Don't migrate legacy tables:

  • logs
  • pipeline_instances
  • humans
  • specimens
  • traits

Subtasks

Task #13582: Review 12960-migrate-propertiesResolvedPeter Amstutz


Related issues

Related to Arvados - Feature #13562: [API] Add JSONB properties field on groupsResolved2018-06-08

Associated revisions

Revision 1e03ea47
Added by Peter Amstutz 8 months ago

Merge branch '12960-migrate-properties' closes #12960

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <>

History

#1 Updated by Peter Amstutz about 1 year ago

  • Status changed from New to In Progress

#2 Updated by Peter Amstutz about 1 year ago

  • Status changed from In Progress to New

#3 Updated by Peter Amstutz about 1 year ago

  • Subject changed from Migrate remaining "properties" fields to jsonb to [API] Migrate remaining "properties" fields to jsonb

#4 Updated by Peter Amstutz about 1 year ago

  • Description updated (diff)

#5 Updated by Tom Morris 10 months ago

  • Tracker changed from Bug to Story

#6 Updated by Peter Amstutz 10 months ago

  • Description updated (diff)

#8 Updated by Peter Amstutz 10 months ago

  • Description updated (diff)

#9 Updated by Peter Amstutz 9 months ago

Question about migrating other serialized fields such as mounts, etc → we don't plan on querying on it, the subproperty query API exposed by Arvados doesn't support deep queries. Are there other benefits to migrating serialized columns to jsonb, for example, is jsonb more compact or faster to retrieve in the database?

#10 Updated by Lucas Di Pentima 9 months ago

JSON pros over JSONB
  • Faster writes (no conversion process involved)
  • Input text is saved as is, so things like key ordering and indentation are preserved
JSONB pros over JSON
  • Duplicated attributes get deduplicated & whitespaces removed (potential storage space savings)
  • Significantly faster processing
  • Indexable / searchable

From what I could read on docs and forum comments, the general rule for deciding whether to use JSON over JSONB is that if data will be processed on the application itself, JSON would be fine, otherwise it is probably more convenient to use JSONB.

I’ve found mixed references about which type is more compact/faster to retrieve, no clear answer to that.

#11 Updated by Lucas Di Pentima 9 months ago

  • Related to Feature #13562: [API] Add JSONB properties field on groups added

#12 Updated by Tom Morris 9 months ago

  • Target version changed from To Be Groomed to Arvados Future Sprints
  • Story points set to 1.0

#13 Updated by Peter Amstutz 9 months ago

I should mention that indexing properties on container_requests is necessary to fully implement the Workflow Execution Service (WES) spec, which includes filtering on key-value tags.

#14 Updated by Tom Morris 9 months ago

  • Target version changed from Arvados Future Sprints to 2018-06-20 Sprint

#15 Updated by Peter Amstutz 9 months ago

  • Assigned To set to Peter Amstutz

#16 Updated by Peter Amstutz 9 months ago

12960-migrate-properties @ a518eb2b3b743259fe51b1e61d2648677e583cac

Convert links.properties and container_requests.properties, nodes.properties, and nodes.info to jsonb and add gin index.

#17 Updated by Peter Amstutz 9 months ago

  • Status changed from New to In Progress

#18 Updated by Tom Clegg 9 months ago

LGTM.

Strangely, the db:migrate output is verbose about the container_requests.properties iteration but silent about the others.

== 20180607175050 PropertiesToJsonb: migrating ================================
-- execute("DROP INDEX IF EXISTS container_requests_full_text_search_idx")
   -> 0.0004s
-- execute("CREATE INDEX container_requests_full_text_search_idx ON container_requests USING gin(to_tsvector('english', coalesce(uuid,'') || ' ' || coalesce(owner_uuid,'') || ' ' || coalesce(modified_by_client_uuid,'') || ' ' || coalesce(modified_by_user_uuid,'') || ' ' || coalesce(name,'') || ' ' || coalesce(description,'') || ' ' || coalesce(properties::text,'') || ' ' || coalesce(state,'') || ' ' || coalesce(requesting_container_uuid,'') || ' ' || coalesce(container_uuid,'') || ' ' || coalesce(runtime_constraints::text,'') || ' ' || coalesce(container_image,'') || ' ' || coalesce(environment::text,'') || ' ' || coalesce(cwd,'') || ' ' || coalesce(command::text,'') || ' ' || coalesce(output_path,'') || ' ' || coalesce(filters,'') || ' ' || coalesce(scheduling_parameters::text,'') || ' ' || coalesce(output_uuid,'') || ' ' || coalesce(log_uuid,'') || ' ' || coalesce(output_name,'')))")
   -> 0.0083s
== 20180607175050 PropertiesToJsonb: migrated (0.3391s) =======================

...but test.log shows all of the changes, so I guess it's ok.

Migrating to PropertiesToJsonb (20180607175050)
   (0.2ms)  BEGIN
   (0.3ms)  DROP INDEX IF EXISTS nodes_full_text_search_idx
   (55.3ms)  ALTER TABLE nodes ALTER COLUMN properties TYPE jsonb USING properties::jsonb
   (0.5ms)  CREATE INDEX nodes_index_on_properties ON nodes USING gin (properties)
   (0.1ms)  DROP INDEX IF EXISTS nodes_full_text_search_idx
   (390.1ms)  ALTER TABLE nodes ALTER COLUMN info TYPE jsonb USING info::jsonb
   (0.4ms)  CREATE INDEX nodes_index_on_info ON nodes USING gin (info)
   (0.3ms)  DROP INDEX IF EXISTS container_requests_full_text_search_idx
   (61.6ms)  ALTER TABLE container_requests ALTER COLUMN properties TYPE jsonb USING properties::jsonb
   (0.4ms)  CREATE INDEX container_requests_index_on_properties ON container_requests USING gin (properties)
   (0.1ms)  DROP INDEX IF EXISTS links_full_text_search_idx
   (91.3ms)  ALTER TABLE links ALTER COLUMN properties TYPE jsonb USING properties::jsonb
   (0.4ms)  CREATE INDEX links_index_on_properties ON links USING gin (properties)
   (0.2ms)  DROP INDEX IF EXISTS container_requests_full_text_search_idx
   (1.5ms)  CREATE INDEX container_requests_full_text_search_idx ON container_requests USING gin(to_tsvector('english', coalesce(uuid,'') || ' ' || coalesce(owner_uuid,'') || ' ' || coalesce(modified_by_client_uuid,'') || ' ' || coalesce(modified_by_user_uuid,'') || ' ' || coalesce(name,'') || ' ' || coalesce(description,'') || ' ' || coalesce(properties::text,'') || ' ' || coalesce(state,'') || ' ' || coalesce(requesting_container_uuid,'') || ' ' || coalesce(container_uuid,'') || ' ' || coalesce(runtime_constraints::text,'') || ' ' || coalesce(container_image,'') || ' ' || coalesce(environment::text,'') || ' ' || coalesce(cwd,'') || ' ' || coalesce(command::text,'') || ' ' || coalesce(output_path,'') || ' ' || coalesce(filters,'') || ' ' || coalesce(scheduling_parameters::text,'') || ' ' || coalesce(output_uuid,'') || ' ' || coalesce(log_uuid,'') || ' ' || coalesce(output_name,'')))
  SQL (0.2ms)  INSERT INTO "schema_migrations" ("version") VALUES ($1)  [["version", "20180607175050"]]
   (3.4ms)  COMMIT

Also noticed it's impossible to db:migrate a test db, because test fixtures are still being inserted as YAML. But db:drop + db:setup + db:fixtures:load works ok.

#20 Updated by Peter Amstutz 9 months ago

It is also not logging ALTER TABLE. But I agree it seems to be an ActiveRecord quirk rather than an actual problem. I will merge if Jenkins is happy.

#21 Updated by Peter Amstutz 9 months ago

Oops, jenkins is not happy. Trivial different in serialization from jsonb failed a test. Running again @ cefddd8efdc88c3d33e5aad4c02e96c64cbe8242

https://ci.curoverse.com/job/developer-run-tests/746/

#22 Updated by Peter Amstutz 8 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

#24 Updated by Tom Morris 7 months ago

  • Release set to 13

Also available in: Atom PDF