Story #11908

Migrate Collections.properties to JSONB

Added by Tom Morris 5 months ago. Updated 2 days ago.

Status:In ProgressStart date:06/27/2017
Priority:NormalDue date:
Assignee:Tom Clegg% Done:

100%

Category:API
Target version:2017-11-22 Sprint
Story points0.5Remaining (hours)0.00 hour
Velocity based estimate0 days

Description

Handle (deserialize) both YAML and JSON and convert column type to JSONB


Subtasks

Task #11933: Review 11908-properties-column-jsonResolved


Related issues

Related to Arvados - Story #4019: [API] Support query of "properties" field on objects New
Related to Arvados - Story #11884: Convert Collection properties column type to JSONB Duplicate
Blocked by Arvados - Story #11807: [API] Migrate old serialized database content from YAML t... Resolved 06/05/2017

History

#1 Updated by Tom Morris 5 months ago

  • Description updated (diff)
  • Target version set to Arvados Future Sprints
  • Story points set to 2.0

#2 Updated by Tom Morris 5 months ago

  • Target version changed from Arvados Future Sprints to 2017-07-19 sprint

#3 Updated by Tom Morris 5 months ago

  • Assignee set to Tom Clegg

#4 Updated by Tom Clegg 5 months ago

  • Status changed from New to In Progress

Changing the column type in-place is trivial ("alter table foo alter column bar type jsonb using bar::jsonb"). Unfortunately, the fulltext index includes the old column so it gets dropped in the process, and has to be rebuilt.

The good news is that I ran into the fulltext index migration bug (existing index not detected, migration fails) and fixed it.

ActiveRecord::StatementInvalid: PG::DuplicateTable: ERROR:  relation "collections_full_text_search_idx" already exists

11908-properties-column-json @ bb821d03eb10ddcc7822fac51a565d1a11082ebc

#5 Updated by Radhika Chippada 4 months ago

  • Do we not need to drop the full text index on collection before changing column type back in down migration?
  • Do we not want to update workflows -> definition column?

#6 Updated by Tom Clegg 4 months ago

Radhika Chippada wrote:

  • Do we not need to drop the full text index on collection before changing column type back in down migration?

On my system the down-migration worked without dropping/recreating the index. I added a comment to the up-migration with the postgresql error it avoids.

  • Do we not want to update workflows -> definition column?

Not here/now (this is just collections.properties, to support tags) and perhaps not ever (IIRC we decided to store literal YAML there, instead of using a serialized field, in order to preserve key order, formatting, comments).

11908-properties-column-json @ ebc65675cecdf25ca11a86f789bfb23b600875b8

#7 Updated by Radhika Chippada 4 months ago

On my system the down-migration worked without dropping/recreating the index. I added a comment to the up-migration with the postgresql error it avoids.

Down migration worked for me as well. I also switched to master branch after down migration and added tags to a collection and everything worked fine.

LGTM

#8 Updated by Tom Clegg 4 months ago

  • Category set to API
  • Target version changed from 2017-07-19 sprint to Arvados Future Sprints
  • Story points changed from 2.0 to 0.5

Merge is blocked on PostgreSQL 9.4 dependency.

#9 Updated by Tom Morris 19 days ago

  • Target version changed from Arvados Future Sprints to 2017-11-08 Sprint

#10 Updated by Tom Clegg 11 days ago

  • Target version changed from 2017-11-08 Sprint to 2017-11-22 Sprint

#11 Updated by Tom Clegg 2 days ago

As a reminder, in July #11807 did (what we expect to be) the slowest part of the yaml-to-json migration, i.e., the jobs table, and that took 3.5 minutes on qr1hi.

This branch does a smaller part of the yaml-to-json migration. It also changes the column type of a single column (collections.properties) that is typically null in most rows, and then it has to regenerate the fulltext index. We don't having timing estimates for that yet.

Also available in: Atom PDF