New Schema Specs for New Replication


#1

##TL;DR

If your PR fails the spec here you have added a migration with a timestamp that would cause inconsistent migration ordering on release upgrades, regenerate the migration timestamp and the spec should pass.

If your PR fails the spec added here and you’ve changed the schema by adding a migration you can probably just run bundle exec rake evm:db:write_schema and it should be resolved.
If you didn’t create a migration and your PR failed the schema spec, something else has changed the database which will need more investigation.

##Some Background:

Recently we have been working on adding a new solution for database replication called pglogical.

The old solution, rubyrep, used database triggers to record changes which were then replicated. This allowed for some flexibility in the schema that will no longer work with pglogical.

##The problem:

pglogical requires that the schemas of the source and destination databases be exactly the same. During testing we discovered that this includes column ordering as both the initial sync (which uses the PostgreSQL COPY command) and the replication apply process do not ship column metadata with the row data; this means that the data will be inserted by order only. If the columns in a table are in different orders on the two databases we could get data in the incorrect columns if the types are compatible or pglogical will log an “unrecoverable error” during the initial sync if they are not.

We have seen this happen in two different ways

  1. Migrations which add columns to tables running in the incorrect order
  • This can happen when a migration is created in a feature branch which adds a column, but that branch stays unmerged long after the date on the migration timestamp
  • In the mean time someone else merges a migration with a later timestamp which also adds a column to the same table.
  • If the second migration gets into a release before the first, existing databases will run the two in reverse timestamp order, creating an inconsistent schema between the migrated database and one on a newly built appliance
  1. Some external change alters the order a migration adds columns
  • This commit on Rails changed the ordering of the _id and _type columns that make up a polymorphic relation

##Our Solution:

We have created some additional spec tests which will try to detect when these types of situations could have occurred.

Issue number 1 above will be addressed by PR #8337 which adds a spec that will fail if a migration is added that is in the range of previously released migrations.
If your PR fails this spec, the timestamp on your migration must be regenerated to a time later than the most recently released migration (listed in spec/replication/util/data/darga_migrations).

Issue number 2 above will be detected by the changes in PR #8380 which creates db/schema.yml which stores a YAML representation of our database schema as we expect it to look when properly migrated. The PR also adds a spec which will compare the migrated test database with the expected database in db/schema.yml. This would have detected the change made to Rails which changed the column ordering in the PR which moved us from Rails 4 to Rails 5.

Two rake tasks have also been added to conveniently interact with this file (as it is quite large).

  • evm:db:check_schema - tells you if the current schema matches the file
  • evm:db:write_schema - writes the current schema to the db/schema.yml file

Both of these tasks can be used with different values of RAILS_ENV to check or write different databases.

I am currently working on a tool to fix databases that have this issue as a follow up to #8380, but that has proven to be a bit more difficult because PostgreSQL doesn’t have support for changing column ordering https://wiki.postgresql.org/wiki/Alter_column_position


Replication test setup causing shallow git repos
#2

Some of the links in this post seem to be broken.

The specs are now located in the manageiq-schema repo, specifically the migration order spec and the schema structure spec changed locations.

The links here are to specific commits so hopefully they will be more useful.


#3

The new incantation to regenerate db/schema.yml in manageiq-schema repo:

env RAILS_ENV=test bundle exec rake db:migrate db:write_schema

#4

These specs have been removed as of https://github.com/ManageIQ/manageiq-schema/pull/129 and https://github.com/ManageIQ/manageiq/pull/16488

pglogical version 2.0+ include a commit which accounts for differences in column ordering between source and target databases.