The Migrationless Migration
I’ve been reading the excellent blog Revision-Zero lately, and just a whole bunch of other relational algebra stuff. If I may impose upon you for a minute, relational algebra is one of those things that, like linear algebra, is a shining intellectual gem. It is at once pretty in its own right but also tremendously practically useful. I strongly suggest that any working programmer should acquaint themselves with it. You should also learn about linear algebra, but that’s another article!
Modularity
In programming we spend a lot of time concerning ourselves with modularity. This quest for modularity is why we divide our code up into lines, blocks, modules, libraries of modules, and even whole programs. It is a solid engineering practice to encapsulate and then “black-boxify” that encapsulated module. Also, you should note that I’m talking aboutmodularity and module in the most general sense. I’m not referring to any specific language-supplied code organization features (e.g. Java packages).
The article above talks about something that I had never considered which is logical data independence. That is, in the same way that changing the internals of one class shouldn’t affect a different class, changing the internals of the schema shouldn’t affect the application. What this means in Rails terms is, could you migrate your Rails databasewithout having to change any application code?
An example
Let’s say that I have a rails schema something like this:
ActiveRecord::Schema.define(version: 20140115195300) do
create_table "suppliers", force: true do |t|
t.string "sname"
t.string "city"
t.integer "status"
t.datetime "created_at"
t.datetime "updated_at"
end
end
At this point let’s say that it is decided that “location” is a better term than “city.” And I add a migration like the following:
class RenameCityToLocation < ActiveRecord::Migration
def change
rename_column :suppliers, :city, :location
end
end
Change made! We’ll have to go and find any uses of city
and update them to use the new argot of the application,location
.
Modularity at the data layer
And that’s just how I thought it must be. But what if, like a private method, we could contain our changes to the database within the database? This is what the idea of logical data independence is getting at. Changes can be made to the schema independently of the application’s code. We ought to be able to run some kind of migration and yet present an identical API to the application.
Let’s check this out:
class RenameSuppliersTableToSuppliersV1 < ActiveRecord::Migration
def up
rename_table :suppliers, :suppliers_v1
rename_column :suppliers_v1, :city, :location
execute <<-SQL
CREATE VIEW suppliers AS
SELECT id, sname, status, created_at, updated_at, location AS city
FROM suppliers_v1
SQL
end
def down
execute 'DROP VIEW suppliers'
rename_table :suppliers_v1, :suppliers
rename_column :suppliers, :location, :city
end
end
There are a couple of moving parts, but the gist is simple. First, we rename the suppliers
table to a whole new table! The new table, suppliers_v1
has a new column, location
, rather than city
. Our next trick is that we create aVIEW
that looks just like the old suppliers
table. The particularly fun part (for me) is the location AS city
which renames the offending column.
But “hey”, you may be saying to yourself, “that’s fine to query the database, but it’ll blow up once we run a .save
or something like that.” Not necessarily. In the above example, I’m using PostgreSQL, which supports a limited form of what’s known as an updatable view. There are a few conditions that need to be met for PostgreSQL to be able to, in effect reverse the column rename, but our example meets them all.
Think about what we’ve done here. The application will be entirely unaware that any underlying database change has occurred!
The example that I gave above is a bit contrived and there are some valid concerns about muddying the schema. After these changes, you’ll be left with a suppliers_v1
table and a suppliers
view. This is not a faithful representation of the application’s data. However, I think it still illustrates the point. What if the database were decoupled from the beginning? What if what the application always saw were just views? In that case, we could present a nice API to the app while keeping the DB just how we like it. We’d need a really good RDMS to keep the updatable view abstraction from being leaky, but otherwise it seems intriguing.