Wednesday, December 2, 2015

Getting started with automating database changes using Flyway


For those of us that strive to get to a "One Click Deploy" - where we press a magical button that automatically deploys all of our changes - one of the stumbling blocks during deployment is if there is a relational database involved where schema and/or data changes still need to be run manually.
Sometimes this is unavoidable, but assuming it's not, how can we achieve a fully automated deployment when we still need to worry about the database? For the trolls out there, don't say "Just use NoSQL".
I spoke about the general idea behind database migration strategies at one of Entelect's dev days, and the concepts are still the same (see the end of this article for some references). Since then, there are two tools that I've become aware of: Liquibase and Flyway. This will go through using Flyway with SQL Server 2014.

Setup Flyway

Flyway is a Java based library, but also comes as a standalone command line tool. I think that means you don't need to install the Java Runtime Environment (JRE) to use it, but I already had it installed so haven't verified that yet.
  1. Go to the Flyway command line page
  2. Download
  3. Unzip to C:\flyway-3.2.1
Since we're using SQL Server, we also need to make sure the right JDBC drivers are available for Flyway to use. Although the documentation does show support for using both the Microsoft and jTDS drivers, I found that the jTDS library doesn't support Windows Authentication mode for SQL Server 2014 (it might for older versions), and I had to download and set up the Microsoft driver in any case:
  1. Go to the Microsoft JDBC driver download page
  2. Download either sqljdbc_6.0.6629.101_enu.exe or the sqljdbc_6.0.6629.101_enu.tar.gz tarball (I chose the latter)
  3. Unzip the tarball
  4. Choose the right JAR depending on Java version and JDBC compliance, I went with sqljdbc42.jar, and copy this into C:\flyway-3.2.1\drivers
  5. Since we need Windows Authentication, copy sqljdbc_auth.dll into a folder on the PATH
    • I copied it into C:\flyway-3.2.1 since I'll run the command line tool from that location
There are a few configuration options available, but for now we'll just need to setup our connection string and driver in C:\flyway-3.2.1\conf\flyway.conf:
Technically you shouldn't need to specifiy the driver explicitly, but I prefer being explicit to avoid surprises. In the above connection, I'm connecting to a database FlywayDemo on a named instance called sql2014, and making use of integrated security (Windows Authentication).
You should now be able to run the command below to verify that Flyway can connect to the DB:
404 - Image Not Found
The above shows that it can connect to the database, and that there were no migrations found (we will start with that next). You shouldn't get any errors at this stage, though you may need to configure your database to allow TCP connections if it hasn't been configured to do so already: 
404 - Image Not Found


If you're lucky enough to work on a brand new project with a brand new database, you can just create an empty database, and each change to the database can be created as a new script, each run in order.
But most of us have an existing database we want to work from. For this demo, I'll just use a simple database with a single table as below:
404 - Image Not Found
We can use Flyway to baseline this version of the database, say, version 1.0:
404 - Image Not Found
This creates a new database table schema_version in the default schema of the database, and tags the database as version 1.0. We could create a backup of this database, and use it as our baseline database to work off of, and when another user restores the database backup, they can use flyway to inspect what version of the database they are using:
404 - Image Not Found

Database migration changes

With Flyway, each change to the database should be scripted as an incremental change. This allows us to migrate database changes forward from whatever current version our database is at. Flyway uses specifc conventions for database migration scripts, and we'll follow the default. Scripts are to be placed in the folder C:\flyway-3.2.1\sql, and follow the naming convention prefixVERSIONseparatorDESCRIPTIONsuffix, e.g. V1.1__Some description.sql.
Lets say we want to change our database structure to extract the Type column of the Person table into its own table. This would involve:
  1. Creating a new table, PersonType
  2. Inserting necessary data into PersonType
  3. Adding a new PersonTypeId foreign key to Person
  4. Updating the Person data for the new PersonTypeId
  5. Dropping the old Type column on the Person table
We could lump all of these in a new V1.1 script, however the recommended practice is to split each DDL change into its own version, since some database vendors don't handle transactions very well with DDL statements, and Flyway tries to wrap each migration in a transaction so if any migration fails it doesn't leave the database in an inconsistent state.
Instead, we'll just make each change it's own script, by default:
  • V1.1__Create PersonType.sql
  • V1.2__Populate PersonType.sql
  • V1.3__Create Person.PersonTypeId.sql
  • V1.4__Create FK_Person_PersonType.sql
  • V1.5__Update Person.PersonTypeId.sql
  • V1.6__Drop Person.Type.sql
  • V1.7__Make Person.PersonTypeId NOT NULL.sql
After we've created these scripts in the sql folder, if we re-run flyway info, we will see that we now have pending changes that need to be applied to the baseline:
404 - Image Not Found
We can then use Flyway to migrate our database to include all of these changes:
404 - Image Not Found
404 - Image Not Found
You'll notice that the version and description columns are pulled from the filename convention. The checksum column is actually a checksum of the contents of the migration script. That is to avoid changes to scripts that were already run in - instead of changing existing scripts, prefer creating a new script with a new version to do the necessary change. If we did modify a script, and run validate, we should see the issue being highlighted:
404 - Image Not Found
Sometimes though, it may make sense to modify an existing script, as long as it's functionality doesn't change - lets say the script takes forever to run the migration, and we find a way to optimize it, then it makes sense to change the script later on. In this case, we can repair the metadata stored in the version table using the repair command:
404 - Image Not Found

Development time considerations

This way of working does mean you'll need to be more vigilant with how you make database changes. Changes should be done in small incremental changes. An issue when working on teams may be deciding on how to allocate version numbers - if we're all working in parallel, we may all choose to pick V1.1, whereas my changes might only be deployed after my colleague's changes, meaning his should be V1.1 and mine V1.2.
If you were using feature branches, then you could version your changes for your feature as V1, V2, etc. in a separate folder maybe, and only when merging with the common source repository, would you move the files into the sql migration folder with their actual version numbers, e.g. we merge the development branch, notice that someone has implemented V1.6, we can then decide to rename our scripts V1.7.1 and V1.7.2, etc.

Continuous Delivery

What this all enables is a truly continuous delivery option. This would be the typical flow:
  1. Developer checks out latest version of the code
  2. Developer gets latest database setup
    • either restore latest database backup, and migrate pending changes,
    • or rebuild the whole database using Flyway
  3. Developer makes changes for their feature, both code and migration scripts
  4. Developer pushes all changes to source control
  5. Continuous Integration build runs
    • gets latest code
    • compiles
    • runs tests
    • gets Integration Test database setup as above in (2) using the tooling
    • runs integration tests
  6. Continuous Deployment build runs
    • migrates database
    • deploys artifacts
This could be setup to happen automatically for every commit, maybe to deploy to a Quality Assurance or User Acceptance Testing environment (or both). This could even be done to drive Production environment deployments, though that may be more complicated, and may be better done by clicking a button to initiate the deployment rather than automatically deploying after a code change.

Wrap up

Continuous delivery is a part of the devops community that's getting a lot of focus in recent times. Hopefully this article has inspired someone to automate their own deployment process. For those completely new to the database side of this, I recommend reading the links below. For those that aren't new to this, have you tried this on an actual project, and if so maybe share your war stories in the comments! :)