There are a lot of articles and blog posts about the pros and cons of having a layered architecture or an architecture, which result in architecture layers. In such applications you often find some kind of a persistence layer. A commonly mentioned reason of having a persistence layer is that it enables the easy substitution of the underlying database. Is it just one of the “architecture-myths” ? Is this just a commonly held myth or will a layer architecture indeed support cloud-readiness of a large, long running software project?
In this blog post I will explain how I was able to migrate solely the database-system in a large enterprise application.
Cloud-readiness of Monoliths and Databases
We happily consult customers on their way onto the cloud. In one of our projects there is no way to snap one’s fingers and deploy each application inside a cloud. A lot of questions arise when considering cloud migration. The obvious result or answer to that questions is that we have to migrate the existing software to be cloud-ready. What does it mean to be ready for the cloud ? At least it is necessary to analyse your runtime environment and your persistence technology – because that are parts of your application which you want to be scalable in the cloud.
Since the currently used database in our JavaEE-based project is not available as a cloud-native-service, it is needed to be migrated to a cloud-ready database as a part of a cloud-readiness proof of concept. Otherwise it could be possible to adapt a cloud-native-behaviour – but that is not the solution we want to suggest to our customer. If you are going into the cloud you want to use every Cloud-Service as it is supposed to be: scalable ? Out-of-the box ! increase storage ? No problem – also out of the box !
Our customer preferred two database technologies: MySQL and AzureSQL. We figured out that in either case the same migration strategy can be applied
Migrate the Database
A huge application on which at least 20 people work since four years … and I am supposed to migrate it ? How am I going to do that ? The only sensible way I could find, was working step by step to find pitfalls and incompatibilities.
The first step was to modify my development-environment to be compatible to both of the “new” database systems. Having that, the next step was figuring out on which source files I have to modify connection-URLs and drivers.
In our application there is a construct which allows it to generate a SQL-file with DDL statements for the current version of the database schema.
Loading the generated script into MySQL Workbench shows each incompatible statement. It turned out that almost each line had a problem and at least the following migration steps must be applied:
- migrate the usage of sequences to MySQLs auto_increment mechanism
- remove the usage of tablespaces
- migrate incompatible datatypes to datatypes which are compatible to the business-logic
- handle the different precision of datatypes
- differences between the SQL-dialects
After migrating the initial database setup script, I was able to create a compatible Database in MySQL. First success !
Of course it is necessary to apply all those changes to the source code. Almost every change has to be applied to the entity persistence annotations (the JPA annotations). I was glad: our application has a really great way how the Annotation values are filled: in the Screenshot you can see that you can remove or migrate all incompatible changes with a few search and replace patterns. That’s nice if you have to migrate more than 40 entities.
After I made the source code compatible to MySQL, the next thing was migrating the PersistenceUnits configuration. Thanks to JDBC it was possible to put all parameters into the new ConnectionURL. Furthermore it is necessary to change the JPA Database-dictionary property to MySQL.
JPA was configured to connect to the database through a datasource – just as you would expect from a Java EE application. The datasource has to be reconfigured in order to connect to the “new” database system. I had to do this in two application servers because my colleagues were already working on the cloud readiness of the used application server (because the currently used is not cloud ready). In both cases there were no problems using MySQL or AzureSQL as a JTA connected datasource.
Additionally we use EJB Schedulers which are using the database as well. After connecting them to the new datasource I finally was able to build the first version of the POC. After that I got two POCs (one for MySQL and one for AzureSQL) which worked with basic tests.
Settling the doubts – migrating the migration scripts
I wanted to be sure that I did not forgot something important and told another colleague about it. Together we started working on ‘migrate the migration scripts’. You remember the mentioned four years and the 20 developers ? Due to the fact that it is an agile project there are a lot of migration scripts for the application database. We decided to migrate them into the MySQL dialect.
A big list of interesting facts, questions and problems came out of this task. It was worth the effort: we found a lot of things that we used in the past and that can not be used in that way anymore. Some examples are listed below :
- There will be no more merge statements in MySQL.
- Using table reorganization and tablespace character sets will be different
- Again: different precisions, especially in TIMESTAMP
- Each Trigger has to be rewritten
- What about isolation levels ?
We discussed that list with our customer and found out what changes the cloud migration will trigger. Additionally we migrated the test data within the same task. We did this because we needed a running cloud ready implementation and also wanted to check out how to migrate the production database with existing data. Finding the same problems as we expected after changing the source code lead to a reduction of our doubts.
After we had a running POC, I decided to rebase against master branch. And it worked again on the current development state.
Triggers triggering problems ?
There is an important point about migration scripts: How to use triggers ? Is there any business-logic in your database triggers ? If so you might have a big problem – at least during your database migration. Triggers are different for almost each database vendor. For sure at least they can do the same things on each RDBMS, but advanced capabilities are very different. We were glad that we used triggers only to do basic migration jobs during version updates.
There is that myth that you learn in your basic years: it can theoretically be easy to change a database if you have a persistence layer. Experienced developers often smile on that theory and think that this is kind of naive. That’s why I had a lot of doubts about having success in that POC.
Now I am able to say: That is not a myth. It is true if you have an application
- with a high quality codebase
- which takes the principles of a layered architecture seriously
- with a good incremental database migration setup (read more in my colleague Daojun Cui’s blogpost Database migration with Flyway in large project)
- that does not have any business logic in triggers
- using an almost perfect kind of annotation parameters
If you search for migration tools, you will find some for migrating to DB2 or Oracle. That was not what I needed. In my opinion, you will not need such a tool if you have an application which meets the criteria above. If you obey the paradigm of layer architecture and thus having an enterprise application persistence layer, you can do it straightforwardly.
We were able to migrate our application without any architectural changes because we we kept the layer principles in mind from the beginning of the project. You may often do not like the development overhead of such a architecture and all the stuff that comes with it. Sometimes it is great to see that it is worth the effort you put in.