Are you just one step away from launching your database migration project? If so, this content is for you. Without a long introduction, here is our final piece of advice before you start, wrapped in a short interview with Alex, Ispirer's Head of the Database Migration Department.
Let's get you 100% prepared for a successful move!
Hi Alex, can you quickly elaborate on the worst-case scenarios, beyond typical budget and schedule overruns, when it comes to database migration?
Alex: Sure, it's best to know that the most severe failures are rarely about a single isolated event. They are interconnected issues that teams stumble into because they underestimated the project's complexity.
Imagine a very large migration for a financial services firm from a legacy Oracle database to a cloud-native architecture. The project plan can be comprehensive and promise all the benefits of the cloud. Yet it can fail over something simple: decades of undocumented business logic embedded in database triggers and stored procedures, overlooked during planning.
In simple words, once the engineers dive into the process, they find objects they cannot understand quickly and need extra time to decode and convert them. If this is a large system with many undocumented objects, the costs of the migration can skyrocket.
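One practical countermeasure is to size that hidden logic before committing to an estimate. Here is a minimal Python sketch, assuming the legacy source is Oracle and the python-oracledb driver: it queries the USER_SOURCE dictionary view to count stored procedures, triggers, and packages and their lines of code. The connection details and the "scout" account are hypothetical.

```python
import oracledb  # assumed driver for the legacy Oracle system

conn = oracledb.connect(user="scout", password="...", dsn="legacy-db/orcl")
cur = conn.cursor()
# Count stored code objects and their line counts: an early signal of how
# much undocumented logic the project plan must account for.
cur.execute("""
    SELECT type, COUNT(DISTINCT name) AS objects, SUM(line_count) AS total_lines
    FROM (SELECT type, name, COUNT(*) AS line_count
          FROM user_source
          GROUP BY type, name)
    GROUP BY type
    ORDER BY total_lines DESC
""")
for obj_type, objects, total_lines in cur:
    print(f"{obj_type}: {objects} objects, {total_lines} lines to decode")
```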
How does this type of 'unknown unknown' manifest in other technical forms?
Alex: Well, another frequent point of failure is when data is migrated structurally, which is a good start, but its contextual meaning is not carried over. For instance, getting a bit techie, mapping a numeric customer_id to a text-based UUID can cause downstream billing and analytics platforms to fail. For non-tech people, think of it like changing a customer's house number from "123" to a long random string of letters and numbers.
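To make that concrete, here is a minimal Python sketch with hypothetical names: a downstream billing helper that assumed customer_id was numeric fails the moment the key becomes a text UUID.

```python
import uuid

def legacy_invoice_number(customer_id: int) -> str:
    # Old convention: the numeric id is zero-padded into the invoice number.
    return f"INV-{customer_id:08d}"

print(legacy_invoice_number(123))   # prints "INV-00000123" on the old key

new_id = str(uuid.uuid4())          # after migration the key is a text UUID
try:
    legacy_invoice_number(new_id)
except ValueError as exc:           # the ':08d' format cannot apply to a string
    print(f"billing job failed: {exc}")
```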
Such a simple thing can turn severe for many organizations, especially in the healthcare sector. It's easy to silently truncate patient medical codes because the new system's field has a shorter character limit, and this won't be flagged as an error. But what happens next, when you start working with the new system and need to access a patient's history for a critical procedure?
I will tell you: the technical issue becomes a significant patient safety and liability concern. By cutting the codes, engineers erased crucial details from patients' health histories, so doctors lack the complete and accurate information they need to make informed decisions and may commit life-threatening errors during critical procedures.
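Here is a minimal sketch of that silent-truncation trap, with hypothetical field widths and a hypothetical composite medical code: a naive loader trims values to fit the narrower target column instead of rejecting them, so nothing is ever flagged.

```python
TARGET_LIMIT = 10   # hypothetical: new column is VARCHAR(10), legacy was wider

def naive_load(value: str) -> str:
    # Truncate-to-fit: no exception, no log entry, the data just shrinks.
    return value[:TARGET_LIMIT]

def safe_load(value: str) -> str:
    # Fail loudly instead, turning the width mismatch into a migration defect.
    if len(value) > TARGET_LIMIT:
        raise ValueError(f"code {value!r} exceeds target width {TARGET_LIMIT}")
    return value

code = "Z51.11-CHEMO-ENC"        # hypothetical composite medical code
print(naive_load(code))          # 'Z51.11-CHE': silently corrupted history
try:
    safe_load(code)
except ValueError as exc:
    print(f"caught before cutover: {exc}")
```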
All these little things, and there are far too many of them, decide whether a migration fails or succeeds.
From what I can see, the human factor and institutional knowledge appear to be significant variables in these scenarios. How do you quantify and mitigate that risk?
Alex: Indeed, and they are the most unpredictable variables. An even more dangerous knowledge debt often accompanies technical debt. I've seen many companies whose database architecture and performance are understood by a single subject matter expert, often an engineer nearing retirement, and it's typical that their employer allocates a mere two weeks for "knowledge transfer."
This is inadequate. The expert's understanding is built on decades of operational experience with this particular environment, so it's impossible to transfer in such a short time. And it is absolutely not something to leave until right before you migrate.
What is usually the result of such a fast "knowledge transfer"?
Alex: Just one of the scenarios is that the new engineering team, skilled in modern database platforms like PostgreSQL, replicates the schema but fails to replicate the performance-critical logic and data access patterns. Again, a very technical detail, but the effect of this mistake is highly visible to the business.
A possible consequence is that a query that ran in milliseconds on the old system times out on the new, supposedly more powerful hardware. Performance can degrade until the application is unusable, and you will likely be forced to retain the retired engineer as an external consultant, at significant cost, to re-architect things.
The mitigation for this is strict documentation, a culture that does not permit single points of failure for critical system knowledge, and a migration strategy that involves performance and load testing that realistically simulates production workloads before the final cutover.
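As a sketch of what such pre-cutover testing can look like, the following Python script replays critical queries against the new system and flags anything that regresses badly against legacy baselines. It assumes PostgreSQL as the target, the psycopg2 driver, and a hypothetical baselines.csv of latencies recorded on the old system.

```python
import csv
import time

import psycopg2  # assumed PostgreSQL driver; any DB-API client works the same

MAX_SLOWDOWN = 3.0  # flag queries over 3x slower than the legacy baseline

conn = psycopg2.connect("dbname=newdb user=migration")  # hypothetical DSN
regressions = []

with open("baselines.csv") as f:            # columns: name, sql, legacy_ms
    for row in csv.DictReader(f):
        start = time.perf_counter()
        with conn.cursor() as cur:
            cur.execute(row["sql"])
            cur.fetchall()
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms > float(row["legacy_ms"]) * MAX_SLOWDOWN:
            regressions.append((row["name"], elapsed_ms))

for name, ms in regressions:
    print(f"REGRESSION: {name} took {ms:.0f} ms on the new system")
```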
Thanks! Now that we know these potential points of catastrophic failure, what is the correct strategic approach to de-risking a complex migration?
Alex: You need to treat the migration as a complex, business-critical initiative and operate with a high degree of professional paranoia, which means: trust nothing, validate everything. So, begin with automated data profiling and discovery long before the first migration script is written. You must understand the data's lineage, its dependencies, and its quality issues at a granular level. By "you," I mean your dedicated database migration team.
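Even a lightweight profile catches many of the risks we discussed. This Python sketch, assuming pandas and a hypothetical extract of a legacy customers table, surfaces null rates, uniqueness hints, and the column widths that cause silent truncation later. Dedicated profiling tools go much deeper, but even this flags problems early.

```python
import pandas as pd

df = pd.read_csv("customers_extract.csv")   # hypothetical legacy extract

profile = pd.DataFrame({
    "null_rate": df.isna().mean(),   # missing-data risk per column
    "distinct": df.nunique(),        # key and uniqueness hints
    # longest value per column: compare against target column widths
    "max_len": df.astype(str).apply(lambda col: col.str.len().max()),
})
print(profile.sort_values("max_len", ascending=False))
```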
Also, opt for an iterative, phased migration, which means moving data in logical chunks (by service, by region, by function) whenever you can. Each phase must be accompanied by parallel runs, where the legacy and new systems run concurrently so you can compare outputs and ensure data parity.
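A parity check during those parallel runs can be as simple as comparing row counts and key checksums per table. Below is a minimal Python sketch assuming hypothetical DB-API connections to both systems and a caller-supplied map of tables to key columns.

```python
import hashlib

def table_fingerprint(conn, table: str, key: str) -> tuple[int, str]:
    """Return (row count, checksum over sorted key values) for one table."""
    cur = conn.cursor()
    cur.execute(f"SELECT {key} FROM {table} ORDER BY {key}")
    digest = hashlib.sha256()
    count = 0
    for (value,) in cur:
        digest.update(str(value).encode())
        count += 1
    return count, digest.hexdigest()

def check_parity(legacy_conn, new_conn, tables: dict[str, str]) -> None:
    # tables maps name to key column, e.g. {"customers": "customer_id"}
    for table, key in tables.items():
        legacy = table_fingerprint(legacy_conn, table, key)
        new = table_fingerprint(new_conn, table, key)
        status = "OK" if legacy == new else "MISMATCH"
        print(f"{table}: legacy={legacy[0]} rows, new={new[0]} rows, {status}")
```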
And, most critically, the rollback plan must be as well-tested as the migration plan itself. The worst possible outcome is a failed migration that results in the pervasive corruption of an organization's most critical asset: its data. This leads to a complete loss of trust in the data's integrity, which can cripple decision-making and business operations indefinitely.