08 Sep 2011
The Missing Q
Yogi Berra famously once said, “When you come to a fork in the road, take it.” In this post, I’ll discuss a few different paths related to data quality during data migrations.
But let’s take a step back first. Extract-Transform-Load (ETL) represents a key process in any information management framework. At some point, at least some of an organization’s data will need to be taken from one system, data warehouse, or area, transformed or converted, and loaded into another data area.
The entire MIKE2.0 framework in large part hinges on data quality. DQ represents one of the methodology’s key offerings, if not its most important one. To this end, it’s hardly unique as an information management framework. But, as Gordon Hamilton and my friend Jim Harris pointed out recently on an episode of OCDQ Radio, not everyone is on the same page when it comes to when we should actually clean our data. Hamilton talks about EQTL (Extract-Quality-Transform-Load), a process in which data quality is improved in conjunction with an application’s business rules. (Note that there are those who believe that data should only be cleaned after it has been loaded into its ultimate destination–i.e., that ETL should give way to ELT.)
Why does this matter? Many reasons, but I’d argue that implicit in any decent information management framework is the notion of data quality. While many frameworks, models, and methodologies vary, it’s hard to imagine any one worth its salt that pooh-poohs accurate information. (For more on a different framework, see Robert Hillard’s recent post.)
Data Quality Questions
Different frameworks aside, consider the following questions:
- As data management professionals, should we be concerned about the quality of the data as we are transforming and loading it?
- Or can our concerns be suspended during this critical process?
- And the big question: Is DQ always important?
I would argue yes, although others may disagree with me. So, for the organization migrating data, when is the best time for conduct the cleansing process? Here are the usual suspects:
- Pre-Migration
- During Migration
- Post Migration
Pre-Migration
Organizations can often save an enormous amount of time and money if they cleansed their data before moving it from point A to point B. To be sure, different applications call for different business rules, fields, and values. Regardless, it’s a safe bet that a new system will require different information than its predecessor. Why not get it right before mucking around in the new application?
During Migration
Some contend that this is generally the ideal point to cleanse data. Retiring a legacy system typically means that organizations have decided to move on. (The term legacy system implies that it is no longer able to meet a business’ needs.) As such, why spend the time, money, and resources “fixing” the old data. In fact, I’ve seen many organizations specifically refuse to cleanse legacy data because their end users felt more secure with the knowledge that they could retrieve the old data if need be. So much for cutting the cord.
If your organization takes this approach, then it is left with two options: cleanse the data during or post-migration. If given the choice, I’d almost always opt for the former. It’s just easier to manipulate data in Excel, Access, ACL, or another tool than it is in a new system loaded with business rules.
Post-Migration
The ease of manipulating data outside of a system is the very reason that many organizations prefer to clean their in a system of record. Excel doesn’t provide a sufficient audit trail for many folks concerned about lost, compromised, or duplicated data. As such, it makes more sense to clean data in the new system. Many applications support this by allowing for text-based fields that let users indicate the source of the change. For instance, you might see a conversion program with a field populated with “data change – conversion.”
Is this more time-consuming? Yes, but it provides greater audit capaability.
Feedback
What say you?


September 8th, 2011 at 10:01 am
As DM professionals are primary concern is about the data. How it is constructed, used or reused, who owns the data and who’s accountable for ensuring the quality. During a data migration the focus is on ETL and answering have we moved the all relevant records into the new repository or database. The other question that should be answered too, is it fit for purpose? In the end the business will only sign off a migration project if they believe appropriate data is exists in the new system and they can manipulate the data according the business rules using the application functions. No big surprise.
For DM Professionals how do we get to this point? In DM projects it’s guaranteed that data quality activities will be performed whether it’s when the data is extracted from legacy systems or before the data is loaded into a staging area or even after the data is migrated. Understanding the scope and impact of when this occurs alters the amount and type of data quality that is performed. It’s easier to fit a data issue when you’re only dealing with one system, harder when it’s now mixed with data and business rules from other systems and complex when you expect the data to operate as per a new system.
September 8th, 2011 at 10:07 am
[...] The Missing Q [...]
September 8th, 2011 at 11:06 am
Q-E-Q-T-Q-LQ?