The Missing Q

Yogi Berra famously once said, “When you come to a fork in the road, take it.” In this post, I’ll discuss a few different paths related to data quality during data migrations.

But let’s take a step back first. Extract-Transform-Load (ETL) represents a key process in any information management framework. At some point, at least some of an organization’s data will need to be taken from one system, data warehouse, or area, transformed or converted, and loaded into another data area.

The entire MIKE2.0 framework  in large part hinges on data quality. DQ represents one of the methodology’s key offerings, if not its most important one. To this end, it’s hardly unique as an information management framework. But, as Gordon Hamilton and my friend Jim Harris pointed out recently on an episode of OCDQ Radio, not everyone is on the same page when it comes to when we should actually clean our data. Hamilton talks about EQTL (Extract-Quality-Transform-Load), a process in which data quality is improved in conjunction with an application’s business rules. (Note that there are those who believe that data should only be cleaned after it has been loaded into its ultimate destination–i.e., that ETL should give way to ELT.)

Why does this matter? Many reasons, but I’d argue that implicit in any decent information management framework is the notion of data quality. While many frameworks, models, and methodologies vary, it’s hard to imagine any one worth its salt that pooh-poohs accurate information. (For more on a different framework, see Robert Hillard’s recent post.)

Data Quality Questions

Different frameworks aside, consider the following questions:

  • As data management professionals, should we be concerned about the quality of the data as we are transforming and loading it?
  • Or can our concerns be suspended during this critical process?
  • And the big question: Is DQ always important?

I would argue yes, although others may disagree with me. So, for the organization migrating data, when is the best time for conduct the cleansing process? Here are the usual suspects:

  • Pre-Migration
  • During Migration
  • Post Migration

Pre-Migration

Organizations can often save an enormous amount of time and money if they cleansed their data before moving it from point A to point B.  To be sure, different applications call for different business rules, fields, and values. Regardless, it’s a safe bet that a new system will require different information than its predecessor. Why not get it right before mucking around in the new application?

During Migration

Some contend that this is generally the ideal point to cleanse data. Retiring a legacy system typically means that organizations have decided to move on. (The term legacy system implies that it is  no longer able to meet a business’ needs.) As such, why spend the time, money, and resources “fixing” the old data. In fact, I’ve seen many organizations specifically refuse to cleanse legacy data because their end users felt more secure with the knowledge that they could retrieve the old data if need be. So much for cutting the cord.

If your organization takes this approach, then it is left with two options: cleanse the data during or post-migration. If given the choice, I’d almost always opt for the former. It’s just easier to manipulate data in Excel, Access, ACL, or another tool than it is in a new system loaded with business rules.

Post-Migration

The ease of manipulating data outside of a system is the very reason that many organizations prefer to clean their in a system of record. Excel doesn’t provide a sufficient audit trail for many folks concerned about lost, compromised, or duplicated data. As such, it makes more sense to clean data in the new system. Many applications support this by allowing for text-based fields that let users indicate the source of the change. For instance, you might see a conversion program with a field populated with “data change – conversion.”

Is this more time-consuming? Yes, but it provides greater audit capaability.

Feedback

What say you?

 

 

 

Category: Information Management, Information Strategy
3 Comments »