I was speaking with a client this week who put forward the challenge that Information Management isn’t really as complicated as we in the profession make out. I stopped for a moment to think about how I could explain the intricacy of an entire body of practice and realised that I would need to pick just one example.
Given its prominence in the industry, I decided to use Master Data Management and particularly the process of matching between sets of master data.
I started with just two lists of people (set A and set B). I then explained how a typical algorithm would match individual records by calculating a score for each pair and comparing it with a threshold. No problem, my client said; he could use a spreadsheet for that!
I then added a third list (set C). Most algorithms compare two lists at a time. That means there are three combinations: AB followed by ABC, AC followed by ACB, and BC followed by BCA. To see why it matters, consider the following situation.
In set A, we have a record: “Robert Hillard, email robert.hillard[at]bearingpoint.com”
In set B, we have a record: “Robert Hillard, phone number +61 412 396 036”
In set C, we have a record: “Robert Hillard, phone number +61 412 396 036, email: robert.hillard[at]bearingpoint.com”
A typical business rule might require two items of data to match before the threshold is reached. That means we need name and email, name and phone number or email and phone number to define a match.
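To make the rule concrete, here is a minimal sketch in Python. The `Person` type, the field names and the threshold of 2 are my own assumptions for illustration, not the behaviour of any particular matching product; a real algorithm would normally weight fields and tolerate spelling variations.

```python
from typing import NamedTuple, Optional


class Person(NamedTuple):
    """Hypothetical record layout: a missing item is stored as None."""
    name: str
    email: Optional[str] = None
    phone: Optional[str] = None


THRESHOLD = 2  # assumed rule: two items of data must agree


def match_score(a: Person, b: Person) -> int:
    """Count the fields that are present in both records and equal."""
    return sum(1 for x, y in zip(a, b) if x is not None and x == y)


def is_match(a: Person, b: Person, threshold: int = THRESHOLD) -> bool:
    """Two records match when their score reaches the threshold."""
    return match_score(a, b) >= threshold


a = Person("Robert Hillard", email="robert.hillard[at]bearingpoint.com")
b = Person("Robert Hillard", phone="+61 412 396 036")
c = Person("Robert Hillard",
           email="robert.hillard[at]bearingpoint.com",
           phone="+61 412 396 036")

print(is_match(a, b))  # False: only the name agrees
print(is_match(a, c))  # True: name and email agree
print(is_match(b, c))  # True: name and phone number agree
```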
In the first scenario we match AB first, followed by matching the resulting records with set C. The two “Robert Hillard” records in A and B share only a name, so they are not matched in the first pass. On the second pass, when we bring in set C, the best we can do is match the two existing entries to the new Robert Hillard in set C, which still leaves two records. The final result is two instances of Robert Hillard.
In the second scenario we match AC first, which results in a full match on Robert Hillard (name and email). When set B is brought in, its record matches that instance as well (name and phone number). The final result is just one instance of Robert Hillard.
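The two scenarios can be replayed with a small Python sketch. It rests on simplifying assumptions of my own: records are plain dictionaries, a match needs two equal items, and each incoming record joins the first existing group that contains a matching record, with earlier groups never revisited. Real matching engines differ in all of these details.

```python
FIELDS = ("name", "email", "phone")  # hypothetical field names


def is_match(a, b, threshold=2):
    """Assumed rule: at least `threshold` fields present in both and equal."""
    agree = sum(1 for f in FIELDS if a.get(f) is not None and a.get(f) == b.get(f))
    return agree >= threshold


def match_in_order(sets):
    """Bring the sets in one at a time and group records that match.

    A new record joins the first group holding any matching record;
    otherwise it starts a new group. Groups are never re-compared.
    """
    groups = []
    for records in sets:
        for rec in records:
            for group in groups:
                if any(is_match(rec, member) for member in group):
                    group.append(rec)
                    break
            else:
                groups.append([rec])
    return groups


A = [{"name": "Robert Hillard", "email": "robert.hillard[at]bearingpoint.com"}]
B = [{"name": "Robert Hillard", "phone": "+61 412 396 036"}]
C = [{"name": "Robert Hillard",
      "email": "robert.hillard[at]bearingpoint.com",
      "phone": "+61 412 396 036"}]

print(len(match_in_order([A, B, C])))  # 2 instances: AB first, then C
print(len(match_in_order([A, C, B])))  # 1 instance: AC first, then B
print(len(match_in_order([B, C, A])))  # 1 instance: BC first, then A
```

The same three records produce a different number of people purely because of the order in which the lists were brought together.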
Now understanding the complexity, my client tried to add a kludge solution by creating a master record for each match during an individual pass. There isn’t enough space in this posting to explain why this doesn’t help as the number of sets increases. Suffice it to say that each such band-aid solution actually adds to the complexity when more sets are added.
In summary, the more sets there are to match, the more combinations there are which will affect the outcome. For n sets matched two at a time there are, in fact, n!/2 (i.e., n factorial divided by two) combinations, which gives the three listed earlier for three sets. Each of these will usually give a different final result for a statistically significant number of entries. Imagine the problem facing the US government when trying to bring together lists of doctors, lawyers or other professionals across 50 state lists!
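To get a feel for how quickly this grows, the sketch below enumerates the orderings under one counting convention, which is my assumption: pick any two sets to match first (their order within the pair does not matter), then bring in the remaining sets one at a time. The function name `merge_orders` is purely illustrative.

```python
from itertools import combinations, permutations
from math import factorial


def merge_orders(sets):
    """List every ordering: an unordered first pair, then the rest in sequence."""
    orders = []
    for first_pair in combinations(sets, 2):
        rest = [s for s in sets if s not in first_pair]
        for tail in permutations(rest):
            orders.append(first_pair + tail)
    return orders


for n in (3, 4, 5):
    names = "ABCDE"[:n]
    count = len(merge_orders(names))
    # Under this convention the count equals n!/2
    print(n, count, count == factorial(n) // 2)

# Too many to enumerate for 50 state lists, but easy to compute:
print(factorial(50) // 2)
```

For three sets this yields AB then C, AC then B, and BC then A, the three cases described earlier; for four sets there are already 12, and for five there are 60.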



August 10th, 2007 at 7:25 pm
Rob, while I understand you’re describing why information management is complex, I think it’s also worth mentioning the benefit an organization gains from successfully consolidating metadata about a single entity (particularly a person). In simple terms, that benefit is “Discovery”.
That is, if we can harvest all that there is to know about a person from various federated data sources and make all that there is to know about that person discoverable, we are laying the groundwork for social discovery on a grand, enriched scale.
While data is very important, I’d argue people are the organization’s greatest asset. And I’m excited by how metadata can facilitate social discovery providing disparate human resources with the ability to locate each other not just by name but by skill, experience, interest, contribution etc.
August 13th, 2007 at 12:12 pm
Jeremy,
You make a very good point. Metadata creates the opportunity to link information together across the enterprise, which makes it possible to find both people and (even more importantly) the information that makes them valuable (the reason you want to work with them). This is another way of describing the organization as a network and is the reason why we talk about Networked Information Governance (http://mike2.openmethodology.org/index.php/Networked_Information_Governance_Solution_Offering). If the metadata helps us to find the value, IM techniques help us to manage the hidden complexity.