Archive for August, 2007
With a maturing understanding by both business and government of the role of information in good governance has come the question “what’s next?”. The answer, of course, is to turn a reactive culture that deals with data issues into a proactive one. In the MIKE2.0 community we talk a lot about Information Development, an approach that treats data as part of the business function and application rather than as an afterthought when someone starts looking for reports.
Most users of this site would be familiar with the Maturity Model, which describes the attributes of an organization that has evolved its capabilities to proactive and beyond. The obvious question then is, “how do I organize to optimize?” The answer, I believe, is not to adopt the organization model of an optimized organization, but rather to evolve to the structure of the next level up. The Centre of Excellence articles make a variety of organization recommendations. Do the Information Management self-assessment and then pick the model that is a stretch, but not too far from where you are now.
My observation is that there is a tight relationship between information maturity and the governance structure that is in place. Like the chicken and the egg, though, some organisations mature and hence naturally evolve to the right model. Others take direct action and put these roles and responsibilities in place, which results in the corresponding level of maturity. Cause and effect don’t matter, but if you take the latter path it is very helpful to have the “back of the book” answers handy.
We’re often asked to compare approaches to managing structured and unstructured data, and attempts to bridge the gap between the two. Traditionally, the technology practitioners who worried about unstructured data have been an entirely different group from those who worried about structured data.
In fact, there are three types of data: structured, unstructured, and a hybrid (records-oriented) grouping of semi-structured. They have much in common and are all part of the enterprise information landscape. In order to look at ways to leverage the relative strengths of the different types of data, it is important to first understand how they are used.
There are three primary applications of data within most enterprises.
The first is in support of operational processes. In the case of structured data, these processes are usually complex from a system perspective but often quite transactional from a human perspective. In the case of semi-structured and unstructured data, there is often less system intervention or interpretation of the data with a heavy reliance on human interpretation.
Secondly, each of the three is used for analysis. In the case of structured, it is easy to understand how the analysis is undertaken. With semi-structured/record data, analysis can be divided into aggregation of the structured components and a manual analysis of the free-text. With unstructured, analysis is usually restricted to searching for like terms and manually evaluating the documents.
Finally, all three types of data are used as a reference to back-up decisions and provide an audit trail for operational processes.
MIKE2.0 recommends approaches to governance, architecture and integration which are independent of the structure of the data itself.
The majority of effort associated with all data, regardless of its form, goes into gaining access to it at the time it’s needed. In all three cases, there are processes to look up or search the data: SQL for structured data, key lookups for semi-structured and tree-oriented folder hierarchies for unstructured. Increasingly, the techniques for finding all three types are converging into one set of processes called Enterprise Search.
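The convergence can be illustrated with a toy sketch: a single inverted index answering the same query over both structured rows and unstructured documents. All identifiers and record names below are made up for illustration, not taken from any particular product:

```python
from collections import defaultdict

# One inverted index shared by structured and unstructured content
index = defaultdict(set)

def add_structured(record_id, row):
    # index every field value of a structured record
    for value in row.values():
        for token in str(value).lower().split():
            index[token].add(record_id)

def add_unstructured(doc_id, text):
    # index free text the same way
    for token in text.lower().split():
        index[token].add(doc_id)

def search(term):
    # the same lookup serves both kinds of content
    return index.get(term.lower(), set())

add_structured("crm:42", {"name": "Robert Hillard", "city": "Melbourne"})
add_unstructured("doc:7", "Meeting notes with Robert about the data model")
print(sorted(search("robert")))  # ['crm:42', 'doc:7']
```

The point of the sketch is simply that once both forms of data are tokenised into the same index, the consumer no longer cares whether the answer came from a database row or a document.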
Ironically, despite the power of search, successful implementations really depend on the implementation of common metadata and the use of a single enterprise metadata model. Again, MIKE2.0 takes the information architect through these requirements in a lot of detail.
In the future, organisations can expect to keep all three forms of data (structured, semi-structured records and unstructured documents) in the same repositories. However, there is no need to wait for this future utopia to begin leveraging all three in the same applications and managing them in a common way.
Why would an executive care? There are two main reasons why every business and technology executive should consider the quality of data modelling to be core to their success.
The first is that information is a valuable economic asset (as argued in MIKE2.0 in the article the Economic Value of Information). Customer data, performance data, analytical information all combine to be an asset that is often worth multiples of billions of dollars. If a company had billions of dollars worth of gold, I’d expect business executives to want to review and understand how such a valuable asset was housed! Given that the data model is usually the main home for the information asset, the same should also be true. The data model cannot be delegated to junior technical staff!
Increasingly there is another reason for elevating the data model. Legacy information is becoming an obstacle to business transformation. As the price of storage dropped during the 1990s, new systems began also storing ancillary data about the parties involved in each transaction and substantially more context for the event. Context could, for example, include the whole sales relationship tracking leading up to a transaction, or the staff contract changes that led to a salary change. With the context as part of the legal record, there are operational, regulatory and strategic reasons requiring that any new or transforming business function do nothing to corrupt the existing detail. The data model is the only tool we have to map new business requirements to old data.
Given the complexity of data modelling, it’s not surprising that executives have shied away from speaking to technologists about the detail of individual models. A discussion on normalization principles would be enough to put most decision makers off!
In the Small Worlds Data Transformation Measure article, MIKE2.0 introduces a set of simple metrics to indicate whether, on average, the data models of an organization are doing a good job of managing the information asset. Using the principle that information makes most sense in the context of the enterprise, it measures the average level of connectivity and the average degree of separation across a subset or all of the data models housing the information assets.
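As a rough sketch of the idea (not the formal measure itself), treat a data model as an undirected graph of entities joined by relationships: connectivity is then the mean number of relationships per entity, and separation is the mean shortest-path length between entity pairs. The entity names below are illustrative:

```python
from collections import deque

# A hypothetical four-entity data model as an adjacency map
model = {
    "Customer": {"Order", "Address"},
    "Order":    {"Customer", "Product"},
    "Product":  {"Order"},
    "Address":  {"Customer"},
}

def average_connectivity(graph):
    # mean number of relationships per entity
    return sum(len(neighbours) for neighbours in graph.values()) / len(graph)

def average_separation(graph):
    # mean shortest-path length over all entity pairs (BFS from each node)
    total, pairs = 0, 0
    for start in graph:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for other, d in dist.items():
            if other != start:
                total += d
                pairs += 1
    return total / pairs

print(average_connectivity(model))  # 1.5
print(average_separation(model))    # ~1.67
```

Intuitively, a model with higher connectivity and lower separation lets the enterprise relate any one piece of information to any other in fewer joins.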
One of the concepts we introduce in MIKE2.0 is that of Networked Information Governance – essentially Information Governance combined with Enterprise 2.0. What we found when it comes to governance is the importance of the “informal network” in solving problems – the emails, hallway discussions and phone calls that take place on a daily basis or in a crisis.
When it comes to bringing the information network together with a formal approach, technologies and techniques from Enterprise 2.0 are a great fit: search, tagging and aggregation are the keys to bridging the gap.
Can this approach be extended beyond Information Governance? We believe it can. Governance techniques can generally benefit from this approach – from a corporate board discussion to managing compliance with environmental regulations. Does it mean we remove the verbal conversations? Of course not (although it might have them recorded, tagged and indexed). Is security important? Definitely. The proposed solution isn’t trivial, but the Networked Information Governance Solution Offering helps define the steps required.
I’ve been reading a book called Why Not? by Yale professors Barry Nalebuff and Ian Ayres, which provides “four simple tools that can help you dream up ingenious ideas for changing how we work, shop, live, and govern.” It’s a highly recommended read. I was fortunate enough to attend one of Barry’s lectures last week and he’s already given me a few new ideas.
One idea is that of a Devil’s Advocate for corporate governance. The book explains the religious origins of the term and the benefits of having a person who takes a counter-point for the sake of argument. This person is a trusted adviser with a duty to take the contrarian view, so their argument is not seen as one of dissent. It goes on to show how the technique could be applied to Corporate Governance, where strong-arm techniques can easily over-run outside opinions.
The Devil’s Advocate is an interesting role for Information Governance – possibly an architect assigned within the Information Development Organisation. Although this approach could be applied more generally to any solution, I think it makes particular sense for decisions related to Information Governance, as:
- Success requires concession and buy-in from multiple parties
- Solutions are complex so multiple viewpoints are important
- It is easy for one group to dominate, but as information flows horizontally across organizations the impact of issues can be asymmetric
- It is easy to get “stuck” when someone raises a counterpoint out of emotion or frustration. With a devil’s advocate, alternative viewpoints are quickly put on the table (it is their duty to identify issues)
We’ve all seen strong-arm techniques or argumentative competitors ruin projects. When attempting to re-design an organisation to take a stronger focus on managing information, the shift is bound to run into issues. A trusted and educated view that raises arguments without being seen as a dissident would certainly be a productive way to identify ownership roles, responsibilities and possible solutions.
I was speaking with a client this week who put forward the challenge that Information Management isn’t really as complicated as we in the profession make out. I stopped for a moment to think about how I could explain the intricacy of an entire body of practice and realised that I would need to pick just one example.
Given its prominence in the industry, I decided to use Master Data Management and particularly the process of matching between sets of master data.
I started with just two lists of people (set A and set B). I then explained how a typical algorithm would match individual records by creating a score and a threshold for matching. “No problem,” my client said – he could use a spreadsheet for that!
I then added a third list (set C). Most algorithms compare two lists at a time, so there are three possible orderings: match A with B and then the result with C, match A with C and then the result with B, or match B with C and then the result with A. To see why the ordering matters, consider the following situation.
In set A, we have a record: “Robert Hillard, email robert.hillard[at]bearingpoint.com”
In set B, we have a record: “Robert Hillard, phone number +61 412 396 036”
In set C, we have a record: “Robert Hillard, phone number +61 412 396 036, email: robert.hillard[at]bearingpoint.com”
A typical business rule might require two items of data to match before the threshold is reached. That means we need name and email, name and phone number or email and phone number to define a match.
In the first scenario we match A with B first, followed by matching the resulting records with set C. The two “Robert Hillard” records are not matched in the first pass, which means that on the second pass, when we bring in set C, matching the two entries against the new Robert Hillard in set C still leaves us with, at best, two records. The final result is two instances of Robert Hillard.
In the second scenario we match A with C first, which results in a full match on Robert Hillard. When set B is brought in, the merged record matches the instance in that file as well. The final result is just one instance of Robert Hillard.
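A minimal sketch makes the order-dependence concrete. The match rule here (at least two attributes must agree) and the naive pairwise merge are simplified illustrations of the scenario above, not any particular product’s algorithm:

```python
FIELDS = ("name", "phone", "email")

def matches(r1, r2):
    # count attributes present in both records with equal values
    common = sum(1 for f in FIELDS
                 if r1.get(f) and r1.get(f) == r2.get(f))
    return common >= 2  # business rule: two items must agree

def merge(list1, list2):
    # naive pairwise pass: combine matching records, keep the rest
    result, used = [], set()
    for r1 in list1:
        merged = dict(r1)
        for i, r2 in enumerate(list2):
            if i not in used and matches(r1, r2):
                merged.update({f: v for f, v in r2.items() if v})
                used.add(i)
        result.append(merged)
    result.extend(r for i, r in enumerate(list2) if i not in used)
    return result

A = [{"name": "Robert Hillard", "email": "robert.hillard[at]bearingpoint.com"}]
B = [{"name": "Robert Hillard", "phone": "+61 412 396 036"}]
C = [{"name": "Robert Hillard", "phone": "+61 412 396 036",
      "email": "robert.hillard[at]bearingpoint.com"}]

print(len(merge(merge(A, B), C)))  # AB first, then C: 2 records survive
print(len(merge(merge(A, C), B)))  # AC first, then B: 1 record
```

The same records and the same rule produce different master data depending purely on the order in which the sets are matched, which is exactly the problem a spreadsheet cannot see.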
Now understanding the complexity, my client tried to add a kludge by creating a master record for each match during an individual pass. There isn’t enough space in this posting to explain why this doesn’t help as the number of sets increases; suffice it to say that each such band-aid solution actually adds to the complexity when more sets are added.
In summary, the more sets there are to match, the more orderings there are that can affect the outcome. For n sets there are in fact n!/2 possible sequential orderings (three for the three sets above), each of which will usually give a different final result for a statistically significant number of entries. Imagine the problem facing the US government when trying to bring together lists of doctors, lawyers or other professionals across 50 state lists!
Various motives can drive a company to undergo a large-scale “transformation” of its systems – upgrade technology, improve business processes, change an operating model. Whatever the goal, transformation efforts typically focus on the process and infrastructure changes required to improve system functionality and expand organizational capabilities. Often overlooked, however, is another key element of effective transformation – data. Without complete, accurate and useful data, transformation efforts aren’t likely to reach their goals. A data-driven approach to transformation provides the foundation to support process and infrastructure changes and meet ongoing data requirements.
In the MIKE2.0 Methodology we provided some ideas around how organizations can take a data-driven approach to mitigate risk and provide greater information-based functionality. Some good areas to go to find out more include:
The Exec Summary provides a presentation that summarizes the approach. We’ve used this approach effectively and would love to hear any feedback on similar experiences or on the approach itself.
The authors of this blog have been pretty passionate for some years about Information Management and promoting the benefits of putting information and data at the center of an organization’s development processes – Information Development.
To help promote this approach, and to promote discussion and debate in the Information Management profession, we were behind an initiative to launch an open approach to Information Management, titled MIKE2.0.
During the 1990s the volume of raw data held by enterprises grew exponentially. All of that data had to be put to some use, and it has been, both internally and externally. As a result, non-ledger data has taken on greater and greater importance in the management, oversight and assessment of companies. Unfortunately, the use of agreed processes and standards for the aggregation, measurement, quality and interpretation of the data has not moved at the same rate, with every enterprise free to use its own approaches. In some cases this results in innocent ambiguity, while in other cases organizations have taken the opportunity to deliberately mislead their stakeholders.
The complexity of data is not generally well understood. Most often, it is assumed to be a set of static datasets which can be related to each other in an unambiguous way. The reality is that data is constantly changing across the enterprise 24 hours a day. With financial reporting, this constant change is generally well managed with ledger aggregation, group reporting and, most importantly, period-end closing. By agreeing to specific cut-offs, a point of reconciliation stabilizes all of this ongoing change. Although they are taken for granted, the processes followed to stabilise the data are non-trivial.
If non-ledger data is to be trusted to the same extent as financial data, then its complexity needs to be equally well managed in ways which are consistent across the industry. No one consulting firm and no one financial institution can find the “right” answer unless the approach is much more widely adopted. For this reason we have not only invested heavily in developing approaches to managing and measuring complex data, but have convinced our employer – BearingPoint – to donate it to the wider profession using a Creative Commons licensing model.
MIKE2.0 is that initiative and is larger than any one group of professionals. It is managed by a mix of industry professionals across end-user and consulting firms. It is designed as a multi-lingual collaboration that can link external reporting minimum standards with multiple internal data consolidation processes using a variety of technologies. MIKE2.0 is one of the initiatives that Information Management professionals looking to shape their industry can embrace, influence and extend.
I am continually struck by the lack of formal valuation models for information. Considering how much organizations spend on building and maintaining information assets, and how valuable those assets are to the health of the business, you would think this would be an area that receives more focus.
While I’ve seen a number of academic papers on assessing the Economic Value of Information, practically implemented cases are few and far between. I have done some development on “assessment-oriented” models that can be of value in formulating a strategy, such as the Economic Value of Information model in MIKE2.0.
An Information Value Assessment should provide a mechanism to assign an economic value to the information assets an organization holds and the resulting impacts of Information Governance practices on this value. It could also measure whether the return outweighs the cost and the time required to attain this return.
Governance models are just one way of assessing value. Other simple techniques could include:
- Mastering – how many systems hold this common data?
- Latency – if I load this data into a warehouse in an hourly fashion as opposed to weekly what are the gains?
- Quality – RI issues, accuracy issues
- Reach – how many people read my blog? who are the readers?
I think this is an area where industry models will greatly improve, similar to what has occurred in the past 10 years in the infrastructure space. The lack of models points to the immaturity of information management as a competency, and to information being equated strictly with the technology that houses it. I would welcome any other opinions on techniques.