
Data Quality Improvement Solution Offering

From MIKE2.0 Methodology

This Solution Offering is provided through the MIKE2.0 Methodology. It receives full coverage within the Overall Implementation Guide and SAFE Architecture and contains a number of Supporting Assets. It has been defined in line with the appropriate Solution Offering Creation Guide and has been peer reviewed. It may have some minor issues with Activities or lack depth. In summary, the Solution Offering can be used but it may still have some issues.
A Creation Guide exists that can be used to help complete this article; contributors should reference it.



The Data Quality Improvement Solution Offering provides an approach for addressing Data Quality issues across a number of dimensions, such as data accuracy, integrity and completeness. The focus of this Solution Offering is around quantitatively understanding data quality issues and addressing these issues through a systematic re-engineering process. Attaining Data Quality Improvement involves preventing these issues from occurring in the first place and this offering works in conjunction with the Information Governance Solution Offering to provide a more comprehensive approach to solving this complex problem.

The Executive Overview on Data Quality Improvement provides an introductory presentation on this Solution Offering.

Executive Summary

The most progressive businesses regard information as much of an enterprise asset as financial and material assets. Having the right approach to information management will enable optimal strategic and operational decisions, for which having accurate, timely and complete information is essential. It will also facilitate flexibility for the future: as application packages come and go and integration technologies change, the data largely stays the same. Therefore, an organisation’s approach to information management is critical to its success. In terms of defining a successful strategy in this area, consider some of the statistics:

  • In 2001, Data Quality issues were estimated to cost US companies nearly $600 billion
  • Up to 88% of data-related projects fail, largely due to issues with Data Quality
  • According to the Standish Group, in 1998, 74 percent of all data migration projects either overran or failed, resulting in almost $100 billion in unexpected costs.
  • In a survey of 300 IT executives conducted by Information Week, the majority of the respondents (81 percent) said, “improving (data) quality was their most important post-year 2000 technology priority”
  • Data Quality issues lead to 87% of projects requiring extra time to reconcile data - TDWI Data Quality Survey, December 2001
  • Data Quality issues led to lost credibility in a system for 81% of respondents - TDWI Data Quality Survey, December 2001
  • Day-to-day operating costs due to bad data are estimated to be as high as 20% of operating profit.

Data Quality issues are Hidden and Persistent

Data quality issues can exist unnoticed for some time, although some users may suspect that the data in the systems they rely on for their decisions is not accurate, complete, current, valid, or consistent. Data quality issues tend to leave business users, knowledge workers, and IT executives with a common set of questions in the back of their minds:

  • Does faulty data put me at risk of violating regulatory requirements?
  • Why does each department use the same exact data elements in different ways?
  • How much effort is duplicated on each project by not having common data infrastructure and standards?
  • By the time the data is available for analysis, is it too late to be useful?
  • What technologies are available to help with data quality? Where do I start?
  • Once we start, how do we sustain data quality over time?
  • Can I trust the information that is used to measure the performance of our business?
  • Can I base strategic decisions on this data?
  • Are data quality issues impacting customer satisfaction?
  • Every month I get at least two different answers to the same question. Whose answer is right?
  • Which Lines of Business / Subject Areas / Systems will benefit most from improved data quality?
  • How much manual effort is wasted each month performing reconciliation tasks?

Data Quality is Fit for Purpose

It is difficult for users of downstream systems to improve the quality of their data because the data they derive information from is entered via customer-facing operational systems. The operators of these customer-facing systems do not have the same incentive to maintain high data quality: their focus is on entering data quickly and without rejection by the system at the point of entry.

Data quality issues become more visible when data is transferred from one system (a source system) and loaded into another (a target system), particularly when the data is integrated. Common source systems include operational, CRM, and ERP systems. The most common target system is a data warehouse used for MIS, DSS, or data mining reporting and analysis.

The difference in purpose between source system data and target system data is significant. Operational source systems need only store transaction-level data relating to a single system, whereas a data warehouse aims to provide a complete domain description of an organization, or organizational unit, over an extended period. This difference in purpose leads to different required levels and measures of data quality for source and target systems.

Source system data quality issues within a single system do not normally prevent the system from operating. For a data warehouse, though, data quality issues can reduce the ability to make decisions based on data warehouse data. The quality of data warehouse data is critical to its acceptance and use by knowledge workers, and poor data quality is often cited as the most common cause of data warehouse project failure. Data Quality does not necessarily imply 100% error-free data; the requirement is that the data meets the needs of the people using it. In addition to basic data field errors, other factors contribute to data quality as well:

  • Different business units use common data elements but apply different definitions and business rules (meta-data). For example, global businesses will use the currency conversion rate rules that best suit their financial reporting.
  • Data is normally captured in systems with a view to support a particular business process or function, not to provide valuable information to the organization.
  • Business processes change much faster than systems, resulting in data that is incorrectly or incompletely recorded.
  • Master data is often only correct at the time of recording. For example, 17% of Americans move each year, and over 2.5 million businesses change addresses in the same timeframe.

A Data Investigation programme is a key aspect of getting started

We believe that this approach is required to accurately cost and schedule any data transition or consolidation project, as it minimises the risk of the 11th-hour fire-fights that data quality issues often cause. This is the most expensive and worst time to discover problems with the data: more often than not it results in project delays, since it is late in the process and there is little time to do the analysis work required to fix the problem.

Data Quality issues are the major impediment to success in a variety of project implementations. Despite this, many organisations start their information management programmes on the wrong foot by designing solutions without understanding the issues that they face. Identifying issues through Data Profiling allows organisations to deal with data quality issues early in the programme, thereby addressing quality problems from the start. Data Profiling also provides the basis for making quantitative decisions around the solution implementation, and provides analysis work that is independent of any technology decisions.

Data quality is contextual to an individual company. Generic profiling should not be undertaken without understanding the business expectations of the data, and issues surfaced during profiling should not be addressed without this understanding either.

Governance is needed to Prevent Quality Issues

Despite the tremendous cost of these issues, most organisations struggle to address Data Quality because they are stuck in a state of continually fixing issues without preventing them. A Data Governance programme is used to stop Data Quality issues from occurring in the first place. It includes standards, policies, processes, delivery best practices, organisational efficiency and agility.

By addressing each of these issues, organisations can get on the right track towards Data Quality Improvement. This Solution Offering can be used in conjunction with the Data Investigation and Re-Engineering Solution Offering and Information Governance Solution Offering to provide the comprehensive approach that is required to address complex historical issues in a systematic fashion and to prevent issues from occurring in the first place.

Solution Offering Purpose

This is a Core Solution Offering. Core Solution Offerings bring together all assets in MIKE2.0 relevant to solving a specific business and technology problem. Many of these assets may already exist and as the suite is built out over time, assets can be progressively added to an Offering.

A Core Solution Offering contains all the elements required to define and deliver a go-to-market offering. It can use a combination of open, shared and private assets.

Solution Offering Relationship Overview

The MIKE2.0 Data Quality Improvement Solution Offering is part of the EDM Solution Group

MIKE2.0 Solution Offerings provide a detailed and holistic way of addressing specific problems. MIKE2.0 Solution Offerings can be mapped directly to the Phases and Activities of the MIKE2.0 Overall Implementation Guide, providing additional content to help understand the overall approach. The MIKE2.0 Overall Implementation Guide explains the relationships between the Phases, Activities and Tasks of the overall methodology as well as how the Supporting Assets tie to the overall methodology and MIKE2.0 Solutions. Users of the MIKE2.0 Methodology should always start with the Overall Implementation Guide and the MIKE2.0 Usage Model as a starting point for projects.

Solution Offering Definition

Putting a Data Quality Management programme in place typically requires a comprehensive set of changes to people, process, organisational structure and technology. The goal of data quality management is to prevent data quality issues from occurring in the first place by building an organisation that is focused on the concepts of Information Development. Delivering an overall programme for data quality management involves 3 steps:

Step 1 (Mike2 Step1-DQI.jpg):
  • Quickly assess the current state
  • Subject areas are selected and data stewards are agreed.
  • Organise the DQM project and define the standards, policies and an overall architectural approach
Step 2 (Mike2 Step2-DQI.jpg):
  • Implement a Data Quality solution that will assess the data against the identified rules that were captured in Step 1.
  • Business processes and rules are well documented, KDEs are prioritized, DQ metadata repository is populated ready for profiling
  • Define the overall architecture in relation to data mastering and data synchronisation
  • Organise the DQM project and define the standards, definitions and business rules for the Key Data Elements (KDEs).
  • Define measurement and tracking capabilities of the solution
  • Aim to enforce the use of standards and support ongoing data improvement and management processes
  • Define the data items that are candidates for archiving that no longer serve a useful purpose.
Step 3 (Mike2 Step3-DQI.jpg):
  • Build on the improved quality of data provided through Step 2.
  • Establish monitoring activities that review data for completeness and accuracy.
  • As part of the data remediation process, integrate and optimise the processes and techniques defined in Step 2 into the overall data management framework.
  • Progressively automate processes that do not require user interaction.
  • Enhance communication of Data Quality results through timely reporting of Data Quality metrics against defined benchmarks.

Relationship to Solution Capabilities

This Solution Offering maps into the Solution Capabilities of MIKE2.0 as described below.

Relationship to Enterprise Views

The MIKE2.0 Solution for Data Quality Improvement covers all areas of Information Development, across people, process, organisation, technology and strategy. Data Quality improvement typically involves the implementation of systematic processes and methods, staff skills development, organisational changes and new technologies. In order to improve data quality across an organisation, enforcing Information Development concepts is crucial.

Mapping to the Information Governance Framework

The Information Governance Solution Offering is required across all Solution Offerings. For Data Quality Improvement, this is particularly important in that governance standards and policies drive requirements for data retention and protection. Changes to these policies can lead to major re-work; policy or process issues can lead to major risks for the business.

MIKE2.0 provides a comprehensive approach for Information Governance that is defined in the Information Governance Solution Offering and refers to this overall approach as "Information Development". We believe that organizations have traditionally not given enough focus to this area and hence face many of the problems that they do today. MIKE2.0 provides an approach to implement a Data Governance programme that is very comprehensive in its scope and is aligned to addressing a number of other business problems which at their core are data management problems.

Mike2 DQ org.jpg

Experience has shown that the more strategic the data quality initiative the greater the need for a well-defined governance framework. A Data Governance Council is critical for the successful implementation of a Data Quality Improvement programme; an example of how a governance structure should look and the responsibilities it covers can be seen in the figure above.

Measuring Data Quality

Measuring data quality is a key success factor for the long-term sustainability of your DQI initiative. Funding for people, process and technology investments must be driven by real Return on Investment. Typically, organisations do not measure along quantitative dimensions only, but also include softer, intangible dimensions to justify their investment and measure success. Defining data quality KPIs requires an interplay of organisational support, governance and accountability, processes, policies and standards, as well as overall support by a set of tools (either automated or through analysis by members of the data governance team). The following diagram outlines an approach to measuring data quality:

Measuring data quality v2.png

Also see Data Governance Metrics.
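As an illustration of the quantitative dimensions, data quality KPIs for a Key Data Element can be computed as simple ratios over the records in scope. The sketch below is a hypothetical example: the field names, sample records and validation rule are illustrative and not part of MIKE2.0 itself.

```python
# Sketch: measuring data quality along quantitative dimensions for a
# Key Data Element. Records, fields and rules are hypothetical examples.

def completeness(records, field):
    """Fraction of records with a non-empty value for the field."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

def validity(records, field, is_valid):
    """Fraction of records whose value passes a validation rule."""
    ok = sum(1 for r in records if is_valid(r.get(field)))
    return ok / len(records)

customers = [
    {"id": 1, "postcode": "2000"},
    {"id": 2, "postcode": ""},
    {"id": 3, "postcode": "ABCD"},
    {"id": 4, "postcode": "3141"},
]

kpis = {
    "postcode_completeness": completeness(customers, "postcode"),
    "postcode_validity": validity(
        customers, "postcode", lambda v: bool(v) and v.isdigit()),
}
print(kpis)  # {'postcode_completeness': 0.75, 'postcode_validity': 0.5}
```

Scores like these, tracked per KDE over time, provide the quantitative half of the measurement approach; the softer dimensions still require qualitative assessment by the governance team.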

Mapping to the SAFE Architecture Framework

The architecture framework for Enterprise Information Management is known as SAFE - Strategic Architecture for the Federated Enterprise. The Conceptual Architecture describes the functionality to be delivered by the technology components of the solution to deliver the full set of capabilities required for Enterprise Information Management. As can be seen below, SAFE applies to the Technology stream of the Complete Enterprise View.

Several Foundation Capabilities from the SAFE Conceptual Architecture are required for Data Investigation and Re-Engineering. The development of metadata during the Data Investigation process oftentimes flows through to the standardization, correction, matching and enrichment of data. It is used to determine what data should be extracted from the source environment and loaded into the target staging environment.

As with Data Investigation, the Data Re-Engineering process produces metadata assets that should be stored in a metadata repository that can be shared by users and technologies in the data integration environment. Data Re-Engineering processes may be used with the ETL layer in particular, and becomes part of the overall Services-Oriented Architecture.

We use the output of Data Re-Engineering as valuable input for creating a business plan for fixing ongoing data quality issues. This feeds into our overall approach to creating the Information Development Environment.

Foundation Capabilities for Information Development are the basic capabilities required to model, investigate and resolve data issues. They cover the modeling of data and metadata as well as the capabilities required for resolving data quality issues. Many of the Information Development capabilities are dependent upon one another and are performed in a process-driven approach for data re-engineering. They should typically be performed in the early stages of project implementations and be used throughout the project lifecycle (including post-deployment).

  • Data Investigation typically forms the first step in building capabilities for Information Development. In the MIKE2.0 methodology, it involves both an interviewing-based assessment and a quantitative assessment of an organisation's data. It also typically involves an ongoing monitoring process that is put in place once the solution has been implemented. Interviews are complemented by a quantitative assessment called Data Profiling.
  • Data Profiling typically uses a specialised vendor tool that enables the initial establishment of standards and initial formulation of metadata. It works by parsing and analysing free-form and single domain fields and tables, to determine issues with data.
  • Data Standardisation refers to the conditioning of input data to ensure that the data has the same type of content and format. Standardised data is important for effectively matching data, and facilitating a consistent format for output data.
  • Data Matching is a key capability for Information Development. The ability to provide probabilistic matching to any relevant attribute – evaluating user-defined full fields, parts of fields, or individual characters is critical.
  • Data Modelling is a set of techniques used to move from a very high level expression of information requirements to the detailed physical implementation of data structures.
  • Metadata is "data about data", providing further descriptors to turn data into information. At the most basic level, this means siloed metadata repositories that either provide additional information (in the case of data dictionaries) or provide an abstraction layer in a user-driven reporting environment.
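The Data Standardisation and Data Matching capabilities above can be sketched in miniature: condition the input into a common format, then score candidate pairs for similarity. In practice a specialised vendor tool with true probabilistic matching would be used; the abbreviation map and match threshold below are illustrative assumptions, using only Python's standard library.

```python
# Sketch: standardise address fields, then match records on string
# similarity. Abbreviation map and threshold are illustrative.
from difflib import SequenceMatcher

ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}

def standardise(text):
    """Lower-case, strip punctuation and expand common abbreviations."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def similarity(a, b):
    """Similarity score in [0, 1] between two standardised strings."""
    return SequenceMatcher(None, standardise(a), standardise(b)).ratio()

a = "12 George St., Sydney"
b = "12 george street sydney"
score = similarity(a, b)
print(f"match score: {score:.2f}")  # a high score suggests the same address
is_match = score > 0.9              # illustrative threshold
```

Standardising before matching is what makes the comparison meaningful: without it, trivial formatting differences would drag the similarity score down.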

Mapping to the Overall Implementation Guide

The MIKE2.0 approach to Data Quality Improvement goes across all 5 phases of the methodology. The most critical activities for Data Quality Improvement are shown below. These include:

  • The Interviewing Based Assessment takes place during Phase 1, the Business Assessment and Strategy Blueprint.
  • Data Profiling is a Foundation Activity that takes place during Phase 3, the Roadmap.
  • In the event that Data Profiling is put into operation in an ongoing Data Monitoring capability, it will take place in Phase 5 of the MIKE2 Methodology, after the system has been deployed.
  • Data Re-Engineering is a Foundation Activity that takes place during Phase 3 of the MIKE2 Methodology, the Roadmap. The Data Re-Engineering process may be put into operation and used during the ETL integration process, which takes place during Phase 4 of MIKE2.
  • Data Governance activities take place across all 5 phases of the methodology.

Other Activities are also relevant, but these are particularly focused on Data Governance, Data Investigation and Data Re-engineering efforts.

Business Assessment and Strategy Definition Blueprint (Phase 1)

Within Phase 1 of MIKE2.0, time is spent defining the overall business strategy and a strategic Conceptual Architecture that sets out a relatively high-level vision of the envisaged future state. The interviewing-based assessment that uses the Information Maturity QuickScan is a key part of building out this vision state and the gap from the current-state environment. The requirements for a profiling tool will begin to be formulated in this phase, based on the need to make quantitative decisions about the quality of information.

Organisational QuickScan

The Organisational QuickScan for Information Development is about quickly understanding the organisation's current environment for Data Governance and beginning to establish the vision for where it would like to go throughout the programme. This means that some of the key tasks within this Activity involve capturing the current-state set of practices around Data Governance, which are often poorly documented. As MIKE2.0 uses a broad definition of Data Governance, this assessment process involves People, Process, Organisation and Technology. QuickScan assessments are a core part of this activity as they not only provide a rich starter set of questions but also provide maturity guidelines for organisations. The gap between the current-state assessment and the envisioned future-state gives an early indicator of the scope of the overall Data Governance programme.

Data Governance Sponsorship and Scope

In order to conduct a successful Data Governance programme, it is important to have sponsorship at senior levels. Data Governance Sponsorship and Scope is focused on defining what this initial scope will be for improved Data Governance, based on the high-level information requirements and the results of the organisational assessment. This leadership team will play an ongoing role on the project.

Initial Data Governance Organisation

The Initial Data Governance Organisation is focused on establishing the larger Data Governance Organisation. Roles and Responsibilities are established and the overall Organisational structure is formalised. Communications models for Data Governance are also established, which become a critical aspect of issue resolution and prevention further down in the implementation process. The Data Governance Organisation that is established at this point will become more sophisticated over time. The continuous implementation phases of MIKE2.0 (phases 3,4,5) revisit organisational structure for each increment and there are specific improvement activities around moving to an Information Development Organisational model in Phase 5.

Technology Assessment and Selection Blueprint (Phase 2)

During Phase 2 of the Data Quality Improvement process, the technology requirements are established at the level of detail needed to determine whether a vendor product will be used to fulfill the data investigation process. As part of vendor selection, a detailed definition of functional and non-functional requirements is undertaken in order to select vendor products. The overall SDLC strategy (standards, testing and development environments) that will support data profiling is also put in place during this phase. This is explained in the Overall Implementation Guide.

Data Policies

Data Governance Policies are derived from the Policies and Guidelines developed in Phase 1. These high-level policies impact the definition of Data Standards, in particular data security, normalisation and auditing practices.

Data Standards

Data Standards are an important part of Data Governance as standards take complexity out of the implementation process through common language, term definitions and usage guidelines. The standards should be established before the implementation teams begin any detailed work. This ensures that the team is using a common set of techniques and conventions and working within the overall policy framework for Data Governance. As part of an overall Data Governance programme, standards are typically developed for:

  • Data Specification
  • Data Modelling
  • Data Capture
  • Data Security
  • Data Reporting

Data Standards should be straightforward and follow a common set of best practices. Oftentimes, Data Standards will already exist that can be leveraged.

Roadmap and Foundation Activities (Phase 3)

The Foundation Activities of MIKE2.0 are arguably the most important aspects of the overall methodology for improving Data Quality. The focus in implementing the Foundation Activities is around those Key Data Elements that are deemed the most crucial to the business.

Information Management Roadmap Overview

The Information Management Roadmap Overview covers the preparation and detailed planning that takes place during this scoping phase: the documents prepared as inputs to the design process are examined to ensure that they contain sufficient and accurate information. Data Investigation is typically part of a larger overall project, but in some cases may be a standalone engagement. Also within this phase, the project plan for the specific increment is created, containing a detailed list of tasks to be accomplished during Data Profiling, estimates for those tasks and dependencies among them. The steps for planning are described in the Overall Implementation Guide.

Software Development Readiness

Software Development Readiness involves establishing the technology environment for Data Quality Improvement. This environment is established across Phases 2 and 3 of MIKE2.0 (specific revisions for the increment are made during Phase 3). The Technical Architecture will have specified hardware and software for Development, Testing and Production, and these must be examined, corrected and/or enhanced as required. Development standards, software migration procedures and security measures for the SDLC are also defined.

Business Scope for Improved Data Governance

Definition of the Key Data Elements as part of the Business Scope for Improved Data Governance is a key part of the MIKE2.0 approach to Data Governance. KDEs help focus the work to be done to the most critical data that impacts business users. Data valuation then assigns value to KDEs that are used to prioritize the scope of the Data Governance program. The Data Governance approach focuses primarily on these KDEs for each increment.
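As a hypothetical sketch of the data valuation step, each KDE can be given a weighted score combining business value and known issue severity, with the top-ranked elements defining the increment's scope. The elements, scores and weights below are invented for illustration and are not MIKE2.0 prescriptions.

```python
# Sketch: prioritising Key Data Elements (KDEs) by business value and
# known issue severity. Elements, scores and weights are hypothetical.
kdes = [
    {"name": "customer_address", "business_value": 9, "issue_severity": 8},
    {"name": "product_code",     "business_value": 7, "issue_severity": 3},
    {"name": "order_total",      "business_value": 8, "issue_severity": 6},
]

def priority(kde, value_weight=0.6, severity_weight=0.4):
    """Weighted score used to rank KDEs for the next increment."""
    return (value_weight * kde["business_value"]
            + severity_weight * kde["issue_severity"])

ranked = sorted(kdes, key=priority, reverse=True)
in_scope = [k["name"] for k in ranked[:2]]  # top KDEs for this increment
print(in_scope)  # ['customer_address', 'order_total']
```

The point of the exercise is not the arithmetic but the discipline: valuation forces an explicit, agreed ranking so the Data Governance approach can focus each increment on the data that matters most.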

Enterprise Information Architecture

Most organisations do not have a well-defined Enterprise Information Architecture. MIKE2.0 takes the approach of building out the Enterprise Information Architecture over time for each new increment that is implemented as part of the overall programme. The scope for building the Enterprise Information Architecture is defined by the in-scope Key Data Elements (KDEs). The Enterprise Information Architecture includes the model to support these KDEs, which systems they reside in, the mastering rules for this data and how often it is to be mastered.

Root Cause Analysis of Data Governance Issues

Preventing Data Governance issues involves putting in place the process activities or application automation that stop issues from occurring in the first place. Root Cause Analysis of Data Governance Issues is concerned with correcting root causes as opposed to addressing their symptoms.

Data Governance Metrics

Data Governance Metrics are focused on defining the areas to be measured for the KDEs, assessing current performance levels and setting targets for improvement. Each KDE is measured against the defined metric category through the appropriate measurement technique.

Data Profiling

Data Profiling typically involves conducting column, table and multi-table profiling. This document presents the detail of this process; the Overall Implementation Guide presents the overall set of tasks and how they relate to an overall information management programme. Some aspects of profiling may be done in a fairly automated fashion with a tool, but data investigation will also typically involve manual testing of specific rules.
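A minimal sketch of column profiling is shown below, covering null counts, distinct values and format patterns; table and multi-table profiling extend the same idea to keys and cross-table rules. The sample column and the pattern convention (digits as 9, letters as A) are illustrative assumptions rather than features of any particular profiling tool.

```python
# Sketch: column-level data profiling -- null counts, distinct values
# and format patterns. Sample data and conventions are illustrative.
import re
from collections import Counter

def profile_column(values):
    """Summarise a single column for a data quality review."""
    patterns = Counter(
        re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", v)) if v else "<null>"
        for v in values
    )
    return {
        "count": len(values),
        "nulls": sum(1 for v in values if not v),
        "distinct": len(set(v for v in values if v)),
        "patterns": dict(patterns),
    }

phone_numbers = ["0412 345 678", "0412 345 678", "", "9876-5432", "N/A"]
report = profile_column(phone_numbers)
print(report)
```

A pattern report like this quickly surfaces the questions that then require manual investigation of specific business rules, such as why several formats coexist in one field and what "N/A" actually means.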

Data Re-Engineering

Data Re-Engineering helps improve Data Governance by dealing with historical Data Quality issues that are typically identified in Data Profiling. MIKE2.0 recommends that Data Re-Engineering follow a serial process of standardisation, correction, matching and enrichment, but that this process be conducted iteratively, following the "80/20 rule". This provides a model for improving Data Governance in the most cost-effective and expedient fashion.
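The serial standardisation, correction, matching and enrichment process can be sketched as a small pipeline through which each record passes in turn. All of the rules, reference tables and field names below are hypothetical examples, not MIKE2.0 prescriptions.

```python
# Sketch: serial data re-engineering pipeline
# (standardise -> correct -> match -> enrich). Rules are hypothetical.

def standardise(rec):
    # Condition the field into a common format.
    rec["state"] = rec.get("state", "").strip().upper()
    return rec

def correct(rec):
    # Fix known bad values identified during profiling.
    fixes = {"N.S.W": "NSW", "VICT": "VIC"}
    rec["state"] = fixes.get(rec["state"], rec["state"])
    return rec

def match(rec, master):
    # Attach the master record key where customer ids match.
    rec["master_key"] = master.get(rec["customer_id"])
    return rec

def enrich(rec):
    # Derive a region attribute from the corrected state code.
    regions = {"NSW": "East", "VIC": "South"}
    rec["region"] = regions.get(rec["state"], "Unknown")
    return rec

master_index = {"C1": "M-001"}
record = {"customer_id": "C1", "state": " n.s.w "}
for step in (standardise, correct, lambda r: match(r, master_index), enrich):
    record = step(record)
print(record)
```

Keeping the stages serial matters: matching and enrichment only work reliably on values that have already been standardised and corrected, which is why MIKE2.0 orders the process this way.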

Develop, Test, Deploy and Improve Activities (Phase 5)

The latter Activities of Phase 5 are focused on Continuous Improvement of the overall Data Governance processes, technology environment and operating model.

Continuous Improvement – Compliance Auditing

Continuous Improvement - Compliance Auditing is conducted by an external group as opposed to the internal Data Governance team. Audits do not involve the technical aspects of data analysis (i.e. data profiling), but instead involve inspection of results and review of the overall processes for Data Governance.

Continuous Improvement – Standards, Policies and Processes

Continuous Improvement - Standards, Policies and Processes revisits the overall set of standards, metrics, policies and processes for Data Governance. Recommended changes feed into the next increment of work as part of the continuous implementation approach of the MIKE2.0 Methodology.

Continuous Improvement – Data Quality

Continuous Improvement - Data Quality involves identification of root causes and ongoing Data Quality monitoring. This allows a far more proactive approach to Data Governance, whereby organisations can either address issues quickly or stop them from occurring altogether.
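Ongoing Data Quality monitoring can be sketched as re-running the agreed metrics on a schedule and flagging any that fall below their benchmarks, so issues are caught before they reach downstream users. The metric names and thresholds below are illustrative assumptions.

```python
# Sketch: ongoing data quality monitoring -- flag metrics that fall
# below agreed benchmarks. Metric names and thresholds are illustrative.

def monitor(metrics, benchmarks):
    """Return the metrics breaching their benchmark, for follow-up."""
    return {name: score for name, score in metrics.items()
            if score < benchmarks.get(name, 0.0)}

latest_run = {"address_completeness": 0.91, "email_validity": 0.99}
benchmarks = {"address_completeness": 0.95, "email_validity": 0.98}

breaches = monitor(latest_run, benchmarks)
for name, score in breaches.items():
    print(f"ALERT: {name} at {score:.0%}, below target "
          f"{benchmarks[name]:.0%}")
```

Routing such alerts to the data stewards named in the governance structure is what turns monitoring from a report into a proactive remediation trigger.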

Other Activities

The Activities listed above are typically associated with a standalone Data Investigation project, which generally completes as one of the Phase 3 Foundation Activities of MIKE2. Other aspects of MIKE2 may include:

  • Test Planning/Design of the environment or for specific investigation rules (especially those that will be put into operation for monitoring)
  • Putting data profiling processes for ongoing monitoring into operation
  • Development of product documentation and administration guides.

The process for developing the administration and training guides for the Data Investigation environment is described in the Overall Implementation Guide.

Mapping to Supporting Assets

Improving Data Quality should go across people, process, organisation and technology. In addition to following the relevant Activities from the Overall Implementation Guide, the following artifacts from MIKE2.0 can be used to assist in this effort:

Tools and Technique Papers

Data Investigation and Re-Engineering Design best practices focuses on some typical data problems that will be uncovered using a Data Profiling tool and those which can be resolved as part of the Data Re-Engineering process.

Techniques that should be applied across all data investigation and re-engineering problems:

Techniques that should be applied for a specific problem:

All techniques can be applied logically to the process of Data Investigation and Re-Engineering and are not specific to a tool.

Deliverable Templates

Project Examples

  • The Data Quality Management Business Case is performed by organizations planning to embark on a Data Quality Improvement and Re-Engineering initiative. This summarizes the costs and benefits of conducting a data quality program.

Other Resources

Relationships to other Solution Offerings

There is significant overlap between the Data Quality Improvement Solution Offering and two other Solution Offerings:

  • The SAFE Architecture provides a target architecture model at the conceptual level as well as best practice solution architecture options

For most solution offerings, fixing data quality is a major issue. Therefore, most Core Solution Offerings are dependent on this offering and incorporate many of the techniques as part of the delivery process.

Extending the Open Methodology through Solution Offerings

The scope of Core MIKE2.0 Methodology appropriately covers all the activities for Data Investigation and Re-Engineering but may be extended to better cover areas such as unstructured content. Possible extensions are in the Information Governance Offering.
