Data Quality Improvement Solution Offering
From MIKE2.0 Methodology
The Executive Overview on Data Quality Improvement provides an introductory presentation on this Solution Offering.
The most progressive businesses regard information as much an enterprise asset as financial and material assets. Having the right approach to information management enables optimal strategic and operational decisions, for which accurate, timely and complete information is essential. It also provides flexibility for the future: as application packages come and go and integration technologies change, the data largely stays the same. An organisation’s approach to information management is therefore critical to its success. In terms of defining a successful strategy in this area, consider some of the statistics:
Data Quality issues are Hidden and Persistent
Data quality issues can exist unnoticed for some time, although some users may suspect that the data in the systems they rely on to make their decisions is not accurate, complete, current, valid, or consistent. Data quality issues tend to leave business users, knowledge workers, and IT executives with a common set of questions in the back of their minds:
Data Quality is Fit for Purpose
It is difficult for users of downstream systems to improve the data quality of their system because the data they derive information from is entered via customer-facing operational systems. The operators of these systems do not have the same incentive to maintain high data quality; they are focused on entering data quickly and without rejection by the system at the point of entry.
Data quality issues become more visible when data is transferred from one system (the source system) and loaded into another (the target system), particularly when the data is integrated. Common source systems include operational, CRM, and ERP systems. The most common target system is a data warehouse used for MIS, DSS, or data mining reporting and analysis.
The difference in purpose between source system data and target system data is significant. Operational source systems only need to store transaction-level data relating to a single system. The purpose of data warehouse data is to provide a complete domain description of an organisation, or organisational unit, for an extended period. This difference in purpose demands different required levels and measures of data quality for source and target systems.
Source system data quality issues within a single system do not normally prevent the system from operating. For a data warehouse, though, data quality issues can reduce the ability to make decisions based on data warehouse data. The data quality of data warehouse systems is critical to their acceptance and use by knowledge workers; poor data quality is often cited as the most common cause of data warehouse project failure. Data Quality does not necessarily imply 100% error-free data; the requirement is that it meets the needs of the people who are using it. In addition to basic data field errors, other factors contribute to data quality as well:
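Dimensions such as completeness, validity and consistency can be checked programmatically. The following is a minimal sketch; the record layout, field names and rules are hypothetical examples for illustration, not part of the MIKE2.0 methodology.

```python
# A minimal sketch of per-record checks across several quality dimensions.
# The record layout, field names and rules are hypothetical examples.
import re
from datetime import date

def check_record(rec):
    """Return a list of quality issues found in one customer record."""
    issues = []
    # Completeness: required fields must be populated
    for field in ("customer_id", "email", "created"):
        if not rec.get(field):
            issues.append(f"incomplete: {field} is missing")
    # Validity: email must match a basic pattern
    email = rec.get("email", "")
    if email and not re.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", email):
        issues.append("invalid: email format")
    # Consistency: a record updated before it was created is inconsistent
    created, updated = rec.get("created"), rec.get("updated")
    if created and updated and updated < created:
        issues.append("inconsistent: updated precedes created")
    return issues

record = {"customer_id": "C001", "email": "not-an-email",
          "created": date(2020, 1, 1), "updated": date(2019, 12, 1)}
print(check_record(record))
```

In practice each rule would come from the business expectations of the data, as discussed below under Data Profiling.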
A Data Investigation programme is a key aspect to getting started
We believe that this approach is required to accurately cost and schedule any data transition or consolidation project, as it minimises the risks caused by data quality issues that result in fire-fights at the eleventh hour. This is the most expensive and worst time to discover problems with the data: more often than not it results in project delays, because it is late in the process and there is little time to do the analysis work required to fix the problem.
Data Quality issues are the major impediment to success in a variety of project implementations. Despite this, many organisations start off on the wrong foot in their information management programmes by designing solutions without understanding the issues that they face. Identifying issues through Data Profiling allows organisations to deal with data quality issues early in the programme, thereby addressing quality problems from the start. Data Profiling also provides the basis for making quantitative decisions around the solution implementation, and provides analysis work that is independent of any technology decisions.
Data quality is contextual to an individual company. Generic profiling should not be undertaken without understanding the business expectations of the data, and issues surfaced during profiling should not be addressed without this understanding either.
Governance is needed to Prevent Quality Issues
Despite the tremendous cost of these issues, most organisations struggle to address their Data Quality issues: they are stuck in a state where they are continually fixing issues but not preventing them. A Data Governance programme is used to stop Data Quality issues from occurring in the first place. It includes standards, policies, processes, delivery best practices, organisational efficiency and agility.
Solution Offering Purpose
This is a Core Solution Offering. Core Solution Offerings bring together all assets in MIKE2.0 relevant to solving a specific business and technology problem. Many of these assets may already exist and as the suite is built out over time, assets can be progressively added to an Offering.
A Core Solution Offering contains all the elements required to define and deliver a go-to-market offering. It can use a combination of open, shared and private assets.
Solution Offering Relationship Overview
MIKE2.0 Solution Offerings provide a detailed and holistic way of addressing specific problems. MIKE2.0 Solution Offerings can be mapped directly to the Phases and Activities of the MIKE2.0 Overall Implementation Guide, providing additional content to help understand the overall approach. The MIKE2.0 Overall Implementation Guide explains the relationships between the Phases, Activities and Tasks of the overall methodology as well as how the Supporting Assets tie to the overall methodology and MIKE2.0 Solutions. Users of the MIKE2.0 Methodology should always start with the Overall Implementation Guide and the MIKE2.0 Usage Model as a starting point for projects.
Solution Offering Definition
Putting a Data Quality Management programme in place typically requires a comprehensive set of changes to people, process, organisational structure and technology. The goal of data quality management is to prevent data quality issues from occurring in the first place by building an organisation that is focused on the concepts of Information Development. Delivering an overall programme for data quality management involves three steps:
Relationship to Solution Capabilities
This Solution Offering maps into the Solution Capabilities of MIKE2.0 as described below.
Relationship to Enterprise Views
The MIKE2.0 Solution for Data Quality Improvement covers all areas of Information Development, across people, process, organisation, technology and strategy. Data Quality improvement typically involves the implementation of systematic processes and methods, staff skills development, organisational changes and new technologies. In order to improve data quality across an organisation, enforcing Information Development concepts is crucial.
Mapping to the Information Governance Framework
The Information Governance Solution Offering is required across all Solution Offerings. For Data Quality Improvement, this is particularly important in that governance standards and policies drive requirements for data retention and protection. Changes to these policies can lead to major re-work; policy or process issues can lead to major risks for the business.
MIKE2.0 provides a comprehensive approach for Information Governance that is defined in the Information Governance Solution Offering and refers to this overall approach as "Information Development". We believe that organizations have traditionally not given enough focus to this area and hence face many of the problems that they do today. MIKE2.0 provides an approach to implement a Data Governance programme that is very comprehensive in its scope and is aligned to addressing a number of other business problems which at their core are data management problems.
Experience has shown that the more strategic the data quality initiative the greater the need for a well-defined governance framework. A Data Governance Council is critical for the successful implementation of a Data Quality Improvement programme; an example of how a governance structure should look and the responsibilities it covers can be seen in the figure above.
Measuring Data Quality
Measuring data quality is a key success factor for the long-term sustainability of your DQI initiative. Funding for people, process and technology investments must be driven by real Return on Investment. Typically, organisations do not only measure along quantitative dimensions, but also include softer, intangible dimensions to justify their investment and measure success. Defining data quality KPIs requires an interplay of organisational support, governance and accountability, processes, policies and standards, as well as overall support by a set of tools (either automated or through analysis by members of the data governance team). The following diagram outlines an approach to measuring data quality:
Also see Data Governance Metrics.
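On the quantitative side, per-dimension measurements are often rolled up into a single weighted KPI. The sketch below illustrates this; the dimensions, weights and scores are invented for illustration and would in practice come from profiling results and the data governance team.

```python
# Sketch: roll per-dimension data quality measurements up into one
# weighted KPI. Dimensions, weights and scores are invented examples.

def dq_score(measurements, weights):
    """Weighted average of per-dimension scores, each between 0.0 and 1.0."""
    total = sum(weights.values())
    return sum(measurements[d] * w for d, w in weights.items()) / total

weights = {"completeness": 0.4, "validity": 0.3, "consistency": 0.3}
measurements = {"completeness": 0.95, "validity": 0.80, "consistency": 0.90}
print(round(dq_score(measurements, weights), 3))  # 0.89
```

Tracking such a score per increment gives the quantitative trend line that ROI arguments need, while the softer dimensions remain a matter of governance judgement.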
Mapping to the SAFE Architecture Framework
The architecture framework for Enterprise Information Management is known as SAFE - Strategic Architecture for the Federated Enterprise. The Conceptual Architecture describes the functionality to be delivered by the technology components of the solution to deliver the full set of capabilities required for Enterprise Information Management. As can be seen below, SAFE applies to the Technology stream of the Complete Enterprise View.
Several Foundation Capabilities from the SAFE Conceptual Architecture are required for Data Investigation and Re-Engineering. The development of metadata during the Data Investigation process oftentimes flows through to the standardization, correction, matching and enrichment of data. It is used to determine what data should be extracted from the source environment and loaded into the target staging environment.
As with Data Investigation, the Data Re-Engineering process produces metadata assets that should be stored in a metadata repository that can be shared by users and technologies in the data integration environment. Data Re-Engineering processes may be used with the ETL layer in particular, and becomes part of the overall Services-Oriented Architecture.
We use the output of Data Re-Engineering as valuable input for creating a business plan for fixing ongoing data quality issues. This feeds into our overall approach to creating the Information Development Environment.
Foundation Capabilities for Information Development are the basic capabilities required to model, investigate and resolve data issues. They cover the modeling of data and metadata as well as the capabilities required for resolving data quality issues. Many of the Information Development capabilities are dependent upon one another and are performed in a process-driven approach for data re-engineering. They should typically be performed in the early stages of project implementations and be used throughout the project lifecycle (including post-deployment).
Mapping to the Overall Implementation Guide
The MIKE2.0 approach to Data Quality Improvement spans all five phases of the methodology. The most critical activities for Data Quality Improvement are shown below. These include:
Other Activities are also relevant, but these are particularly focused on Data Governance, Data Investigation and Data Re-engineering efforts.
Business Assessment and Strategy Definition Blueprint (Phase 1)
Within Phase 1 of MIKE2.0, time is spent defining the overall business strategy and a strategic Conceptual Architecture that sets out a relatively high-level vision for the envisaged future-state. The interview-based assessment that uses the Information Maturity QuickScan is a key part of this process for building out the vision state and the gap from the current-state environment. The requirements for a profiling tool will begin to be formulated in this phase, based on the need to make quantitative decisions around the quality of information.
The Organisational QuickScan for Information Development is about quickly understanding the organisation’s current environment for Data Governance and beginning to establish the vision for where it would like to go throughout the programme. This means that some of the key tasks within this Activity involve capturing the current-state set of practices around Data Governance, which are often poorly documented. As MIKE2.0 uses a broad definition of Data Governance, this assessment process involves People, Process, Organisation and Technology. QuickScan assessments are a core part of this activity, as they not only provide a rich starter set of questions but also provide maturity guidelines for organisations. The gap between the current-state assessment and the envisioned future-state gives an early indicator of the scope of the overall Data Governance programme.
Data Governance Sponsorship and Scope
In order to conduct a successful Data Governance programme, it is important to have sponsorship at senior levels. Data Governance Sponsorship and Scope is focused on defining the initial scope for improved Data Governance, based on the high-level information requirements and the results of the organisational assessment. This leadership team will play an ongoing role on the project.
Initial Data Governance Organisation
The Initial Data Governance Organisation is focused on establishing the larger Data Governance Organisation. Roles and Responsibilities are established and the overall Organisational structure is formalised. Communications models for Data Governance are also established, which become a critical aspect of issue resolution and prevention further down in the implementation process. The Data Governance Organisation that is established at this point will become more sophisticated over time. The continuous implementation phases of MIKE2.0 (Phases 3, 4 and 5) revisit organisational structure for each increment and there are specific improvement activities around moving to an Information Development Organisational model in Phase 5.
Technology Assessment and Selection Blueprint (Phase 2)
During Phase 2 of the Data Quality Improvement process, the technology requirements are established to the level needed to determine whether a vendor product will be used to fulfil the data investigation process. As part of vendor selection, a detailed definition of functional and non-functional requirements takes place in order to select vendor products. The overall SDLC strategy (standards, testing and development environments) that will support data profiling is also put in place during this phase. This is explained in the Overall Implementation Guide.
Data Governance Policies are derived from the Policies and Guidelines developed in Phase 1. These high-level policies impact the definition of Data Standards, in particular data security, normalisation and auditing practices.
Data Standards are an important part of Data Governance, as standards take complexity out of the implementation process through common language, term definitions and usage guidelines. The standards should be established before the implementation teams begin any detailed work. This ensures that the team is using a common set of techniques and conventions and working within the overall policy framework for Data Governance. As part of an overall Data Governance programme, standards are typically developed for:
Roadmap and Foundation Activities (Phase 3)
The Foundation Activities of MIKE2.0 are arguably the most important aspects of the overall methodology for improving Data Quality. The focus in implementing the Foundation Activities is around those Key Data Elements that are deemed the most crucial to the business.
Information Management Roadmap Overview
The Information Management Roadmap Overview covers the preparation and detailed planning that takes place during this scoping phase: the documents prepared as inputs to the design process are examined to ensure that they contain sufficient and accurate information. Data Investigation is typically part of a larger overall project, but in some cases may be a standalone engagement. Also within this phase, the project plan for the specific increment is created; it contains a detailed list of tasks to be accomplished during Data Profiling, estimates for those tasks and dependencies among them. The steps for planning are described in the Overall Implementation Guide.
Software Development Readiness
Software Development Readiness involves establishing the technology environment for Data Quality Improvement. This environment is established across Phases 2 and 3 of MIKE2.0 (specific revisions for the increment are made during Phase 3). The Technical Architecture will have specified hardware and software for Development, Testing and Production, and these must be examined, corrected and/or enhanced as required. Development standards, software migration procedures and security measures for the SDLC are also defined.
Business Scope for Improved Data Governance
Definition of the Key Data Elements as part of the Business Scope for Improved Data Governance is a key part of the MIKE2.0 approach to Data Governance. KDEs help focus the work to be done on the most critical data that impacts business users. Data valuation then assigns a value to each KDE, which is used to prioritise the scope of the Data Governance programme. The Data Governance approach focuses primarily on these KDEs for each increment.
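As one way of illustrating data valuation, a simple score can be computed per KDE and used to order the scope of an increment. The KDE names, impact ratings and issue rates below are hypothetical; real valuations would come from the business assessment.

```python
# Sketch: prioritise Key Data Elements (KDEs) by a simple value score.
# The KDE names, impact ratings and issue rates are hypothetical.

def prioritise(kdes):
    """Order KDEs by business impact times observed issue rate, highest first."""
    return sorted(kdes, key=lambda k: k["impact"] * k["issue_rate"], reverse=True)

kdes = [
    {"name": "customer_email", "impact": 5, "issue_rate": 0.20},
    {"name": "postcode",       "impact": 3, "issue_rate": 0.35},
    {"name": "date_of_birth",  "impact": 4, "issue_rate": 0.10},
]
for k in prioritise(kdes):
    print(k["name"])
```

However the score is defined, the point is the same: each increment works on the KDEs whose quality issues cost the business the most.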
Enterprise Information Architecture
Most organisations do not have a well-defined Enterprise Information Architecture. MIKE2.0 takes the approach of building out the Enterprise Information Architecture over time for each new increment that is implemented as part of the overall programme. The scope for building the Enterprise Information Architecture is defined by the in-scope Key Data Elements (KDEs). The Enterprise Information Architecture includes the model to support these KDEs, the systems they reside in, the mastering rules for this data and how often it is to be mastered.
Root Cause Analysis of Data Governance Issues
Preventing Data Governance issues involves analysing process activities and application automation so that issues are stopped from occurring in the first place. Root Cause Analysis of Data Governance Issues is concerned with correcting root causes as opposed to addressing the symptoms.
Data Governance Metrics
Data Governance Metrics are focused on defining the areas to be measured for the KDEs, assessing current performance levels and setting targets for improvement. Each KDE is measured against the defined metric category through the appropriate measurement technique.
Data Profiling typically involves conducting column, table and multi-table profiling. This document presents the detail of this process; the Overall Implementation Guide presents the overall set of tasks and how they relate to an overall information management programme. Some aspects of profiling may be done in a fairly automated fashion with a tool, but data investigation will also typically involve manual testing of specific rules.
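Column profiling of the kind described can be sketched as follows; the sample column and the pattern notation (letters generalised to A, digits to 9) are illustrative assumptions rather than any specific tool's behaviour.

```python
# Sketch of column-level profiling: null rate, cardinality and the most
# common value patterns. The sample column below is hypothetical.
import re
from collections import Counter

def profile_column(values):
    """Profile one column: null rate, distinct count, top value patterns."""
    non_null = [v for v in values if v is not None]
    # Generalise each value to a pattern: letters become A, digits become 9
    patterns = Counter(re.sub(r"\d", "9", re.sub(r"[A-Za-z]", "A", v))
                       for v in non_null)
    return {
        "null_rate": round(1 - len(non_null) / len(values), 3),
        "distinct": len(set(non_null)),
        "top_patterns": patterns.most_common(2),
    }

phones = ["555-0100", "555-0199", None, "5550042", "555-0100"]
print(profile_column(phones))
```

A minority pattern such as `9999999` among `999-9999` values is the kind of automated finding that still needs manual investigation against the business rules.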
Data Re-Engineering helps improve Data Governance by dealing with historical Data Quality issues that are typically identified in Data Profiling. MIKE2.0 recommends that Data Re-Engineering follow a serial process of standardisation, correction, matching and enrichment, but that this process be conducted iteratively, following the "80/20 rule". This provides a model for improving Data Governance in the most cost-effective and expedient fashion.
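The serial standardise, correct and match steps can be sketched as below (enrichment is omitted for brevity). The cleansing rules, match key and sample records are hypothetical illustrations, not MIKE2.0 deliverables.

```python
# Sketch of the serial re-engineering steps: standardise, correct and
# match (enrichment is omitted). All rules and records are hypothetical.

# Hypothetical correction table for address abbreviations
FIXES = {"ST.": "STREET", "RD.": "ROAD"}

def standardise(rec):
    """Trim whitespace and upper-case every string field."""
    return {k: v.strip().upper() if isinstance(v, str) else v
            for k, v in rec.items()}

def correct(rec):
    """Expand known abbreviations in the address field."""
    words = [FIXES.get(w, w) for w in rec["address"].split()]
    return {**rec, "address": " ".join(words)}

def match_key(rec):
    """A naive match key: cleansed name plus cleansed address."""
    return (rec["name"], rec["address"])

records = [
    {"name": " Jane Doe ", "address": "12 High st."},
    {"name": "JANE DOE",   "address": "12 HIGH STREET"},
]
cleaned = [correct(standardise(r)) for r in records]
groups = {match_key(r) for r in cleaned}
print(len(groups))  # the two variants collapse to a single match group
```

The ordering matters: matching only works reliably on records that have already been standardised and corrected, which is why the process is serial even when it is iterated.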
Develop, Test, Deploy and Improve Activities (Phase 5)
The latter Activities of Phase 5 are focused on Continuous Improvement of the overall Data Governance processes, technology environment and operating model.
Continuous Improvement – Compliance Auditing
Continuous Improvement - Compliance Auditing is conducted by an external group as opposed to the internal Data Governance team. Audits don't involve the technical aspects of data analysis (i.e. data profiling), but instead involve inspection of results and examination of the overall processes for Data Governance.
Continuous Improvement – Standards, Policies and Processes
Continuous Improvement - Standards, Policies and Processes revisits the overall set of standards, metrics, policies and processes for Data Governance. Recommended changes feed into the next increment of work as part of the continuous implementation approach of the MIKE2.0 Methodology.
Continuous Improvement – Data Quality
Continuous Improvement - Data Quality involves identification of root causes and ongoing Data Quality monitoring. This allows a far more proactive approach to Data Governance, whereby the organisation can either address issues quickly or stop them from occurring altogether.
The Activities listed above are typically associated with a standalone Data Investigation project, which generally completes as one of the Phase 3 Foundation Activities of MIKE2.0. Other aspects of MIKE2.0 may include:
The process for developing the administration and training guides for the Data Investigation environment is described in the Overall Implementation Guide.
Mapping to Supporting Assets
Improving Data Quality should go across people, process, organisation and technology. In addition to following the relevant Activities from the Overall Implementation Guide, the following artifacts from MIKE2.0 can be used to assist in this effort:
Tools and Technique Papers
Data Investigation and Re-Engineering Design best practices focuses on some typical data problems that will be uncovered using a Data Profiling tool and on those which can be resolved as part of the Data Re-Engineering process.
Techniques that should be applied across all data investigation and re-engineering problems:
All techniques can be applied logically to the process of Data Investigation and Re-Engineering and are not specific to a tool.
Relationships to other Solution Offerings
There is significant overlap between the Data Quality Improvement Solution Offering and two other Solution Offerings:
For most solution offerings, fixing data quality is a major issue. Therefore, most Core Solution Offerings are dependent on this offering and incorporate many of the techniques as part of the delivery process.
Extending the Open Methodology through Solution Offerings
The scope of Core MIKE2.0 Methodology appropriately covers all the activities for Data Investigation and Re-Engineering but may be extended to better cover areas such as unstructured content. Possible extensions are in the Information Governance Offering.