
Data Investigation and Re-Engineering Solution Offering

This Solution Offering is provided through the MIKE2.0 Methodology. It receives full coverage within the Overall Implementation Guide and SAFE Architecture and contains a number of Supporting Assets. It has been defined in line with the appropriate Solution Offering Creation Guide and has been peer reviewed. It may have some minor issues with Activities or lack depth. In summary, the Solution Offering can be used, but it may still have some issues.


Introduction

Executive Summary

The investigation of data issues should be considered a prerequisite to any significant data integration effort, as it removes the uncertainty and assumptions related to the information environment. Rather than relying on assumptions, Data Profiling provides a discovery framework for identifying and analysing data quality issues and making fact-based decisions.

A Data Investigation approach is required to accurately cost and schedule any data transition or consolidation project, as it minimises the risks often caused by data quality issues that result in fire-fights at the eleventh hour. This is the most expensive and worst time to discover problems with the data, as more often than not it results in project delays. Problems this late in the process allow little time to do the analysis work that is required to fix them. By starting with Data Profiling, it is possible to deal with data quality issues early in the programme. Data Profiling also provides the basis for making quantitative decisions around the solution implementation, and provides analysis work that is independent of any technology decisions.

Once Data Quality issues are known, the approach to fixing them must be systematic. A Data Re-Engineering process should be applied that addresses the most important issues first and then moves on to higher-risk areas later.

In summary, Data Quality issues are the major impediment to success in a variety of project implementations. Despite this, many organisations start off on the wrong foot in their information management programmes by designing solutions without understanding the issues that they face.

Solution Offering Purpose

This is a Foundational Solution. Foundational Solutions are "background" solutions that support the Core Solution Offerings of the MIKE2.0 Methodology.

Foundational Solutions are the lowest-level assets within MIKE2.0 that are comprehensive in nature. They may tie together multiple Supporting Assets and are referenced from the Overall Implementation Guide and other Solution Offerings.


The Data Investigation and Re-Engineering Solution Offering also forms a key part of the Core Solution Offering for Data Quality Improvement. Core Solution Offerings bring together all assets in MIKE2.0 relevant to solving a specific business and technology problem. Many of these assets may already exist, and as the suite is built out over time, assets can be progressively added to an Offering.

Solution Offering Relationship Overview

MIKE2.0 Solution Offering for Data Investigation and Re-Engineering within EDM Solution Group

This section describes the MIKE2.0 Data Investigation and Re-Engineering Solution, which is part of the overall methodology. Solutions provide a detailed and holistic way of addressing specific problems. Solutions can be mapped directly to the Phases and Activities of the Overall Implementation Guide, providing additional content to help understand the overall approach.

The MIKE2.0 Overall Implementation Guide explains the relationships between the Phases, Activities and Tasks of the overall methodology, as well as how the Supporting Assets tie to the overall Methodology and MIKE2.0 Solutions. Users of the methodology should take the Overall Implementation Guide and the MIKE2.0 Usage Model as the starting point for projects.

Solution Offering Definition

Putting a Data Quality Management programme in place typically requires a comprehensive set of changes across people, process, organisational structure and technology. The goal of data quality management is to prevent data quality issues from occurring in the first place by building an organisation that is focused on the concepts of Information Development.

Data Investigation and Re-Engineering is used to:

[Figure: MIKE2.0 Step 2, Data Quality Improvement]
  • Implement a Data Quality solution that will assess the data against the identified rules captured in Step 1.
  • Define measurement and tracking capabilities for the DQM methodologies.
  • Establish monitoring activities that review data for completeness and accuracy.
  • Aim to enforce the use of standards and support ongoing data improvement and management processes.
  • Identify data items that no longer serve a useful purpose and are candidates for archiving.

The Data Quality Improvement Solution Offering includes these activities as well as those focused on Data Governance to provide a comprehensive solution approach.
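
To make the first two items above concrete, the following is a minimal sketch of assessing records against captured rules and reporting pass rates. The rules, field names and sample records are hypothetical illustrations, not part of MIKE2.0.

```python
import re

# Each captured rule pairs a name with a predicate over a single record.
# These rules and field names are hypothetical examples.
RULES = {
    "customer_id is populated": lambda r: bool(r.get("customer_id")),
    "postcode matches UK format": lambda r: bool(
        re.fullmatch(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}", r.get("postcode", ""))
    ),
    "birth_year is plausible": lambda r: 1900 <= r.get("birth_year", 0) <= 2024,
}

def assess(records):
    """Return the pass rate of each rule across all records."""
    passes = {name: 0 for name in RULES}
    for record in records:
        for name, predicate in RULES.items():
            if predicate(record):
                passes[name] += 1
    return {name: count / len(records) for name, count in passes.items()}

records = [
    {"customer_id": "C001", "postcode": "SW1A 1AA", "birth_year": 1975},
    {"customer_id": "", "postcode": "XYZ", "birth_year": 1830},
]
for rule, rate in assess(records).items():
    print(f"{rule}: {rate:.0%} pass")
```

A profiling tool would generate and run such checks at scale, but the same pass-rate output underpins the measurement and tracking capability described above.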

Data Investigation Provides the Foundation for Improvement

As part of the MIKE2.0 Methodology, Data Investigation is seen as critical to any implementation workstream that is centred on data. We have extended the traditional definition of “data projects” to include those around IT Strategy and IT Transformation, as many of these projects fail because they take a solely application-focused approach and are subsequently held back by data issues in the legacy environment.

Data Investigation therefore provides a cornerstone approach for initiating projects related to Data Cleansing, Data Migration, Data Warehousing, IT Transformation and IT Strategy. In all cases, much of the investigation process is the same – it is only the application of this investigation process against the business goals that is different.

Data Re-Engineering is a Progressive Improvement Process

Data Re-Engineering is a term used to describe a number of related functions: standardising data to a common format, correcting data quality issues, removing duplicate information, building linkages between records that did not previously exist, or enriching data with supplementary information.
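
As a minimal illustration of the standardisation function, the sketch below normalises a free-text address field to a common format; the abbreviation table and sample values are hypothetical.

```python
# Standardise a free-text address field to a common format by expanding
# common abbreviations and normalising case and punctuation.
# The abbreviation table is a hypothetical example.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}

def standardise_address(raw: str) -> str:
    tokens = raw.lower().replace(".", "").replace(",", " ").split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)

print(standardise_address("12 High St."))      # -> 12 high street
print(standardise_address("12, HIGH STREET"))  # -> 12 high street
```

Standardising records to a common format in this way is what makes the subsequent matching and de-duplication steps reliable.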

The key determinant of success for System Migration, Consolidation or Warehousing projects is often the solution and implementation approach for dealing with Data Quality issues; therefore, Data Re-Engineering (like Data Investigation) provides a cornerstone approach for a number of different types of projects. This section describes the key steps in the Data Re-Engineering process as well as how they relate to peripheral processes around ETL, Metadata Management and Data Investigation.

Data Investigation Relationship to Solution Capabilities

The core set of Activities and Tasks for Data Investigation is listed within the MIKE2 Overall Implementation Guide; this Solution focuses on providing additional detail in the following areas:

  • The MIKE2.0 interviewing-based Data Investigation process and an introduction to the tools used to support this process
  • Overall approach for conducting data profiling
  • Data Investigation and Re-Engineering Design Patterns
  • Differentiating features for product selection
  • Recommended team structures and cost models

Additional Supporting Assets for Data Investigation include a number of tools, technique papers, deliverable templates and project examples. The different types of supporting assets are described in the “Deliverables” section of the MIKE2.0 Overall Implementation Guide.

Relationship to Enterprise Views

Data Investigation is one of the Foundation Capabilities for Information Development. It provides a crucial first step in Data Integration projects by helping to build an understanding of data issues before addressing data quality problems or integrating data between systems. Data Investigation can also be used in an ongoing monitoring fashion to assess data issues over time and is a key enabler for Information Development.

This Solution for Data Investigation also makes some reference to the Technology View for Information Development by providing a brief overview on available vendor technologies in this space; the overview on vendor technologies can be used to either assist in the selection process for a new tool or to confirm that an existing toolset has the required capabilities for a project.

Mapping to the Information Governance Framework

This Solution Offering is an important enabler for Information Governance, although the more comprehensive use of Information Governance to improve data quality is covered within the Data Quality Improvement Solution Offering.

Mapping to the SAFE Architecture Framework

[Figure: SAFE Architecture focus areas for Data Investigation]

Data Investigation is one of the Foundation Capabilities for Information Development. It provides a crucial first step in Data Integration projects by helping to build an understanding of data issues before addressing data quality problems or integrating data between systems. Data Investigation can also be used in an ongoing monitoring fashion to assess data issues over time. This section describes the core capabilities of Data Investigation and their role within the Conceptual Architecture.

Mapping to the Overall Implementation Guide

Data Investigation takes place at multiple points during the MIKE2 Methodology:

  • The Interviewing Based Assessment takes place during Phase 1, the Business Assessment and Strategy Blueprint.
  • Data Profiling is a Foundation Activity that takes place during Phase 3, the Roadmap.
  • In the event that Data Profiling is operationalised into an ongoing Data Monitoring capability, it will take place in Phase 5 of the MIKE2 Methodology, after the system has been deployed.
Initial Data Investigation primarily takes place during Phases 1 and 3 of MIKE2

Business Assessment and Strategy Definition Blueprint (Phase 1)

Within Phase 1 of MIKE2.0, time will be spent defining the overall business strategy and a strategic Conceptual Architecture that sets out a relatively high-level vision for developing the envisaged future state. The interviewing-based assessment that uses Information Maturity QuickScan is a key part of this process for building out the vision state and the gap from the current-state environment. The requirements for a profiling tool will begin to be formulated in this phase, based on the need to make quantitative decisions around the quality of information.

Technology Assessment and Selection Blueprint (Phase 2)

During Phase 2 of MIKE2.0, the technology requirements are established to the level needed to determine whether a vendor product will be used to fulfil the data investigation process. As part of vendor selection, a detailed process occurs around the definition of functional and non-functional requirements in order to select vendor products. The overall SDLC strategy (standards, testing and development environments) that will support data profiling is also put in place during this phase. This is explained in the MIKE2 Overall Implementation Guide.

Roadmap and Foundation Activities (Phase 3)

Detailed Preparation and Planning (part of the Roadmap)

The preparation and detailed planning that takes place during the Roadmap includes examining the documents which have been prepared as inputs to the design process and ensuring that they contain sufficient and accurate information. Data Investigation is typically part of a larger overall project, but in some cases may be a standalone engagement. Also within this phase, the project plan for the specific increment is created, containing a detailed list of tasks to be accomplished during Data Profiling, estimates for those tasks and dependencies among the tasks. The steps for planning are described in the Overall Implementation Guide.

Establishing the Data Profiling Environment (part of Foundation Activities)

The data profiling environment is established across Phases 2 and 3 of MIKE2.0 (specific revisions for the increment are made during Phase 3). The Technical Architecture will have specified hardware and software for the Development, Testing and Production environments, and these must be examined, corrected and/or enhanced as required. Development standards, software migration procedures and security measures for the SDLC are also defined.

Data Profiling Execution (Part of Foundation Activities)

Data Profiling typically involves conducting column, table and multi-table profiling. This document presents the detail of this process; the Overall Implementation Guide presents the overall set of tasks and how they relate to an overall information management programme.

Some aspects of profiling may be done in a fairly automated fashion with a tool, but data investigation will also typically involve manual testing of specific rules.
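
As a minimal sketch of the automated portion, the example below computes the kind of per-column statistics a profiling tool reports; the sample data is illustrative.

```python
from collections import Counter

def profile_column(values):
    """Compute basic column-profiling statistics for a list of values."""
    non_null = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "null_rate": 1 - len(non_null) / len(values),
        "distinct": len(set(non_null)),
        "top_values": Counter(non_null).most_common(3),
        "inferred_types": sorted({type(v).__name__ for v in non_null}),
    }

# Hypothetical country-code column; the mixed-case "gb" is the kind of
# standardisation issue that profiling surfaces for manual follow-up.
column = ["GB", "GB", "DE", None, "gb", "", "FR"]
for statistic, value in profile_column(column).items():
    print(f"{statistic}: {value}")
```

Table and multi-table profiling extend the same idea to keys and cross-column dependencies, while manual rule testing covers the business-specific checks a tool cannot infer.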

Other Activities

The Activities listed above are typically associated with a standalone Data Investigation project, which generally completes as one of the Phase 3 Foundation Activities of MIKE2. Other aspects of MIKE2 may include:

  • Test Planning/Design of the environment or for specific investigation rules (especially those that will be operationalised for monitoring)
  • Operationalisation of data profiling processes for ongoing monitoring (see the sketch below)
  • Development of product documentation and administration guides.
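
For the operationalisation item above, the following is a minimal sketch of a recurring monitoring check that compares each scheduled profiling measurement against an agreed threshold; the metric names and thresholds are hypothetical.

```python
from datetime import date

# Agreed limits for each monitored metric (hypothetical examples).
THRESHOLDS = {
    "customer.email null rate": 0.05,
    "order.amount negative rate": 0.0,
}

def run_monitoring(measurements: dict) -> list:
    """Return an alert message for each metric that breaches its threshold."""
    alerts = []
    for metric, observed in measurements.items():
        limit = THRESHOLDS.get(metric)
        if limit is not None and observed > limit:
            alerts.append(f"{date.today()} ALERT {metric}: "
                          f"{observed:.2%} exceeds {limit:.2%}")
    return alerts

# In practice these figures would come from the scheduled profiling run.
print(run_monitoring({"customer.email null rate": 0.08,
                      "order.amount negative rate": 0.0}))
```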

The process for developing the administration and training guides for the Data Investigation environment is described in the MIKE2 Overall Implementation Guide.

Data Re-Engineering Relationship to Solution Capabilities

The MIKE2.0 approach for Data Re-Engineering assumes a standalone process for a project focused more on Data Cleansing or defining match relationships. In the event these processes are used for Data Integration, other processes will be required.

Relationship to Enterprise Views

Data Re-engineering is one of the Foundation Capabilities for Information Development. In terms of the governing model on Enterprise Views, this MIKE2 Solution for Data Re-Engineering relates mostly to the Process View for Infrastructure Development by providing a detailed approach for conducting a Data Integration project. This guide is used in conjunction with the overall steps for an information management programme as part of the MIKE2 Methodology.

This Solution for Data Re-Engineering also makes some reference to the Technology View for Information Development by providing a brief overview on available vendor technologies in this space; the overview on vendor technologies can be used to either assist in the selection process for a new tool or to confirm that an existing toolset has the required capabilities for a project.

Mapping to the Information Governance Framework

This Solution Offering is an important enabler for Information Governance, although the more comprehensive use of Information Governance to improve data quality is covered within the Data Quality Improvement Solution Offering.

Mapping to the SAFE Architecture Framework

Several Foundation Capabilities from the SAFE Architecture are required for Data Re-Engineering. The standardisation, correction, matching and enrichment of data is often a direct follow-on from the Data Profiling process, and is used to determine what data should be extracted from the source environment and loaded into the target staging environment.

As with Data Profiling, the Data Re-Engineering process produces metadata assets that should be stored in a metadata repository that can be shared by users and technologies in the data integration environment. Data Re-Engineering processes may be used with the ETL layer in particular, and become part of the overall Services-Oriented Architecture.
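
As a simple illustration, and assuming a plain file-based store rather than any particular vendor repository, the sketch below records the output of a re-engineering run as metadata that other tools in the integration environment could consume.

```python
import json
from datetime import datetime, timezone

# A hypothetical metadata record describing one re-engineering run; the
# field names are illustrative, not a MIKE2.0 or vendor standard.
metadata_record = {
    "process": "address_standardisation",
    "source_table": "crm.customer",
    "run_at": datetime.now(timezone.utc).isoformat(),
    "rules_applied": ["abbreviation expansion", "case folding"],
    "records_in": 10000,
    "records_changed": 1240,
}

# Append the record to a shared, newline-delimited JSON metadata store.
with open("reengineering_metadata.json", "a") as store:
    store.write(json.dumps(metadata_record) + "\n")
```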

We use the output of Data Re-Engineering as valuable input for creating a business plan for fixing ongoing data quality issues. This feeds into our overall approach to creating the Information Development Environment.

Mapping to the Overall Implementation Guide

Data Re-Engineering is a Foundation Activity that takes place during Phase 3 of the MIKE2 Methodology, the Roadmap. The Data Re-Engineering process may be operationalised and used during the ETL integration process, which takes place during Phase 4 of MIKE2.

The core set of Activities and Tasks for a Data Re-Engineering programme is listed within the Overall Implementation Guide; this guide focuses on providing additional detail in the following areas:

  • Overall approach for conducting data re-engineering
  • Data Investigation and Re-Engineering Design Patterns
  • Differentiating features for product selection
  • Recommended team structures and cost models

Additional Supporting Assets for Data Re-Engineering include a number of tools and technique papers, deliverable templates, software assets and sample work from past projects.

Initial Data Re-Engineering takes place primarily during Phase 3 of MIKE2

Business Assessment and Strategy Definition Blueprint (Phase 1)

Within Phase 1 of MIKE2.0, time will be spent defining the overall business strategy and an overall Conceptual Architecture that sets out a relatively high-level vision for developing the envisaged future state. The interviewing-based assessment that uses Information Maturity QuickScan is a key part of this process for building out the vision state and the gap from the current-state environment. The requirements for a profiling tool will begin to be formulated in this phase, based on the need to make quantitative decisions around the quality of information.

Technology Assessment and Selection Blueprint (Phase 2)

During Phase 2 of MIKE2.0, the technology requirements are established to the level needed to determine whether a vendor product will be used to fulfil the data investigation process. As part of vendor selection, a detailed process occurs around the definition of functional and non-functional requirements in order to select vendor products. The overall SDLC strategy (standards, testing and development environments) that will support data profiling is also put in place during this phase. This is explained in the MIKE2 Overall Implementation Guide.

Roadmap and Foundation Activities (Phase 3)

Detailed Preparation and Planning (part of the Roadmap)

The preparation and detailed planning that takes place during the Roadmap includes examining the documents which have been prepared as inputs to the design process and ensuring that they contain sufficient and accurate information. Data Investigation is typically part of a larger overall project, but in some cases may be a standalone engagement. Also within this phase, the project plan for the specific increment is created, containing a detailed list of tasks to be accomplished during Data Profiling, estimates for those tasks and dependencies among the tasks. The steps for planning are described in the Overall Implementation Guide.

Establishing the Data Profiling Environment (part of Foundation Activities)

The data profiling environment is established across Phases 2 and 3 of MIKE2.0 (specific revisions for the increment are made during Phase 3). The Technical Architecture will have specified hardware and software for the Development, Testing and Production environments, and these must be examined, corrected and/or enhanced as required. Development standards, software migration procedures and security measures for the SDLC are also defined.

Data Profiling Execution (Part of Foundation Activities)

Data Profiling typically involves conducting column, table and multi-table profiling. This document presents the detail of this process; the Overall Implementation Guide presents the overall set of tasks and how they relate to an overall information management programme.

Other Activities within MIKE2.0

These are typically the aspects of MIKE2.0 that are associated with a standalone Data Investigation project, which generally completes as one of the Phase 3 Foundation Activities of MIKE2.0. Other aspects of MIKE2.0 may include:

  • Test Planning/Design of the environment or for specific investigation rules (especially those that will be operationalised for monitoring)
  • Operationalisation of data profiling processes for ongoing monitoring
  • Development of product documentation and administration guides.

The process for developing the administration and training guides for the Data Re-Engineering environment is described in the Overall Implementation Guide.

Mapping to Supporting Assets

Logical Architecture, Design and Development Best Practices

Data Investigation and Re-Engineering Design Best Practices focus on some typical data problems that will be uncovered using a Data Profiling tool and those which can be resolved as part of the Data Re-Engineering process.

The following implementation techniques are product-independent and can be used in conjunction with the Overall Implementation Guide. There are two different types of implementation techniques as part of MIKE2.0:

  • Techniques that should be applied across all data investigation and re-engineering problems
  • Techniques that should be applied to a specific problem

All techniques can be applied logically to the process of Data Investigation and Re-Engineering and are not specific to a tool.

Product-Specific Implementation Techniques

Listed below are some examples of how to conduct data profiling and re-engineering with specific vendor tools. This complements the prior section on logical design patterns. It is an area to be built out over time, offering comparisons between vendor tools and focusing on the known issues and strengths of different products. As vendor products are always changing, this area will be updated frequently and therefore kept in a dynamic form.

Inventory of Data Investigation and Re-Engineering vendor implementation techniques

Selecting Products for Data Investigation

The strategic process of selecting products for Data Investigation and Data Re-Engineering will involve a number of steps and take a variety of factors as input. The overall process is outlined in Phase 2 of the Overall Implementation Guide, with reference to the SAFE Architecture. Additional questions to consider include:

  • Why would we choose an off-the-shelf tool for Data Investigation and Data Re-Engineering?
  • What are some of the key criteria we should use to evaluate Data Investigation and Data Re-Engineering tools?

The capabilities of leading vendor tools are always changing. Vendor products should be re-evaluated on key criteria frequently as part of a proper due diligence process for product selection.

Data Investigation and Re-Engineering: Buy vs. Build

Data Investigation tools are recommended with high priority since they assist in two essential steps of the data quality cycle: measuring data quality levels and quantifying their effect in an ongoing monitoring process. Some of the advantages of a tool include:

  • Automates data quality measurement over time.
  • Provides ad-hoc data measurement functions that can be introduced for any data entity without additional development work.
  • Provides a means for discovering data quality issues that cannot be found through standard SQL investigation.
  • Facilitates a benchmarking and metrics approach towards continual data quality measurement.
  • Generates data quality reports, which facilitate the understanding of DQ issues.

Data Re-Engineering tools can provide a great benefit over custom development, depending on the type of data. Their greatest benefits tend to be around Data Cleansing and Semantic Matching. These benefits include:

  • The ability to standardise free text fields such as product names, addresses and personnel names.
  • The ability to locate potential duplicate data in single or multiple data sets based on ‘fuzzy’ matching algorithms.
  • The ability to remove duplicates via data matching results and de-duplication functions.
  • The ability to build ‘best of breed’ type records based on data from multiple sources.
  • The ability to link disparate records across systems whilst still maintaining original source data.
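
As a simple illustration of ‘fuzzy’ matching, the sketch below flags potential duplicate customer names using only the Python standard library; commercial Data Re-Engineering tools apply far more sophisticated algorithms (phonetic keys, probabilistic weighting), and the names and threshold here are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a: str, b: str) -> float:
    """Character-level similarity between two strings, 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

customers = ["Jon Smith", "John Smith", "Jane Doe", "J. Smith"]

# Flag every pair above a similarity threshold as a potential duplicate
# for review; exact matching would miss all of these.
for a, b in combinations(customers, 2):
    score = similarity(a, b)
    if score > 0.8:
        print(f"potential duplicate: {a!r} ~ {b!r} ({score:.2f})")
```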

More generally, benefits include:

  • Significant savings in terms of resource cost and effort required to perform otherwise manual data analysis, assessment, investigation, validation and remediation activities as part of the DQ Project.
  • Drastic savings in time can be realised as the software development life-cycle is reduced via the use of tools which promote a graphical user interface (GUI) approach to software development.
  • Use of a single platform reduces complexity in the data management environment for both development and ongoing operations.
  • As the tools are provided by a vendor, external expertise can be obtained to supplement the skills of the project team and operations staff. Skills can also be found in the market with direct product experience. Understanding a vendor’s presence in a local market and the skills that exist should be critical evaluation criteria.
  • Convergent solutions that feed directly into a metadata repository are now becoming more widely available from vendors. This greatly simplifies one of the most complex areas of information management.

The above factors mean that off-the-shelf products for Data Investigation and Data Re-Engineering often provide significant benefits. The most significant benefits come when this functionality (and cost) can be shared across the Enterprise. This is a key message of the MIKE2 Solution for building an “Information Development Centre of Excellence”.

Data Investigation and Re-Engineering: Tool Selection

MIKE2.0 uses the Technology Selection QuickScan for vendor selection. The tool provides a number of key criteria from which users can match functional, non-functional and commercial requirements to compare vendor products.

Relationships to other Solution Offerings

A number of Core Solution Offerings make use of this foundational Solution Offering. Those for which it is considered most important are within the Enterprise Data Management Offering Group.

There is a high degree of overlap between this offering and the Data Quality Improvement Solution Offering; this solution could be considered a subset of the approach for organisations more focused on addressing historical data quality issues. In the future, these solution offerings may be merged.

Extending the Open Methodology through Solution Offerings

The Overall Task List covers all activities required for Data Investigation and Re-Engineering.
