Data Integration Solution

This Solution Offering is provided through the MIKE2.0 Methodology. It receives full coverage within the Overall Implementation Guide and the SAFE Architecture and contains a number of Supporting Assets. It has been defined in line with the appropriate Solution Offering Creation Guide and has been peer reviewed. It may have some minor issues with Activities or lack depth; in summary, the Solution Offering can be used but may still have some issues.


Introduction

Executive Summary

The majority of time spent in Data Warehouse, Data Migration and Data Convergence projects is spent in the design, development and testing of the integration solution. Therefore, having the right approach to Data Integration is critical to delivering a successful information management programme. Having the ‘right’ approach is a challenge, as this is an area that has undergone significant change in the past five years.

Major changes include:

  • The emergence of vendor suites specifically focused on data integration
  • A growing need for real-time integration and reusable integration capabilities through Services Oriented Architectures
  • Convergence around Application Integration and Data Integration
  • The need to support model-driven integration and metadata-driven integration requirements
  • A greater focus from the business regarding the ability of the integration solution to resolve data quality issues, manage metadata and handle higher data volumes, whilst improving upon speed-of-delivery

This MIKE2.0 Solution Guide helps provide a structured mechanism for conducting more traditional integration projects as well as those that involve the use of emerging technologies.

Solution Offering Purpose

This is a Foundational Solution. Foundational Solutions are "background" solutions that support the Core Solution Offerings of the MIKE2.0 Methodology.

Foundational Solutions are the lowest-level assets within MIKE2.0 that are comprehensive in nature. They may tie together multiple Supporting Assets and are referenced from the Overall Implementation Guide and other Solution Offerings.

Solution Offering Relationship Overview

The Data Integration Solution Offering is a Foundational Solution of MIKE2.0
The MIKE2.0 Data Integration Solution describes how the Activities and Supporting Assets of the MIKE2.0 Methodology can be used to deliver a better data integration environment.

MIKE2.0 Solutions provide a detailed and holistic way of addressing specific problems. MIKE2.0 Solutions can be mapped directly to the Phases and Activities of the MIKE2.0 Overall Implementation Guide, providing additional content to help understand the overall approach.

The MIKE2.0 Overall Implementation Guide explains the relationships between the Phases, Activities and Tasks of the overall methodology as well as how the Supporting Assets tie to the overall methodology and MIKE2.0 Solutions.

Users of the MIKE2.0 Methodology should use the Overall Implementation Guide and the MIKE2.0 Usage Model as the starting point for projects.

Solution Offering Definition

The core set of Activities and Tasks for ETL Conceptual, Logical and Physical Design is listed within the Overall Implementation Guide; the Data Integration Solution brings together supporting detail such as:

  • Overall design approach for ETL integration
  • ETL Architecture and Environment Standards
  • ETL Naming Conventions
  • ETL Design Patterns
  • Product-specific standards and design processes
  • Differentiating features for product selection
  • Recommended team structures and cost models for Data Integration

Additional Supporting Assets for data integration include a number of tools and technique papers, deliverable templates, software assets and sample work from project implementations. Some of the tasks related to ETL Logical and Physical Design reference a leading guide on this subject [1] but have been significantly modified as part of the MIKE2.0 Methodology.

Relationship to Solution Capabilities

This section outlines how the MIKE2.0 Solution for Data Integration, one of the most critical aspects of conducting projects related to Data Cleansing, Data Migration, Data Warehousing and IT Transformation, fits in with the key aspects of the overall MIKE2.0 Methodology. It also describes the major deliverables of the ETL Methodology, how they relate to one another, and how they relate to MIKE2.0.

Relationship to Enterprise Views

In terms of the governing model on Enterprise Views, this Solution relates mostly to the Process View for Infrastructure Development by providing a detailed approach for conducting a Data Integration project. This guide is used in conjunction with the overall steps for an information management programme as part of the MIKE2.0 Methodology.

This Solution also makes some reference to the Technology View for Infrastructure Development by providing a brief overview on available vendor technologies in this space; the overview on vendor technologies can be used to either assist in the selection process for a new tool or to confirm that an existing toolset has the required capabilities for a project.

Mapping to the Information Governance Framework

Mapping to the SAFE Architecture Framework

Data Integration is one of the Foundation Capabilities of Infrastructure Development. It provides a mechanism for bringing together information from a number of distributed systems by interfacing with sources, transforming data between systems, enforcing business rules and loading data into different types of target areas.
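
As a simple illustration of this capability, the sketch below shows a minimal extract-transform-load flow in Python. The source records, the business rule and the in-memory target are hypothetical placeholders, not part of the methodology itself.

  # Minimal, illustrative ETL flow: extract from two hypothetical sources,
  # apply a business rule during transformation, and load into a target structure.

  def extract():
      """Pull records from two (hypothetical) source systems."""
      crm_records = [{"customer_id": 1, "country": "AU", "revenue": "1200.50"}]
      erp_records = [{"customer_id": 2, "country": "NZ", "revenue": "-15.00"}]
      return crm_records + erp_records

  def transform(records):
      """Standardise types and enforce a simple business rule (no negative revenue)."""
      cleaned = []
      for record in records:
          revenue = float(record["revenue"])
          if revenue < 0:  # business rule: reject invalid revenue values
              continue
          cleaned.append({**record, "revenue": revenue})
      return cleaned

  def load(records, target):
      """Append the transformed records to the target store (a list here)."""
      target.extend(records)
      return target

  if __name__ == "__main__":
      target_table = []
      load(transform(extract()), target_table)
      print(target_table)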

Mapping to the Overall Implementation Guide

There are a number of aspects that make up the Data Integration Solution. It focuses on the aspects of Data Integration that take place across Phases 3 and 4 of the Overall Implementation Guide, set in the context of the phases below:

Business Assessment and Strategy Definition Blueprint (Phase 1)

Within Phase 1 of MIKE2.0, time is spent defining the overall business strategy and a Conceptual Architecture that sets a relatively high-level vision for developing the envisaged Future State. The initial high-level system mappings and the need for data integration between systems (possibly using an ETL tool) are formulated within this phase, which is explained in the Overall Implementation Guide.

Technology Assessment and Selection Blueprint (Phase 2)

During Phase 2 of MIKE2.0, the technology requirements are established to the level required to determine whether a vendor product will be used to fulfil the ETL process. As part of vendor selection, a detailed process occurs to define the functional and non-functional requirements used to select vendor products. The overall SDLC strategy (standards, testing and development environments) that will support ETL development is put in place during this phase. This is explained in the Overall Implementation Guide.

Roadmap and Foundation Activities (Phase 3)

The Roadmap and Foundation Activities include the key planning steps for development, initial design and establishment of the environment. Related tasks as part of Phase 3 of the MIKE2.0 Methodology include information modelling, data profiling and data re-engineering.

ETL Conceptual Design as part of the Roadmap/Foundation Activities
Detailed Preparation and Planning (part of the Roadmap)

The preparation and detailed planning that takes place during the Roadmap includes examining the documents that have been prepared as inputs to the ETL design process to ensure that they contain sufficient and accurate information. Also within this phase, a project plan for the specific increment is created, containing a detailed list of tasks to be accomplished during ETL design, estimates for those tasks and dependencies among them. The steps for planning are described in the Overall Implementation Guide.

Establishing the ETL Environment (part of Foundation Activities)

The ETL environment is established across Phases 2 and 3 of MIKE2 (specific revisions for the increment are made during Phase 3). The Technical Architecture will have specified hardware and software for the Development, Testing and Production environments, and these must be examined, corrected and/or enhanced as required. Development standards, software migration procedures and security measures for the SDLC are also defined.
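
Purely as an illustration, the fragment below sketches how environment definitions for the SDLC might be recorded so that jobs can be promoted between Development, Testing and Production without code changes. The host names, schema names and settings are hypothetical placeholders rather than values prescribed by MIKE2.0.

  # Hypothetical environment definitions for the ETL SDLC.
  # Hosts, schemas and parallelism settings are placeholders for illustration only.
  ETL_ENVIRONMENTS = {
      "development": {"host": "etl-dev.example.com", "schema": "stg_dev", "max_parallel_jobs": 2},
      "testing": {"host": "etl-test.example.com", "schema": "stg_test", "max_parallel_jobs": 4},
      "production": {"host": "etl-prod.example.com", "schema": "stg_prod", "max_parallel_jobs": 16},
  }

  def environment_settings(environment: str) -> dict:
      """Return the settings for a named environment, failing fast if it is unknown."""
      try:
          return ETL_ENVIRONMENTS[environment]
      except KeyError:
          raise ValueError(f"Unknown ETL environment: {environment}")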

ETL Conceptual Design/Solution Architecture (Part of Foundation Activities)

A Conceptual Design is then created to outline the scope of the design and support the estimation. This design may be effectively “throw-away” but is used to formulate the initial steps in the design process and feeds into the logical design. This document presents the philosophy of the ETL Conceptual design; the MIKE2 Overview Guide presents the ETL Conceptual Design task within the Solution Architecture.

Prototype of the Solution Architecture (Part of Foundation Activities)

After the Conceptual Design, a prototype of the Solution Architecture is built to gain a better understanding of how the solution will work before moving into a more formalized design process. Prototyping the proposed solution should provide an end-to-end approach that includes each of the major components of the architecture. The prototyping effort is focused on what are perceived as the major technology risk areas for the increment.

This prototyping Activity progressively builds on the functionality of the prior steps to produce a more robust solution, although the end result is still a “thin-thread” approach. It is most relevant for complex integration projects, projects where there are many unknowns, or for initial increments in an implementation where concepts have not yet been tested.

Test Planning (Part of Foundation Activities)

Test Planning creates the test plans and schedules needed to execute the required test cycles. These test cycles will be used to confirm that the system designed for ETL integration is correct, reliable and performs adequately.
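
One typical test case within such a cycle is a source-to-target reconciliation check. The sketch below is a minimal example of that idea, assuming simple in-memory record sets and no particular testing tool; the function and field names are hypothetical.

  # Hypothetical reconciliation check: confirm that the records expected in the
  # target are present, and that nothing unexpected has been loaded.

  def reconcile(source_rows, target_rows, key="customer_id"):
      """Return a small reconciliation report comparing source and target keys and counts."""
      source_keys = {row[key] for row in source_rows}
      target_keys = {row[key] for row in target_rows}
      return {
          "source_count": len(source_rows),
          "target_count": len(target_rows),
          "missing_in_target": sorted(source_keys - target_keys),
          "unexpected_in_target": sorted(target_keys - source_keys),
      }

  def test_full_load_reconciles():
      source = [{"customer_id": 1}, {"customer_id": 2}]
      target = [{"customer_id": 1}, {"customer_id": 2}]
      report = reconcile(source, target)
      assert report["missing_in_target"] == []
      assert report["unexpected_in_target"] == []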

Design Increment (Phase 4)

The Design Increment Activities involve Logical and Physical Design of the ETL processes. Test Design occurs in parallel with Software Design.

ETL Logical and Physical Design as part of the Design Increment Phase
ETL Logical Design

The Logical Design phase converts the input documents which are primarily business-oriented into a logical definition of what is to be designed. Some technical issues are investigated, but detailed design is deferred to the next phase. The Logical Design can include definitions of any required staging area, the overall process flow, source and target interfaces, load dependencies and integration with metadata processes. Source and Target Data Models are the key source of input.

A metadata repository is created or enhanced which includes source-to-target mapping, transformations and other important metadata regarding the ETL design.
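
As an illustration, source-to-target mappings of this kind might be captured in a structure such as the one below. The field names, systems and transformation expressions are hypothetical examples, not part of the methodology.

  from dataclasses import dataclass
  from typing import Optional

  @dataclass
  class FieldMapping:
      """One source-to-target mapping entry as it might appear in a metadata repository."""
      source_system: str
      source_field: str
      target_table: str
      target_field: str
      transformation: Optional[str] = None  # e.g. an expression or a named business rule

  # Hypothetical mapping entries for a customer load.
  CUSTOMER_MAPPINGS = [
      FieldMapping("CRM", "cust_no", "dim_customer", "customer_id"),
      FieldMapping("CRM", "cntry_cd", "dim_customer", "country_code", "upper(cntry_cd)"),
      FieldMapping("ERP", "rev_amt", "fact_revenue", "revenue_amount", "cast to decimal(18,2)"),
  ]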

The ETL Logical Design is technology independent and is complemented by the ETL Physical Design which defines the implementation using a specific vendor integration technology.

ETL Physical Design

The Physical Design phase creates the actual design which leads directly to development. It describes in full detail the processes to be created and the best-practice standards and guidelines which are to be applied to the project.

Standards and guidelines are not merely copied from another source but are examined, adapted and enhanced to consider project-specific and vendor-specific issues.

Services Oriented Architecture Design

If the ETL Conceptual Design/Solution Architecture mandates the use of a Services Oriented Architecture, the ETL design process will include tasks specifically associated with this approach to designing loosely coupled, reusable integration solutions.
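
A minimal sketch of what "loosely coupled, reusable" can mean in practice is shown below: the transformation logic sits behind a small, technology-neutral interface so that a batch load, a real-time message handler or a request/response service can all call the same component. The interface and class names are hypothetical.

  from abc import ABC, abstractmethod

  class IntegrationService(ABC):
      """Hypothetical contract for a reusable integration service."""

      @abstractmethod
      def process(self, records: list) -> list:
          """Transform a batch of records and return the result."""

  class CustomerStandardisationService(IntegrationService):
      """Example implementation: standardise country codes for any consumer."""

      def process(self, records: list) -> list:
          return [{**r, "country": r.get("country", "").upper()} for r in records]

  # The same service instance can be reused by a nightly batch load,
  # a real-time message handler or a web service wrapper.
  service = CustomerStandardisationService()
  print(service.process([{"customer_id": 1, "country": "au"}]))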

Test Design

Test Case Design is done in parallel with Software Design. Some of the Test Case Design can begin during Logical Design with the collection of acceptance test criteria, but most of the test plans can begin only during the Physical Design phase as the elements of the design become known.

Develop, Test & Deploy Increment (Phase 5)

ETL Software Development

MIKE2.0 presents the different aspects of ETL development that occur, as well as the related development efforts such as that of the metadata repository, data re-engineering and development of the target data model. Supporting assets of MIKE2.0 include code samples that complement the ETL Design Patterns that are shown in this document.
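
As one illustrative example of the kind of code sample intended, the sketch below implements a common incremental (delta) load pattern against in-memory structures. An actual implementation would target the selected ETL tool or database rather than Python dictionaries, and the key and field names are hypothetical.

  # Illustrative incremental-load pattern: apply only new or changed source rows to the
  # target, keyed on a business key and compared on a hash of the non-key attributes.
  import hashlib
  import json

  def row_hash(row, key="customer_id"):
      """Hash the non-key attributes so changed rows can be detected cheaply."""
      payload = {k: v for k, v in row.items() if k != key}
      return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

  def incremental_load(source_rows, target, key="customer_id"):
      """Insert new rows and overwrite changed rows in a target keyed by the business key."""
      for row in source_rows:
          existing = target.get(row[key])
          if existing is None or row_hash(existing, key) != row_hash(row, key):
              target[row[key]] = row  # insert new row or apply the detected change
      return target

  target_table = {1: {"customer_id": 1, "country": "AU"}}
  incremental_load([{"customer_id": 1, "country": "NZ"}, {"customer_id": 2, "country": "AU"}], target_table)
  print(target_table)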

ETL Software Testing (Functional Testing, SIT Testing, System Testing, SVT, UAT, PVT)

There are multiple testing phases for ETL integration. Depending on the type of project (e.g. Data Warehouse or Data Migration), different test cycles receive greater focus. The overall MIKE2 Methodology explains the testing execution process, along with its work products and dependencies.

Product Documentation and Administration

Turnover to operations staff includes the development of administration and user guides, an operations training plan and handover sessions. The process for developing the administration and training guides for the ETL environment is described in the Overall Implementation Guide.

Mapping to Supporting Assets

Logical Architecture, Design and Development Best Practices

The following implementation techniques are product-independent and can be used in conjunction with the Overall Implementation Guide.

There are two different types of implementation techniques as part of MIKE2.0.

Techniques that should be applied across all data integration problems:

Techniques that should be applied for a specific problem:

Product-Specific Implementation Techniques

Product-specific techniques provide examples of how to conduct ETL integration with specific products. They complement the prior section on logical design patterns. This section will be built out over time, offering comparisons between vendor tools and focusing on the known issues and strengths of different products.

Selecting Products for Data Integration

The strategic process of selecting products for ETL Data Integration involves a number of steps and takes a variety of factors as input. The overall process is outlined in Phase 2 of the Overall Implementation Guide, with reference to the SAFE Architecture. Additional questions to consider include:

  • Why would we choose an off-the-shelf tool for ETL Integration?
  • What are some of the key criteria we should use to evaluate ETL tools?

The capabilities of leading vendor tools are always changing. Vendor products should be re-evaluated on key criteria frequently as part of a proper due diligence process for product selection.

ETL Tools: Buy vs. Build

The use of an ETL tool is recommended for complex data integration projects, including data migrations, warehouses and system consolidations. Some of the advantages of an ETL tool include:

  • Providing a means to handle complex data integration processes through a process-integration based architecture
  • Significantly improving audit and security capabilities by automatically tracking data remediation, auditing and interface activities
  • Leveraging highly flexible development functions, which provide the ability to react rapidly to data remediation and reporting requirements
  • Improving the utilisation of existing resources by improving review and QA processes through the user-friendly Graphical User Interface (GUI) of the toolset
  • Reducing negative system resource impacts on production systems by moving processor-intensive tasks off the mainframe and onto a separate ETL server, and by producing faster and more efficient system processes
  • Generating job reporting functions to show the capability of the existing environment; self-documented jobs reduce the need for administrative documentation

More generally, benefits include:

  • Significant savings in terms of resource cost and effort required to perform otherwise manual data analysis, assessment, investigation, validation and remediation activities as part of the Data Integration Project.
  • Drastic savings in time can be realised as the software development life-cycle is reduced via the use of tools which promote a graphical user interface (GUI) approach to software development.
  • Use of a single platform reduces complexity in the data management environment for both development and ongoing operations. All jobs run within a single engine, as opposed to disjointed scripts.
  • As the tools are provided by a vendor, external expertise can be gained to supplement the skills of the project team and operations staff. Skills with direct product experience can also be found in the market. Understanding a vendor’s presence in a local market and the skills that exist there should be critical evaluation criteria.
  • Convergent solutions that feed directly into a metadata repository are now becoming more widely available from vendors. This greatly simplifies one of the most complex areas of information management.

The above factors mean that off-the-shelf products for ETL Data Integration often provide significant benefits. The most significant benefits come when this functionality (and cost) can be shared across the Enterprise. This is a key message of the Information Management Center of Excellence Solution Offering.

Vendor Selection QuickScan

The MIKE2.0 Methodology uses a tool called Technology Selection QuickScan to help with product selection. The tool provides a number of key criteria from which users can match functional, non-functional and commercial requirements to compare vendor products.
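
The criteria and weights themselves sit within the QuickScan tool. Purely to illustrate the underlying idea of matching requirements to products, the sketch below applies a simple weighted-scoring comparison; the criteria, weights and scores are hypothetical and are not taken from the tool.

  # Hypothetical weighted-scoring comparison of ETL products.
  # Criteria, weights (summing to 1.0) and scores (1-5) are illustrative only.
  CRITERIA_WEIGHTS = {
      "functional_fit": 0.4,
      "non_functional_fit": 0.3,
      "commercial_terms": 0.2,
      "local_skills_availability": 0.1,
  }

  VENDOR_SCORES = {
      "Product A": {"functional_fit": 4, "non_functional_fit": 3, "commercial_terms": 5, "local_skills_availability": 4},
      "Product B": {"functional_fit": 5, "non_functional_fit": 4, "commercial_terms": 3, "local_skills_availability": 3},
  }

  def weighted_score(scores):
      """Combine per-criterion scores into a single weighted total."""
      return sum(CRITERIA_WEIGHTS[criterion] * score for criterion, score in scores.items())

  for product in sorted(VENDOR_SCORES, key=lambda p: -weighted_score(VENDOR_SCORES[p])):
      print(f"{product}: {weighted_score(VENDOR_SCORES[product]):.2f}")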

Relationships to other Solution Offerings

Extending the Open Methodology through Solution Offerings

References

  1. Data Warehouse: From Architecture to Implementation, Barry Devlin (Addison-Wesley Professional, 1996).