ETL Conceptual Design Deliverable Template

From MIKE2.0 Methodology

This article is a stub. It is currently undergoing major changes as it is in the very early stages of development and is only a placeholder. Please help improve MIKE2.0 by adding to this article.
This deliverable template is used to describe a sample of the MIKE2.0 Methodology (typically at a task level). More templates are now being added to MIKE2.0 as this has been a frequently requested aspect of the methodology. Contributors are strongly encouraged to assist in this effort.
Deliverable templates are illustrative as opposed to fully representative. Please help add examples to this template that are representative of the proposed output.

Overview

The ETL Conceptual Design task guides the overall solution approach that feeds into the ETL Logical and Physical Design. It also identifies the areas of significant complexity in the early stages of a project.

Scope

The ETL Conceptual Design is at a high level, but should contain the following:

  • Sketch of the overall flow
  • List of sources
  • Whether a staging area is to be used
  • List of targets
  • Major transformations
  • Volume estimates
  • Frequency of update (Timing)

Based on the BusinessTime model, it has to be determined which data must be integrated within which period of time so that the value of decisions based on that data is not impaired. Because this has a significant influence on the conceptual and technical design of the ETL process, this task should be addressed at an early stage. Typical categories of latency are:

  • weekly
  • daily
  • (near) real-time
    The demand for (near) real-time data integration is constantly growing. A strict definition of real-time integration would require new data to be integrated within the same transactional context in which it is recorded in the operational system. For analytical systems it is often not necessary to meet this requirement in order to achieve a competitive advantage. Nevertheless, relevant data has to be available when it is needed, so the term right-time integration is occasionally used.
    The classical batch process for data integration, which usually runs at night or on weekends, does not meet (near) real-time requirements.
    Other promising concepts are available to accomplish the necessary tasks:

    • Batch-oriented
      This group of methods uses data consolidation technologies: data from multiple source systems is transferred into a target system. A typical member of this group is the so-called microbatch. It uses the classical ETL batch approach, but the time between scheduled executions is reduced to minutes or hours. Users of this method can profit from sophisticated batch tools and well-optimized processes. The main disadvantage is the possibility of load peaks on the affected systems while microbatches are being processed.
    • Continuous
      Continuous concepts are based on data propagation technologies. By using a middleware component, data and messages can be transferred between systems reliably and in a timely manner. As this technical approach was mainly developed for operational systems, very low latencies can be reached. Because data is processed in small chunks rather than in big batches, load peaks are unlikely.
    • Event-driven
      These methods use technologies based on data federation. A special transformation layer (also called a mediator) presents an integrated, virtual view of the source systems. In contrast to continuous integration, event-driven concepts access source data on demand only. The data is then integrated and made available to the requester. With this approach the presented data is always current. Latency depends on the slowest source system that contains requested data. Enterprise Information Integration is a popular member of event-driven integration.
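The difference between the batch-oriented and the event-driven concept can be illustrated with a minimal Python sketch. It is not part of the MIKE2.0 template: the names `MicroBatchLoader` and `federated_lookup`, the `modified_at` field, and the in-memory lists standing in for source and target systems are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class MicroBatchLoader:
    """Hypothetical microbatch: a classical batch load run at short intervals."""
    watermark: datetime                      # end of the previously loaded window
    target: list = field(default_factory=list)

    def run(self, source_rows, now):
        # Extract only rows modified after the last watermark, up to `now`.
        batch = [r for r in source_rows
                 if self.watermark < r["modified_at"] <= now]
        self.target.extend(batch)            # consolidate into the target
        self.watermark = now                 # the next run starts from here
        return len(batch)


def federated_lookup(key, sources):
    """Hypothetical mediator: queries every source on demand and merges the
    answers, so nothing is copied ahead of time and the result is current."""
    merged = {}
    for fetch in sources:
        merged.update(fetch(key))
    return merged


# Example: two microbatch runs ten minutes apart.
t0 = datetime(2024, 1, 1)
rows = [
    {"id": 1, "modified_at": t0 + timedelta(minutes=3)},
    {"id": 2, "modified_at": t0 + timedelta(minutes=12)},
]
loader = MicroBatchLoader(watermark=t0)
loader.run(rows, now=t0 + timedelta(minutes=10))   # loads row 1
loader.run(rows, now=t0 + timedelta(minutes=20))   # loads row 2
```

In the microbatch case the target lags the source by at most one scheduling interval. In the federated case there is no target copy at all, which is why the response time is bounded by the slowest source queried.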




Mention of significant complexities, such as:

  • Slowly-changing dimensions
  • Interface with a data re-engineering tool
  • Very complex transformations
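To show why slowly-changing dimensions count as a significant complexity, here is a minimal Python sketch of one common handling strategy, usually called Type 2 (keep every historical version of a record). The template does not prescribe a particular strategy; the function name `apply_scd2` and the field names `key`, `attributes`, `valid_from`, and `valid_to` are assumptions made for illustration.

```python
from datetime import date


def apply_scd2(dimension, key, attributes, effective):
    """Apply one source record to a Type 2 dimension held as a list of dicts.

    The open version of a key has valid_to set to None. When the attributes
    change, the open version is closed and a new version is appended.
    """
    current = next(
        (row for row in dimension
         if row["key"] == key and row["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["attributes"] == attributes:
            return dimension                  # unchanged: nothing to do
        current["valid_to"] = effective       # close the old version
    dimension.append({
        "key": key,
        "attributes": dict(attributes),
        "valid_from": effective,
        "valid_to": None,
    })
    return dimension


# Example: a customer moves from Berlin to Munich.
customers = []
apply_scd2(customers, "C1", {"city": "Berlin"}, date(2024, 1, 1))
apply_scd2(customers, "C1", {"city": "Munich"}, date(2024, 3, 1))
```

Even in this reduced form, the load has to compare incoming data with the current target state, which is the kind of design decision worth flagging at the conceptual stage.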

It will also be important to cover the overall approach to the integration architecture and how it will handle different types of scenarios. The ETL Conceptual Architecture should present the high-level strategy related to:

  • extraction increments
  • full data loads
  • frequency
  • data staging
  • bulk load
  • incremental updates
  • purging
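Three of the strategies listed above (full data loads, incremental updates, and purging) can be contrasted in a short Python sketch. It is an illustration under stated assumptions rather than a prescribed design: the function names, the `id` and `loaded_on` fields, and the dictionary standing in for a target table are all hypothetical.

```python
from datetime import date


def full_load(target, source_rows):
    """Full data load: replace the target with a complete source snapshot."""
    target.clear()
    target.update({row["id"]: row for row in source_rows})


def incremental_update(target, changed_rows):
    """Incremental update: insert or overwrite only the rows in the increment."""
    for row in changed_rows:
        target[row["id"]] = row


def purge(target, cutoff):
    """Purging: remove rows loaded before the retention cutoff.

    Returns the number of rows removed.
    """
    expired = [key for key, row in target.items() if row["loaded_on"] < cutoff]
    for key in expired:
        del target[key]
    return len(expired)


# Example: initial full load, one increment, then a purge of old rows.
warehouse = {}
full_load(warehouse, [
    {"id": 1, "value": "a", "loaded_on": date(2023, 1, 1)},
    {"id": 2, "value": "b", "loaded_on": date(2024, 1, 1)},
])
incremental_update(warehouse, [
    {"id": 2, "value": "b2", "loaded_on": date(2024, 2, 1)},
    {"id": 3, "value": "c", "loaded_on": date(2024, 2, 1)},
])
removed = purge(warehouse, cutoff=date(2023, 6, 1))
```

A full load is simple but moves the whole volume each time, while an incremental update depends on the source being able to identify changed rows, which ties back to the extraction increments and frequency items in the list.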

Where it fits in the SDLC

The Conceptual Design runs in parallel with a number of solution development activities, as shown below.

Figure: ETL Conceptual Design
