Data Mart Consolidation Techniques

From MIKE2 Methodology

Data Mart Consolidation opportunities are an integral part of most Data Migration/Consolidation efforts. This article outlines some of the business benefits of consolidation, the complexity of the issue, and a standards-based approach that can be followed.

Overview

Data Mart Consolidation should involve more than a technology replacement; it is an opportunity to deliver new functionality to the business during a period of large-scale technology change. Information environments are often very complex due to a proliferation of data marts, source systems and complex integration processes. A standards-based approach can help in reducing this complexity and executing the transition.

Deriving Optimal Business Benefit from Data Mart Consolidation

[[Image:data_mart_consolidation.jpg|thumb|right|500px|Optimal Business Benefit from Data Mart Consolidation]] IT Transformation programmes are often about reducing cost; Data Consolidation typically provides an excellent opportunity to do this. An AMR Research report [1] states that consolidation makes sense for many reasons:

  • The cost to maintain a data mart is between $1M and $2M. These costs include multiple Extraction, Transformation, and Loading (ETL) processes, software licenses and maintenance, storage and server hardware, and personnel.
  • An estimated 35% to 70% of these costs are redundant across data marts.
  • An Enterprise Data Warehouse (EDW) effort, with a project value of $4M to $6M, can eliminate those redundant costs.
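To make the arithmetic behind these figures concrete, the following minimal sketch computes the range of redundant cost and a simple payback period. It assumes, hypothetically, an estate of ten data marts and that the maintenance figures are annual; neither assumption comes from the AMR report, and the function names are illustrative only.

```python
def redundant_cost_range(n_marts, cost_per_mart=(1.0, 2.0), redundancy=(0.35, 0.70)):
    """Return (low, high) redundant maintenance cost in $M for n_marts data marts.

    cost_per_mart and redundancy default to the ranges quoted in the list above.
    """
    low = n_marts * cost_per_mart[0] * redundancy[0]
    high = n_marts * cost_per_mart[1] * redundancy[1]
    return low, high


def payback_years(edw_cost, yearly_saving):
    """Years needed for the yearly saving to cover the EDW project value ($M)."""
    return edw_cost / yearly_saving


# Hypothetical estate of 10 data marts (an assumption, not from the report).
low, high = redundant_cost_range(10)
print(f"Redundant cost: ${low:.1f}M to ${high:.1f}M per year (assumed annual)")

# Worst case: most expensive EDW project, smallest saving.
worst = payback_years(6.0, low)
# Best case: cheapest EDW project, largest saving.
best = payback_years(4.0, high)
print(f"Payback between {best:.2f} and {worst:.2f} years")
```

The wide spread between the best and worst case is a reminder that the quoted ranges are broad, so any real business case would need the organisation's own cost figures.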

Complex consolidation programmes can take a long time to implement, however. The XBR Strategy of MIKE2.0 advocates an approach where the real value in consolidation comes with new business capabilities being delivered in addition to cost savings.

The idea that new business capabilities will be required makes sense. Many of the Data Marts will have been delivered a long time ago, so the business will typically have identified requirements for new functionality. Enabling the business to do things that it couldn’t do before makes it much more engaged in the Transformation programme. This may result in a significant re-architecting of the existing solution. The approach should generally not be simply to re-host legacy functionality.

The Complexity of the Enterprise Environment

A typically complex Analytical Information Management

The Source Systems have similar or identical feeds to multiple data marts through known and shadow processes. Typically the documentation is suspect and the shadow processes are largely unknown. The source systems are likely to be a mixture of custom legacy systems and current proprietary packages (e.g., Siebel, SAP). Their capability to recognise data changes in the context of business events varies widely. From the perspective of data ownership and quality, the source systems and the multiple feeds containing the same logical data attributes need to be rationalised.

The Known/Shadow Processes focus on implementing business rules, attribute transformations and a variety of summarisations. The known processes may have up-to-date documentation, but in most cases they do not. The shadow processes are known only to a select few, if those people are still around, and little, if any, documentation exists.

Some Information Platforms may be thought of as Data Warehouses, while others are more departmental in nature and are thought of as Data Marts. They get most of their data from a variety of source systems via known and shadow processes.

Derived Data Marts obtain the majority of their information from secondary sources (e.g., other data marts) and some of it from source systems, usually data that is not currently being collected. Since these marts are a number of steps away from the source systems and the ‘primary’ data marts, not much is known about them, and some may not be known at all.

The source systems are re-used by derived data marts to acquire data that was not captured initially. These activities are usually associated with shadow processes, so not much is known about them.

Shadow processes are typically known only to an individual or small group. Often the author of the process is no longer available and the process is simply run ‘as is’. Short of reading the ‘code’, there is little information or metadata documenting the processes and their business rules.

A Standards-Based Approach to Data Mart Consolidation

Listed below are the key steps that should be followed for Data Mart Consolidation:

Strategic Business Requirements

The strategy begins with identifying a particular area of the business to begin the consolidation process. Scoping should be focused on constraining the consolidation to a definable business area and the area should be one in which there is general agreement that value would be brought to the business.

For example, the scope might be defined to include Sales and Marketing from the revenue perspective. Typically, this brings value and the information is likely to be available. There might also be a desire to bring in costs and establish net revenue. However, cost information is not as easy to assemble at the level of granularity needed and would be better put off until a later implementation.

Current State Assessment of Source Systems and Feeds

The focus of these activities is to inventory (i.e., list) the source systems and the downstream feeds to data marts. During this process, systems or feeds that are not relevant for the initial pilot implementation should be weeded out. However, the information gathered about them should be saved for future reference so that it does not have to be discovered again.
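As an illustration of what such an inventory could look like, here is a minimal sketch that records feeds and flags logical attributes delivered to more than one data mart. The `Feed` structure, its fields and the sample entries are hypothetical and not part of the MIKE2.0 methodology.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Feed:
    source_system: str  # system the data is extracted from
    target_mart: str    # data mart that receives the feed
    attribute: str      # logical data attribute carried by the feed
    documented: bool    # False marks a suspected shadow process


def redundant_attributes(feeds):
    """Map each logical attribute to the marts it reaches, keeping only
    attributes that are delivered to more than one data mart."""
    marts_by_attribute = defaultdict(set)
    for feed in feeds:
        marts_by_attribute[feed.attribute].add(feed.target_mart)
    return {
        attribute: sorted(marts)
        for attribute, marts in marts_by_attribute.items()
        if len(marts) > 1
    }


# Hypothetical inventory entries, for illustration only.
inventory = [
    Feed("CRM", "SalesMart", "customer_id", True),
    Feed("CRM", "MarketingMart", "customer_id", False),
    Feed("Billing", "SalesMart", "invoice_amount", True),
]

duplicates = redundant_attributes(inventory)
undocumented = [feed for feed in inventory if not feed.documented]
print(duplicates)
print(undocumented)
```

Keeping out-of-scope feeds in the same list, rather than discarding them, is one simple way to preserve the information for later phases.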

Current State Assessment of Data Marts and Supported Processes

The focus of these activities is to inventory the data marts and the processes they support. Even if a data mart is only a reporting and query data mart, there is still an inventory of business processes and their associated decisions to capture. This information is key in establishing priorities and identifying new or enhanced functionality.

Data Investigation

Data Investigation (Data Profiling) involves the quantitative analysis of data in the source systems to better understand quality issues, business rules and mapping rules for loading data into the target system. Conducting Data Investigation early in the programme is an important part of reducing the risk of delivery failures.
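The kind of quantitative analysis involved can be sketched in a few lines. The example below profiles a single column for missing values, distinct values and the most frequent value; the function name, the sample data and the choice to treat `None` and empty strings as missing are all assumptions made for illustration, and a real programme would normally use a dedicated profiling tool.

```python
from collections import Counter


def profile_column(values):
    """Return basic quality statistics for one column of source data.

    None and empty strings are treated as missing (an assumption).
    """
    total = len(values)
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": total,
        "missing_ratio": (total - len(present)) / total if total else 0.0,
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


# Hypothetical extract from a source system.
country_codes = ["AU", "AU", "au", None, "NZ", "", "AU"]
stats = profile_column(country_codes)
print(stats)
```

Even this small profile surfaces a candidate business rule (country codes should be upper case) and a quality issue (missing values) that would affect the mapping rules for the target system.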

Create the Common Data Elements

Use one or more chosen sources for a set of Common Data Elements (CDEs). A likely candidate source would be the attributes in the logical data model (LDM) to be used in the new DW repository. A common LDM across all the repositories will NOT be needed; only a common (standard) set of attributes held as metadata is required.

Map Common Data Elements to a reference Logical Data Model

The CDEs must be mapped to a reference ERD, which becomes a standard reference. It is desirable to use this ERD as the basis for the new DW repository. This is not always possible because the new DW may be fashioned from one that already exists. In that case the new DW repository must also be mapped to the standard reference CDEs and the reference ERD. This represents standard metadata used by the Information Management Center of Excellence.
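A mapping of this kind can be held as simple metadata and checked mechanically. The sketch below is one possible shape, assuming hypothetical CDE names, reference entities and DW column names; it reports DW columns that point at a CDE missing from the catalogue and CDEs that no DW column populates yet.

```python
# Hypothetical CDEs mapped to (entity, attribute) in the reference ERD.
CDE_TO_REFERENCE = {
    "customer_identifier": ("Customer", "customer_id"),
    "customer_name": ("Customer", "full_name"),
    "order_value": ("Order", "total_amount"),
}

# Hypothetical columns of the new DW repository mapped to CDEs.
DW_COLUMN_TO_CDE = {
    "DIM_CUSTOMER.CUST_ID": "customer_identifier",
    "DIM_CUSTOMER.CUST_NM": "customer_name",
    "FACT_SALES.NET_AMT": "net_revenue",  # not in the catalogue on purpose
}


def unmapped_columns(column_to_cde, cde_catalogue):
    """Return DW columns whose CDE is missing from the standard catalogue."""
    return sorted(
        column for column, cde in column_to_cde.items() if cde not in cde_catalogue
    )


def unused_cdes(column_to_cde, cde_catalogue):
    """Return CDEs that no DW column populates yet."""
    return sorted(set(cde_catalogue) - set(column_to_cde.values()))


missing = unmapped_columns(DW_COLUMN_TO_CDE, CDE_TO_REFERENCE)
unused = unused_cdes(DW_COLUMN_TO_CDE, CDE_TO_REFERENCE)
print(missing)
print(unused)
```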

Map Common Data Elements to a Business LDM

The CDEs must be mapped to a reference Business LDM. This LDM is business facing and defines data concepts such as customer segmentation, customer value and customer profile in terms of attributes, facts and metrics. The data concepts are then mapped to the business processes to be supported (e.g., campaign management, churn, winback). MIS applications are then built from these mapping rules.

Data Integration Logical Design

The choice between event-oriented and batch-oriented technologies is made based on the required business timing and the capability to capture events as well as data changes. Business rules and data transformations should be kept as decoupled from the data integration process as possible.

Data Re-Engineering

Data Quality Improvement typically involves use of a Data Quality engine. The engine can measure data quality and implement known business rules to repair the data under certain circumstances. Data Re-Engineering should follow the 80/20 rule and this process may also be operationalised.
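The measure-and-repair behaviour described here can be sketched as a small rule loop. This is only an illustration of the idea, not how any particular Data Quality engine works; the rule format `(field, is_valid, repair)` and the sample rules and records are hypothetical.

```python
def measure_and_repair(records, rules):
    """Apply each rule to each record.

    A rule is (field, is_valid, repair); repair may be None when no safe
    automatic fix is known. Returns (cleaned_records, failure_counts).
    """
    failures = {field: 0 for field, _, _ in rules}
    cleaned = []
    for record in records:
        fixed = dict(record)  # copy so the caller's data is not mutated
        for field, is_valid, repair in rules:
            if not is_valid(fixed.get(field)):
                failures[field] += 1  # measure
                if repair is not None:
                    fixed[field] = repair(fixed.get(field))  # repair
        cleaned.append(fixed)
    return cleaned, failures


# Hypothetical business rules.
rules = [
    (
        "country",
        lambda v: isinstance(v, str) and len(v) == 2 and v.isupper(),
        lambda v: v.strip().upper() if isinstance(v, str) else v,
    ),
    # No automatic repair: a bad email is only counted.
    ("email", lambda v: isinstance(v, str) and "@" in v, None),
]

records = [
    {"country": "au", "email": "a@example.com"},
    {"country": "NZ", "email": "missing"},
]

cleaned, failures = measure_and_repair(records, rules)
print(cleaned)
print(failures)
```

Leaving `repair` empty for rules without a safe fix fits the 80/20 approach: the issues that are cheap to repair are automated, and the rest are measured and reported.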

Execute Pilot Project

The pilot project will cover rationalising Source Systems and Feeds to Common Data Elements. The pilot should be comprehensive enough to address a real business problem as the initial implementation.

For example, the initial project might be one in which the goal is to support campaign management. This would be direct support to the Sales and Marketing function. The data attributes to support sales and marketing needs would define the data sources and data feeds to be examined.

The operational data sourcing processes will be made up of batch-oriented and event-based integration. They will include Data Synchronisation with the Data Marts that are still left operational. The master data platform used in the synchronisation would be the new DW repository. Data Marts can be taken offline as their existing functionality and the new functionality become available from the new DW repository.
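The retirement condition in the last sentence can be expressed as a simple set check. The sketch below assumes a hypothetical inventory of the functions each data mart supports and of the functions already delivered by the new DW repository; the names are illustrative only.

```python
def marts_ready_to_retire(mart_functions, dw_functions):
    """Return data marts whose every function is already served by the new DW."""
    available = set(dw_functions)
    return sorted(
        mart
        for mart, functions in mart_functions.items()
        if set(functions) <= available
    )


# Hypothetical inventory of the functions each data mart supports.
mart_functions = {
    "SalesMart": {"campaign_reporting", "pipeline_reporting"},
    "MarketingMart": {"campaign_reporting"},
}
# Hypothetical functions delivered by the new DW repository so far.
dw_functions = {"campaign_reporting"}

ready = marts_ready_to_retire(mart_functions, dw_functions)
print(ready)
```

Marts that are not yet in the result stay operational and continue to be synchronised until the remaining functions are migrated.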

References

  1. Five High-Value Infrastructure Projects for the 2003 Budget, AMR Research Report, September 2002.