
Data Modelling Solution

From MIKE2.0 Methodology

This Solution Offering is provided through the MIKE2.0 Methodology. It receives full coverage within the Overall Implementation Guide and SAFE Architecture and contains a number of Supporting Assets. It has been defined in line with the appropriate Solution Offering Creation Guide and has been peer reviewed. It may have some minor issues with Activities or lack depth. In summary, the Solution Offering can be used, but it may still have some issues.


Introduction

Executive Summary

Many of today’s problems in data management can be traced back to how the data has been modelled. That said, the fundamental techniques for modelling have undergone less change than almost any other technology area since the 1970s. In addition, the core data being captured is largely the same – in stark contrast to the rapid shifts we have seen in functionality and user interfaces. So why do we have so many problems?

First and foremost, most implementations fail to follow established techniques for relational (and its derivative, dimensional) modelling. Projects often do a very poor job at the basics: naming conventions, establishing data definitions and defining relationships. Many organisations barely "model" at all; they merely implement table structures for the data they need to capture. This then leads to problems around referential integrity, data accuracy and data mastering that are difficult to solve. This is the primary problem.
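To make the point concrete, here is a minimal sketch, using Python's built-in sqlite3 module, of the kind of basic referential-integrity enforcement that is often skipped. The table and column names are illustrative, not part of MIKE2.0.

```python
import sqlite3

# Minimal sketch: the tables here are invented examples. Note that SQLite
# only enforces foreign keys when the pragma below is switched on -- an
# easy "basic" to miss.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE sales_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        order_date  TEXT NOT NULL
    )""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme Ltd')")
conn.execute("INSERT INTO sales_order VALUES (100, 1, '2024-01-15')")  # parent exists: OK

try:
    # Orphan row: customer 99 does not exist, so the constraint rejects it.
    conn.execute("INSERT INTO sales_order VALUES (101, 99, '2024-01-16')")
except sqlite3.IntegrityError as exc:
    print("Rejected orphan row:", exc)
```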

The other issue is that new complexities have arisen in today’s distributed, integrated, high-volume architectures, which make our modelling requirements even more complex. We face a major challenge in bringing together applications where data has typically been modelled without knowledge of the rest of the environment. The process of navigating from one end of the enterprise data model to another has become far too complex compared with other systems. This latter point highlights a potential gap in the relational modelling techniques we are employing and, possibly, the need for a new approach to how we model data.

Although there are arguably issues, relational data modelling techniques have been in place for over 30 years and are still the best techniques we have for managing information complexity. The MIKE2.0 Solution for Data Modelling is focused on defining best practices in this area.

We also have a more ambitious goal: to advance the techniques we use in this area and see whether there are ways to improve upon our traditional approaches to modelling. We look to use the organising framework and collaborative environment provided by MIKE2.0 to help us in this task.

Solution Offering Purpose

This is a Foundational Solution. Foundational Solutions are "background" solutions that support the Core Solution Offerings of the MIKE2.0 Methodology.

Foundational Solutions are the lowest-level assets within MIKE2.0 that are comprehensive in nature. They may tie together multiple Supporting Assets and are referenced from the Overall Implementation Guide and other Solution Offerings.

Solution Offering Relationship Overview

The Data Modelling Solution Offering is a Foundational Solution of MIKE2.0.

The MIKE2.0 Solution for Data Modelling is part of the overall MIKE2.0 Methodology. MIKE2.0 Solutions provide a detailed and holistic way of addressing specific problems. MIKE2.0 Solutions can be mapped directly to the Phases and Activities of the Overall Implementation Guide, providing additional content to help understand the overall approach.

The Overall Implementation Guide explains the relationships between the Phases, Activities and Tasks of the overall methodology, as well as how the Supporting Assets tie to the overall methodology and MIKE2.0 Solutions. Users of the MIKE2.0 Methodology should always begin with the Overall Implementation Guide and the Usage Model as the starting point for projects.

The core set of Activities and Tasks for defining a model from Conceptual to Logical to Physical to Implementation are defined within the Overall Implementation Guide. The MIKE2 Solution for Data Modelling brings together lower-level detail around:

  • Developing data standards
  • Differences between logical modelling and physical modelling
  • Differences between relational modelling and dimensional modelling
  • Map-and-gap techniques for using off-the-shelf models

This area is built out over time and linked into more detailed Supporting Assets that describe these areas.
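As an illustration of the relational-versus-dimensional distinction listed above, the sketch below defines the same sales data first in normalised relational form and then as a dimensional star schema. All table and column names are invented for the example.

```python
import sqlite3

# Relational (normalised) form: each fact lives in exactly one place.
relational_ddl = """
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE store   (store_id   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sale    (sale_id    INTEGER PRIMARY KEY,
                      product_id INTEGER REFERENCES product (product_id),
                      store_id   INTEGER REFERENCES store (store_id),
                      sale_date  TEXT,
                      amount     REAL);
"""

# Dimensional (star schema) form: a central fact table surrounded by
# denormalised dimensions, optimised for slicing and aggregation.
dimensional_ddl = """
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, full_date TEXT, year INTEGER);
CREATE TABLE fact_sales  (product_key INTEGER REFERENCES dim_product (product_key),
                          store_key   INTEGER REFERENCES dim_store (store_key),
                          date_key    INTEGER REFERENCES dim_date (date_key),
                          amount      REAL);
"""

# Both are valid physical models; which to choose depends on the workload.
conn = sqlite3.connect(":memory:")
conn.executescript(relational_ddl)
conn.executescript(dimensional_ddl)
```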

Solution Offering Definition

The approach used for modelling data is always one of the key determinants of success on any project, as how data is modelled will determine integration complexity, security requirements and, ultimately, success for the business. That said, this area is often under-resourced on most projects – integration and functionality usually dominate the team’s efforts.

Relationship to Solution Capabilities

Relationship to Enterprise Views

Data Modelling is a key part of the process for Information Development

In terms of the governing model on Enterprise Views, the MIKE2 Solution for Data Modelling cuts across:

  • Strategy – defining strategic information requirements as part of an overall strategy programme
  • Process (Governance) – steps to be followed, standards and policies that should be employed
  • Organisation – where data modellers and information architects fit within an Information Development organisation
  • Technology – guidelines for product selection around database platform, modelling tools and off-the-shelf data models
  • People – skills required around data modelling and information architecture

This guide is used in conjunction with the overall steps for an information management programme that are defined in the Overall Implementation Guide.

Mapping to the Information Governance Framework

Mapping to the SAFE Architecture Framework

Data Modelling is a Foundation Capability of the SAFE Architecture

Data Modelling is one of the Foundation Capabilities for Information Development. Data models provide the structures that hold our data and its relationships. Data Modelling is a Foundation Capability as it is one of the critical early steps for any project; working on other areas without getting the model reasonably stable will inevitably lead to issues further down the track.

Following best practices in data modelling also involves establishing a robust data dictionary. This is the first step in managing metadata across the information environment; basic metadata management is a Foundation Capability and a prerequisite to a more sophisticated architecture that involves active metadata integration.
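As a minimal sketch of what a first data-dictionary step might look like, the structure below records a definition, type and steward per column. The fields shown are a common baseline, not a MIKE2.0-prescribed schema, and the entries are invented.

```python
# Hypothetical data-dictionary records, keyed by table.column.
data_dictionary = {
    "customer.customer_id": {
        "definition": "Surrogate key uniquely identifying a customer.",
        "data_type": "INTEGER",
        "nullable": False,
        "steward": "Customer Data Team",
    },
    "customer.customer_name": {
        "definition": "Legal trading name of the customer.",
        "data_type": "TEXT",
        "nullable": False,
        "steward": "Customer Data Team",
    },
}

# Even this simple structure supports the basics: every column has an
# agreed definition and an accountable steward.
for column, meta in data_dictionary.items():
    print(f"{column}: {meta['definition']} (steward: {meta['steward']})")
```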

Mapping to the Overall Implementation Guide

Data Modelling primarily takes place during the first 3 phases of MIKE2

There are a number of aspects that make up the MIKE2.0 approach to Data Modelling. This document is focused on the aspects of Data Modelling that take place across multiple phases of the MIKE2 Methodology, in particular:

  • Definition of a target Conceptual Model that defines the strategic business requirements in Phase 1 of MIKE2.
  • The creation of standards around data modelling, such as naming conventions, the use of sub-types and super-types, and the use of cardinality.
  • The ongoing process for developing an Enterprise Data Model based around the Key Data Elements (KDEs). This is built in a progressive fashion and is started in Phase 3 of MIKE2.0.
  • Creation of the target Logical and Physical models in Phase 3 of MIKE2.0. This process may involve the use of an off-the-shelf model, development of a bespoke model or a hybrid approach.

The data model is implemented in Phase 4 of MIKE2.0, although the Data Modelling solution is focused on modelling tasks as opposed to physical implementation.

The manner in which the Overall Implementation Guide relates to Data Modelling is listed below, in relation to the different phases of MIKE2.0.

Business Assessment and Strategy Definition (Phase 1)

Within Phase 1 of MIKE2.0, time will be spent on defining the overall business strategy and an overall conceptual architecture that sets out a relatively high-level vision for developing the envisaged future state. This conceptual architecture includes a conceptual data model that is aligned with the organisation's high-level information requirements. If Data Standards exist, these should be followed for conceptual modelling; otherwise, the simple conceptual model will be re-factored during logical modelling. The point of the conceptual model is only to convey major business concepts and scope.
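A conceptual model at this stage can be as simple as a list of major entities and the business relationships between them, with no attributes or keys yet. The fragment below is a hypothetical illustration of that level of detail.

```python
# Hypothetical conceptual-model fragment: major business concepts only.
entities = ["Customer", "Order", "Product", "Invoice"]

# Relationships read as plain business statements, deliberately free of
# keys, attributes or cardinality detail at this stage.
relationships = [
    ("Customer", "places", "Order"),
    ("Order", "contains", "Product"),
    ("Order", "is billed via", "Invoice"),
]

for subject, verb, obj in relationships:
    print(f"{subject} {verb} {obj}")
```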

Technology Assessment and Selection Blueprint (Phase 2)

During Phase 2 of MIKE2.0, the technology requirements are established at the level to make strategic technology decisions. Policies and Standards that will be used throughout the implementation programme are also put in place.

Strategic Requirements and Vendor Selection

As part of vendor selection, a detailed process occurs regarding definition of functional and non-functional requirements in order to select vendor products. This may involve selection of modelling tools and database implementation platforms.

It is critical to employ a tools-based approach when modelling data: under-equipping the team in this area from the outset will be more costly to address later in the project. Not all modelling tools are the same: some are integrated as part of packages; some offer functionality beyond data modelling alone; others are much cheaper. Diligence should be applied in the selection of modelling tools, just as it is with run-time products.

Define Testing Strategy

The Test Strategy should ensure there is an appropriate focus on testing that uses the Information Development approach. This means that there is an appropriate number of test cases focused on areas such as domain values, referential integrity and data access when testing the data model as part of the more comprehensive solution. This Test Strategy then flows down to Test Planning, Test Design and multiple Activities for Test Execution.
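A sketch of the kind of data-model test cases such a strategy calls for is shown below, again using sqlite3. The table, the agreed domain values and the queries are illustrative assumptions.

```python
import sqlite3

# Illustrative test fixture: one parent, one child, one valid row each.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE sales_order (order_id INTEGER PRIMARY KEY,
                              customer_id INTEGER REFERENCES customer (customer_id));
    INSERT INTO customer VALUES (1, 'ACTIVE');
    INSERT INTO sales_order VALUES (100, 1);
""")

# Domain-value test: every status must come from the agreed code set.
valid_statuses = {"ACTIVE", "DORMANT", "CLOSED"}
rows = conn.execute("SELECT DISTINCT status FROM customer").fetchall()
assert all(status in valid_statuses for (status,) in rows), "Unexpected status code"

# Referential-integrity test: no order may reference a missing customer.
orphans = conn.execute("""
    SELECT COUNT(*) FROM sales_order o
    LEFT JOIN customer c ON o.customer_id = c.customer_id
    WHERE c.customer_id IS NULL
""").fetchone()[0]
assert orphans == 0, f"{orphans} orphaned orders found"
print("Data model tests passed")
```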

Data Policies

Data Policies are derived from the Policies and Guidelines developed in Phase 1. These high-level policies impact the definition of Data Standards, in particular around data security, normalisation and auditing practices.

Data Standards

Data Standards should be established before the modelling team begins any detailed work. This ensures that the team is working to a common set of techniques and conventions. Data Standards should be straightforward and follow a common set of best practices. Often data standards will already exist that can be leveraged.
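As a sketch of how such standards can be made checkable, the snippet below validates column names against one possible convention (lower_snake_case, no bare "id" columns). The convention itself is an example, not a standard mandated by MIKE2.0.

```python
import re

# Example convention: lower_snake_case names, keys named <table>_id.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

def check_column_name(table: str, column: str) -> list[str]:
    """Return a list of convention violations for one column name."""
    problems = []
    if not NAME_PATTERN.match(column):
        problems.append(f"{table}.{column}: not lower_snake_case")
    if column == "id":
        problems.append(f"{table}.{column}: bare 'id' -- prefer '{table}_id'")
    return problems

# Example run against a few candidate names.
for table, column in [("customer", "customer_id"),
                      ("customer", "CustomerName"),
                      ("sales_order", "id")]:
    for problem in check_column_name(table, column):
        print(problem)
```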

Roadmap and Foundation Activities (Phase 3)

Data Modelling is considered a Foundation Activity of MIKE2.0 and therefore a number of activities associated with the modelling process occur in Phase 3. This includes establishing the environment, getting more detailed information requirements and establishing the logical and physical data models that meet these requirements.

Detailed Preparation and Planning (part of the Roadmap)

The preparation and detailed planning that takes place during the Roadmap includes examining the documents which have been prepared as inputs to the ETL design process and ensuring that they contain sufficient and accurate information. Also within this phase, the project plan for the specific increment is created, containing a detailed list of tasks to be accomplished during ETL design, estimates for those tasks and dependencies among the tasks. The steps for planning are described in the overall MIKE2 Implementation Guide.

Detailed Business Requirements for Increment (part of the Roadmap)

The purpose of this task is to validate, refine, categorise and prioritise business requirements for this particular increment. It involves reviewing the existing documentation from the Blueprint and conducting additional interviews to define the purpose, goals/drivers, objectives, CSFs, KPIs and risks of the increment. These business requirements are then used to drive the requirements for the data that will be modelled.

Establishing the Environment (part of Foundation Activities)

The environment is established across Phases 2 and 3 of MIKE2.0 (specific revisions for the increment are made during Phase 3). The Technical Architecture will have specified hardware and software for Development, Testing and Production, and these must be examined, corrected and/or enhanced as required. Having the environment ready means that there is an environment in place for database development and that the team has the tools they require for modelling.

Enterprise Information Architecture

Many organisations do not have a well-defined Enterprise Information Architecture. MIKE2.0 takes the approach of building out the Enterprise Information Architecture over time for each new increment that is implemented as part of the overall programme. The scope for building the Enterprise Information Architecture is defined by the in-scope Key Data Elements (KDEs). The enterprise data model is the major part of this overall information architecture.

Solution Architecture (part of Foundation Activities)

The Solution Architecture is then created to outline the scope of the design and it typically includes a conceptual data model. This conceptual model is typically a more focused revision from the model established in Phase 1 and is a key part of tying together the overall solution approach that meets the business requirements.

Database Design (part of Foundation Activities)

Database Design involves the tasks for logical and physical design of the target database. The logical design employs the established data standards and conceptual model and builds out the next level of detail: further defined attributes, domain values and keys. Physical design follows logical design; physical design will be more complex where resource requirements are more significant, such as for Warehouses or critical OLTP applications. For some projects the physical design is quite similar to the logical design. These modelling processes should be conducted within a tool and, ideally, lineage should be traceable between the different models through a metadata repository.
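A minimal sketch of carrying a logical definition down to physical DDL follows: a logical entity with attributes, a domain and a key is rendered as a CREATE TABLE statement. The entity and its domain values are invented for the example.

```python
import sqlite3

# Hypothetical logical definition: attributes, domain values and keys.
logical_entity = {
    "name": "customer",
    "attributes": {
        "customer_id": {"type": "INTEGER", "key": "primary"},
        "customer_name": {"type": "TEXT", "nullable": False},
        "status": {"type": "TEXT", "domain": ("ACTIVE", "DORMANT", "CLOSED")},
    },
}

def to_ddl(entity: dict) -> str:
    """Render one logical entity as physical CREATE TABLE DDL."""
    columns = []
    for name, spec in entity["attributes"].items():
        col = f"{name} {spec['type']}"
        if spec.get("key") == "primary":
            col += " PRIMARY KEY"
        if spec.get("nullable") is False:
            col += " NOT NULL"
        if "domain" in spec:  # domain values become a CHECK constraint
            values = ", ".join(f"'{v}'" for v in spec["domain"])
            col += f" CHECK ({name} IN ({values}))"
        columns.append(col)
    return f"CREATE TABLE {entity['name']} (\n    " + ",\n    ".join(columns) + "\n)"

ddl = to_ddl(logical_entity)
print(ddl)
sqlite3.connect(":memory:").execute(ddl)  # the physical model is valid DDL
```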

Prototyping of the Solution Architecture (part of Foundation Activities)

A prototype of the Solution Architecture is built to gain a better understanding of how the solution will work before moving into a more formalized design process. Prototyping the proposed solution should provide an end-to-end approach that includes each of the major components of the architecture. The prototyping effort is focused on what are perceived as the major technology risk areas for the increment.

The prototype is about testing high-risk or uncertain areas of the architecture. This means that business concepts in the data model may be extended to handle data quality issues such as missing parent records or failed validation against domains. The prototype will require a representative from the modelling team.
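One common technique for the missing-parent case is sketched below: a reserved "unknown" placeholder row in the parent table, so that facts still load for analysis rather than failing. The reserved key of -1 is an illustrative convention, not a MIKE2.0 requirement.

```python
import sqlite3

# Prototype sketch: an 'unknown' placeholder parent absorbs orphan facts.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales   (sale_id INTEGER PRIMARY KEY,
                               customer_key INTEGER REFERENCES dim_customer (customer_key),
                               amount REAL);
    -- Reserved placeholder row for facts whose parent cannot be resolved.
    INSERT INTO dim_customer VALUES (-1, 'UNKNOWN');
    INSERT INTO dim_customer VALUES (1, 'Acme Ltd');
""")

def load_sale(sale_id: int, customer_key: int, amount: float) -> None:
    """Load a fact, falling back to the UNKNOWN parent if lookup fails."""
    exists = conn.execute("SELECT 1 FROM dim_customer WHERE customer_key = ?",
                          (customer_key,)).fetchone()
    conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 (sale_id, customer_key if exists else -1, amount))

load_sale(100, 1, 250.0)   # parent found
load_sale(101, 99, 75.0)   # parent missing -> routed to UNKNOWN
print(conn.execute("SELECT * FROM fact_sales").fetchall())
```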

Design Increment (Phase 4)

There are some final infrastructure management activities that occur as part of implementing the data model. Database Administrators are responsible for:

  • Space allocation - data file and tablespace allocation
  • Data file placement and configuration
  • Partitioning
  • Index rebuilds

The team responsible for logical design is typically not closely involved in these tasks, which relate more to Infrastructure Development than to Information Development.
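For illustration only, the fragment below shows the kind of DDL a DBA might apply at this point. The statements use PostgreSQL's declarative range-partitioning syntax and are held as strings (they will not run on SQLite); the table and column names are invented.

```python
# Sketch of implementation-time DBA work: partitioning plus an index.
# PostgreSQL syntax, shown for illustration only.
partition_ddl = """
CREATE TABLE fact_sales (
    sale_date  DATE NOT NULL,
    amount     NUMERIC
) PARTITION BY RANGE (sale_date);

CREATE TABLE fact_sales_2024 PARTITION OF fact_sales
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

CREATE INDEX idx_fact_sales_2024_date ON fact_sales_2024 (sale_date);
"""
print(partition_ddl)
```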

Mapping to Supporting Assets

Logical Architecture, Design and Development Best Practices

This section will be built out over time. It references Supporting Assets that include technique papers, architecture models and design templates to support the data modelling process.

  • Data Model Implementation and Maintenance describes the processes/techniques required to deploy data models into target environments, and management of baseline data models and metadata through processes of iterative development, potentially with multiple concurrent data modellers.
  • Data Modelling Patterns describes reusable solutions for common data modelling problems, e.g. Time Variance, Auditing, Volatility, Slowly Changing Dimensions, Hierarchies and Surrogate Key Translation (one of these patterns is sketched after this list).
  • Data Vault approach introduces Dan Linstedt's techniques for handling Volatility and Time Variance.
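As promised above, here is a sketch of one of the listed patterns, a Type 2 Slowly Changing Dimension: history is kept by end-dating the current row and inserting a new current version. The schema and dates are illustrative.

```python
import sqlite3

# Type 2 SCD sketch: surrogate key per version, NULL valid_to = current row.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,   -- surrogate key
        customer_id  INTEGER,               -- natural/business key
        city         TEXT,
        valid_from   TEXT,
        valid_to     TEXT                   -- NULL marks the current row
    )""")
conn.execute("INSERT INTO dim_customer VALUES (1, 42, 'London', '2020-01-01', NULL)")

def apply_scd2_change(customer_id: int, new_city: str, change_date: str) -> None:
    """Close off the current row and insert a new current version."""
    conn.execute("""UPDATE dim_customer SET valid_to = ?
                    WHERE customer_id = ? AND valid_to IS NULL""",
                 (change_date, customer_id))
    next_key = conn.execute("SELECT MAX(customer_key) + 1 FROM dim_customer").fetchone()[0]
    conn.execute("INSERT INTO dim_customer VALUES (?, ?, ?, ?, NULL)",
                 (next_key, customer_id, new_city, change_date))

apply_scd2_change(42, 'Leeds', '2024-06-01')
for row in conn.execute("SELECT * FROM dim_customer ORDER BY customer_key"):
    print(row)  # both versions retained, only the latest has valid_to = NULL
```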

Product-Specific Implementation Techniques

This section is still being defined. Product-specific implementation techniques will apply to:

  • Map-and-gap techniques with certain tools
  • Usability documentation and extensions to certain tools
  • Open source data models
  • Capability reviews of off-the-shelf models

These will be built out progressively over time as Supporting Assets to the MIKE2 Methodology.

Product Selection Criteria

This section is still being defined. Product selection criteria for data modelling can include:

  • Data modelling tools
  • Database platforms (although these tend to be more infrastructure-focused or for a specific solution area such as Data Warehousing)
  • Off-the-shelf models

These criteria can then be extended into the Vendor Assessment QuickScan that is part of the Open Source MIKE2 Methodology.

Relationships to other Solution Offerings

Extending the Open Methodology through Solution Offerings
