Physical Data Model Deliverable Template

This deliverable template is used to describe a sample of the MIKE2.0 Methodology (typically at a task level). More templates are now being added to MIKE2.0 as this has been a frequently requested aspect of the methodology. Contributors are strongly encouraged to assist in this effort.
Deliverable templates are illustrative as opposed to fully representative. Please help add examples to this template that are representative of the proposed output.

The purpose of the Physical Data Model task is to use the logical model design as a basis for designing the physical database. This task involves modelling the increment's database structure. Considerations should include volume/cardinality, partitioning, indexing, and backup/recovery. The degree of change from the logical model to the physical model will vary largely based on the type of system being built. The physical design may also differ from the logical model to improve performance or simplify query complexity.

Examples

Listed below are examples of the Physical Data Model deliverable:

Sample Physical Data Warehouse Data Model

This deliverable sample contains the model and overview text.

Introduction

This document outlines a number of the logical design decisions, recommendations, suggestions, and considerations affecting the physical data design. Aspects covered include database design, data quality rules, load rules, and the control and use of versions.

Scope

The scope of this document covers the following aspects:

  • The sources of historical search, result, and click-through data that will populate the data warehouse
  • Data extraction rules
  • Data load rules

Features of the Physical Database Design

The sections below indicate current thinking about certain database design considerations for the ABC Data Warehouse. They should not be taken as directions: work done to date has provided a logical specification, and physical design is to be completed as the first step of the build phase of the project. During physical design, other factors may be identified that invalidate the considerations discussed below, and further design issues will be identified that require decisions of their own.

Sub-Types

Where sub-types are defined in a logical data model, the database designer has options as to how the sub-types should be implemented in the database. In each case three options are possible:

  1. Implement each sub-type as a table;
  2. Roll super-type attributes down into the sub-types;
  3. Roll sub-type attributes up into the super-type.

Consider the logical model shown below.

(Diagram: logical model)

When implemented as database tables, the three options described above would result in the table definitions shown below.

1. Implement each sub-type as a table

(Diagram: table definitions for option 1)

2. Roll super-type attributes down into the sub-types

(Diagram: table definitions for option 2)

3. Roll sub-type attributes up into the super-type

(Diagram: table definitions for option 3)

Decisions as to which option is best are based primarily on commonality of usage and of definition. Other considerations are relationships to other entities, particularly foreign keys in the sub-types, as most DBMSs, including Oracle, do not support optional foreign keys. Therefore, if one sub-type has a foreign key and the other does not, option 3 can only be considered by including a dummy entry in the foreign key table.
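
To make the three options concrete, the sketch below renders each one as DDL for a hypothetical Party super-type with Person and Organisation sub-types. The actual entities appear only in the diagrams above, so these table names and columns are illustrative assumptions, not part of the ABC design.

  import sqlite3

  conn = sqlite3.connect(":memory:")

  # Option 1: implement the super-type and each sub-type as its own table.
  conn.executescript("""
  CREATE TABLE party        (party_id INTEGER PRIMARY KEY, party_type TEXT, created_date TEXT);
  CREATE TABLE person       (party_id INTEGER PRIMARY KEY REFERENCES party, family_name TEXT);
  CREATE TABLE organisation (party_id INTEGER PRIMARY KEY REFERENCES party, registered_name TEXT);
  """)

  # Option 2: roll super-type attributes down into each sub-type table.
  conn.executescript("""
  CREATE TABLE person_2       (party_id INTEGER PRIMARY KEY, created_date TEXT, family_name TEXT);
  CREATE TABLE organisation_2 (party_id INTEGER PRIMARY KEY, created_date TEXT, registered_name TEXT);
  """)

  # Option 3: roll sub-type attributes up into the super-type. Attributes
  # belonging to only one sub-type become nullable columns, and a
  # discriminator column records which sub-type each row represents.
  conn.executescript("""
  CREATE TABLE party_3 (
      party_id        INTEGER PRIMARY KEY,
      party_type      TEXT,   -- discriminator: 'PERSON' or 'ORGANISATION'
      created_date    TEXT,
      family_name     TEXT,   -- NULL for organisations
      registered_name TEXT    -- NULL for persons
  );
  """)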

Partitioning

The large volumes in the Data Warehouse may require that physical table partitioning be considered.

Archiving and Purging

The Data Warehouse layer sizing has been carried out assuming that transactional (i.e. event) data is retained for a period of one month. Processes will need to be in place to (a) monitor these volumes and (b) remove old data at the appropriate time. The removal (purging) process should be controllable so that the age of data to be removed can be easily varied. No data should be lost: it is recommended that event data be archived to a medium such as DVD and retained for up to one year. The purging and archiving activity should be scheduled so that no single run becomes so large that it disrupts system availability, nor so frequent that it becomes a continual overhead to regular update processing.

Data Mart processes will be required, as for the Data Warehouse layer, to monitor volumes and control data purging. The Data Mart layer will retain all Fact (i.e. event) data for a period of one year. Since all fact tables but one have daily aggregations only, volumes are expected to become significant. A number of the scheduled reports will have monthly cycles, suggesting a monthly purge cycle, but processing times will need to be considered, as above. Similarly, purged Fact data should be archived to an alternate medium such as DVD and retained for a further year.
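
A minimal sketch of a controllable purge-and-archive process along these lines is shown below. The event table name, its event_date column, and the CSV archive file are illustrative assumptions (the document specifies DVD as the archive medium); only the retention-period parameterisation comes from the text above.

  import sqlite3
  from datetime import date, timedelta

  def purge_events(conn: sqlite3.Connection, retention_days: int, archive_path: str) -> int:
      """Archive, then delete, event rows older than a configurable retention period."""
      cutoff = (date.today() - timedelta(days=retention_days)).isoformat()
      old_rows = conn.execute(
          "SELECT * FROM event WHERE event_date < ?", (cutoff,)
      ).fetchall()
      # Archive before purging so that no data is lost.
      with open(archive_path, "a", encoding="utf-8") as f:
          for row in old_rows:
              f.write(",".join(str(col) for col in row) + "\n")
      conn.execute("DELETE FROM event WHERE event_date < ?", (cutoff,))
      conn.commit()
      return len(old_rows)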

Data Load Rules

Insert New Record

The process for inserting a new record is as follows:

1. Determine non-existence of the [[Natural Key Concept|natural key]].

2. Insert new record:

  • Set Record Start Date to current date.
  • If available, set Effective Start Date as per extracted business data, else set to current date.
  • Set End Dates to NULL.
  • Set Version Number to 1.
  • Set Latest Version Indicator to "Y".
  • Where appropriate, set Currently Valid Indicator to "Y".

3. Follow Data Quality Rules for any parent entities.
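
A minimal sketch of this insert rule is given below, assuming a hypothetical customer_dim table whose columns follow the names above (abbreviated), with a single name column standing in for the business attributes.

  import sqlite3
  from datetime import date

  def insert_new_record(conn: sqlite3.Connection, natural_key: str, name: str,
                        effective_start: str | None = None) -> bool:
      # 'customer_dim' and its column names are illustrative assumptions.
      today = date.today().isoformat()
      # 1. Determine non-existence of the natural key.
      if conn.execute("SELECT 1 FROM customer_dim WHERE natural_key = ?",
                      (natural_key,)).fetchone():
          return False
      # 2. Insert the new record with the prescribed defaults: version 1,
      #    latest/valid indicators 'Y', end dates NULL.
      conn.execute(
          """INSERT INTO customer_dim
             (natural_key, name, record_start_date, effective_start_date,
              record_end_date, effective_end_date,
              version_number, latest_version_ind, currently_valid_ind)
             VALUES (?, ?, ?, ?, NULL, NULL, 1, 'Y', 'Y')""",
          (natural_key, name, today, effective_start or today),
      )
      conn.commit()
      return True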

Update Existing Record

1. Search for existence of natural key.

2. Find latest version.

3. Determine change effect:

  • Identify columns of interest from extracted data.
  • Compare the identified columns with the XYZ record to determine whether there is any XYZ change.
  • Ignore the extracted record if there is no XYZ impact.

4. Update current latest version:

  • Set Record End Date field to current date – 1.
  • Set Latest Version Indicator to "N".
  • Where appropriate, set Currently Valid Indicator to "N".
  • If available, set Effective End Date as per the extracted business data, else set to current date – 1.

5. Insert new version:

  • Set Record Start Date to current date.
  • If available, set Effective Start Date as per extracted business data, else set to current date.
  • Set End Dates to NULL.
  • Set Version Number to the previous highest version number + 1.
  • Set Latest Version Indicator to "Y".
  • Where appropriate, set Currently Valid Indicator to "Y".

6. Follow Data Quality Rules for any parent entities.
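
The update rule can be sketched against the same assumed customer_dim table: steps 2 to 5 become an expire-then-insert pair, with the name column again standing in for the columns of interest.

  import sqlite3
  from datetime import date, timedelta

  def update_record(conn: sqlite3.Connection, natural_key: str, new_name: str) -> None:
      # 'customer_dim' and its column names are illustrative assumptions.
      today = date.today().isoformat()
      yesterday = (date.today() - timedelta(days=1)).isoformat()
      # 2. Find the latest version.
      current = conn.execute(
          """SELECT version_number, name FROM customer_dim
             WHERE natural_key = ? AND latest_version_ind = 'Y'""",
          (natural_key,)).fetchone()
      if current is None:
          return                      # no existing record: use the insert rule
      version, name = current
      # 3. Determine change effect: ignore the extract if nothing changed.
      if name == new_name:
          return
      # 4. Expire the current latest version.
      conn.execute(
          """UPDATE customer_dim
             SET record_end_date = ?, effective_end_date = ?,
                 latest_version_ind = 'N', currently_valid_ind = 'N'
             WHERE natural_key = ? AND latest_version_ind = 'Y'""",
          (yesterday, yesterday, natural_key))
      # 5. Insert the new version: previous highest version number + 1.
      conn.execute(
          """INSERT INTO customer_dim
             (natural_key, name, record_start_date, effective_start_date,
              record_end_date, effective_end_date,
              version_number, latest_version_ind, currently_valid_ind)
             VALUES (?, ?, ?, ?, NULL, NULL, ?, 'Y', 'Y')""",
          (natural_key, new_name, today, today, version + 1))
      conn.commit()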

Source Record Deletion

1. Search for existence of natural key.

2. Find latest version.

3. Update current latest version:

  • Set Record End Date field to current date – 1.
  • Where appropriate, set Currently Valid Indicator to "N".
  • If available, set Effective End Date to current date – 1.
  • Latest Version Indicator remains set to "Y".
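
A sketch of the deletion rule against the same assumed table follows; note that, per the rule above, the Latest Version Indicator is deliberately left at "Y".

  import sqlite3
  from datetime import date, timedelta

  def close_deleted_record(conn: sqlite3.Connection, natural_key: str) -> None:
      # 'customer_dim' and its column names are illustrative assumptions.
      yesterday = (date.today() - timedelta(days=1)).isoformat()
      # End-date the latest version and mark it no longer currently valid;
      # the Latest Version Indicator deliberately remains 'Y'.
      conn.execute(
          """UPDATE customer_dim
             SET record_end_date = ?, effective_end_date = ?,
                 currently_valid_ind = 'N'
             WHERE natural_key = ? AND latest_version_ind = 'Y'""",
          (yesterday, yesterday, natural_key))
      conn.commit()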

Data Mart Load

All Dimensions have been defined as Type 2 – Slowly Changing. All contain the following common attributes…

  • Version Number and Latest Version Indicator;
  • Effective Start and End dates (defining the business applicability of the Dimension);
  • Record Start and End dates (defining the Data Mart validity of the Dimension records, which may be different from the Effective dates);
  • A Current Validity Indicator (indicating that the Dimension is still in current usage, not superseded or suspended).

The following general rules will apply to loading the data mart…

1. All Dimensions will be loaded prior to Facts.

2. Dimension records will not be updated: new versions will be created, using rules as defined above.

3. Fact records will not be updated: being transactional or snapshot in nature, each will exist once and once only.

4. Fact records when loaded will always link to the latest version of the appropriate Dimension (Latest Version Indicator = "Y").

5. Fact records will not be re-linked if new versions of Dimensions are created.

6. When bulk-loading historical fact records, the appropriate version of the Dimension will be determined by having the Fact effective date (for example, Event Result date) within the Dimension date range specified by the Effective Start Date and Effective End Date.
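
Rules 4 and 6 can be sketched as a single surrogate-key lookup. The customer_dim table, its surrogate_key column, and the Effective date columns are the same illustrative assumptions used above.

  import sqlite3

  def dimension_key_for_fact(conn: sqlite3.Connection, natural_key: str,
                             fact_date: str | None = None):
      # 'customer_dim' and its column names are illustrative assumptions.
      if fact_date is None:
          # Rule 4: a current fact links to the latest dimension version.
          row = conn.execute(
              """SELECT surrogate_key FROM customer_dim
                 WHERE natural_key = ? AND latest_version_ind = 'Y'""",
              (natural_key,)).fetchone()
      else:
          # Rule 6: a historical fact links to the version whose effective
          # date range covers the fact's effective date.
          row = conn.execute(
              """SELECT surrogate_key FROM customer_dim
                 WHERE natural_key = ?
                   AND effective_start_date <= ?
                   AND (effective_end_date IS NULL OR effective_end_date >= ?)""",
              (natural_key, fact_date, fact_date)).fetchone()
      return row[0] if row else None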

Data Quality Rules

Listed below are the key rules associated with Data Quality issues:

Creating a Missing Parent Record

1. Create dummy parent record:

  • Set the natural key to the missing value.
  • Set Start Date fields to the current date.
  • Set End Date fields to NULL.
  • Complete whatever attributes are possible; set other attributes to NULL.
  • Set DQ Indicator to "D" for Dummy.

2. Create record in target table:

  • Set FK to PK of dummy parent.
  • Set DQ Indicator to "R" for a record-level error.
  • Set other attributes as normal.

3. Create record in Data Quality table:

  • Set Table Name to target table name.
  • Set Record Key to the PK of the inserted target table record.
  • Set Column Name to target table FK column name.
  • Set Data Value to missing natural key of parent table.
  • Set Error Code to NULL (this column is not currently populated).
  • Set Error Text to NULL (this column is not currently populated).
  • Set Error Level to "H", indicating a high-severity error.
  • Set Source System to name of operational system.
  • Set Date Created to run date.
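
A sketch of the two inserts behind these rules is shown below, assuming hypothetical parent_dim and data_quality tables with columns named after the fields above; step 2 (inserting the target record itself, with its FK and DQ Indicator set) is left to the caller's normal load path.

  import sqlite3
  from datetime import date

  def create_dummy_parent(conn: sqlite3.Connection, missing_key: str,
                          target_table: str, target_record_pk: int,
                          fk_column: str, source_system: str) -> None:
      # 'parent_dim' and 'data_quality' are illustrative assumptions.
      today = date.today().isoformat()
      # 1. Create the dummy parent, flagged 'D'; unknown attributes stay NULL.
      conn.execute(
          """INSERT INTO parent_dim
             (natural_key, record_start_date, record_end_date, dq_ind)
             VALUES (?, ?, NULL, 'D')""",
          (missing_key, today))
      # 3. Log the problem in the Data Quality table ('H' = high severity;
      #    Error Code and Error Text are not currently populated).
      conn.execute(
          """INSERT INTO data_quality
             (table_name, record_key, column_name, data_value, error_code,
              error_text, error_level, source_system, date_created)
             VALUES (?, ?, ?, ?, NULL, NULL, 'H', ?, ?)""",
          (target_table, target_record_pk, fk_column, missing_key,
           source_system, today))
      conn.commit()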

Retrieving Prior Versions of Records

Retrieve Latest Version

Retrieve record via natural key and Latest Version Indicator = "Y".

Retrieve Version at a Point in Time

Retrieve record via natural key and date between Start Date and End Date.

Retrieve Prior Version of a Parent Record

Retrieve parent via FK value from child.

Retrieve Latest Version of a Parent Record

1. Retrieve parent via FK from child.

2. Determine value of natural key.

3. Retrieve latest version by:

  • Joining parent table to itself on natural key.
  • Retrieving the version where Latest Version Indicator = "Y".

Retrieve Prior Version of Child Record

Retrieve child via FK value in child.

Retrieve Latest Version of Child Record

1. Retrieve child via FK value in child.

2. Determine value of natural key.

3. Retrieve latest version by:

  • Joining child table to itself on natural key.
  • Retrieving the version where Latest Version Indicator = "Y".

Retrieve All or Selected Versions of Child Record

1. Retrieve child via FK value in child.

2. Determine value of natural key.

3. Retrieve other versions by:

  • Joining child table to itself on natural key.
  • Retrieving versions according to the required criteria.
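
The retrieval patterns above reduce to a handful of query shapes. The sketch below expresses three of them as SQL strings against the assumed tables (customer_dim for a versioned entity; parent_dim and child_table for the parent-child self-join); all table and column names are illustrative assumptions.

  # Latest version via the natural key.
  LATEST_VERSION = """
      SELECT * FROM customer_dim
      WHERE natural_key = ? AND latest_version_ind = 'Y'
  """

  # Version at a point in time: the given date falls between the record dates.
  POINT_IN_TIME = """
      SELECT * FROM customer_dim
      WHERE natural_key = ?
        AND record_start_date <= ?
        AND (record_end_date IS NULL OR record_end_date >= ?)
  """

  # Latest version of a parent reached through a child's FK: join the parent
  # table to itself on the natural key to step from the referenced version
  # to the one flagged as latest.
  LATEST_PARENT_OF_CHILD = """
      SELECT latest.*
      FROM child_table AS c
      JOIN parent_dim  AS referenced ON referenced.surrogate_key = c.parent_fk
      JOIN parent_dim  AS latest     ON latest.natural_key = referenced.natural_key
      WHERE c.child_id = ? AND latest.latest_version_ind = 'Y'
  """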