Data Modelling Concepts

Data Modelling Concepts describes the different steps in the data modelling process for MIKE2.0, as well as a comparison of the different techniques that are used.

Overview

MIKE2.0 recommends a 4-step approach to modelling:

  1. Conceptual Modelling
  2. Logical Modelling
  3. Physical Modelling
  4. Physical Implementation

These techniques are followed across the different approaches used for building information platform models.

Conceptual Modelling

Conceptual data models record the broad objects / things (sometimes called Subject Areas) that the business interacts with, and name the relationships between them. Technically these are known as Entities and Relationships, and they are recorded on an Entity Relationship (ER) diagram.

The purpose of the conceptual data model is to discover the Key Data Elements and to name them in an agreed way. Gross-level business rules are discovered, challenged and verified. The Entities or Key Data Elements must be named in client / business terms; for example, it is important to agree that ‘Client’ is to be used rather than ‘Customer’. Abstract terms, such as ‘Party’, must be avoided. Many ER conventions are relaxed for the Conceptual Model; for example, many-to-many relationships are encouraged.

Conceptual Models require input and validation from business stakeholders. Common sense is often a good guide to get started. For example, just as a consumer, one can predict that ‘Customer’ and ‘Policy’ are likely to be two entities for an insurance company. One can further suggest that the relationship will be “Customer owns many Policies” and “Policy is owned by many Customers”. At this level the business can verify the truth or otherwise of these statements.
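
As an illustration of how such a conceptual model might be recorded, the sketch below captures the insurance example above as simple Python structures. This is illustrative only; the Entity and Relationship classes are assumptions, not part of any prescribed MIKE2.0 notation.

  from dataclasses import dataclass

  @dataclass
  class Entity:
      """A Subject Area / Key Data Element, named in business terms."""
      name: str

  @dataclass
  class Relationship:
      """A named relationship; many-to-many is allowed at this level."""
      source: Entity
      verb: str
      target: Entity
      cardinality: str = "M:M"   # relaxed ER conventions

  customer = Entity("Customer")
  policy = Entity("Policy")

  conceptual_model = [
      Relationship(customer, "owns", policy),
      Relationship(policy, "is owned by", customer),
  ]

  # Statements the business can verify or challenge:
  for r in conceptual_model:
      print(f"{r.source.name} {r.verb} many {r.target.name}s ({r.cardinality})")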

Logical Modelling

The Logical Data Model (LDM) is a more formal representation of the conceptual model. Relational theory is used to normalise the data. Like objects may be grouped into super- and sub-types, many-to-many relationships are resolved, and so on. Additional concepts may be introduced to the business as a result of this work (e.g. asking the business, ‘Do we care that Employees can also be Customers?’). Similarly, concepts may be dropped; usually these are self-evident truths that no IT system is interested in enforcing, for example a rule such as ‘A Company must be registered before it can trade’.

Greater complexity is usually added as decisions are made about any need for history, along with decisions about logically unique keys.

The Logical Model can, and should, be ‘proven’ by playing business transactions against it. If Customers can change their Address, then the model must show a connection between ‘Customer’ and ‘Address’.
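
To make these steps concrete, the sketch below resolves the many-to-many ‘Customer owns Policy’ relationship from the conceptual example into an associative entity, and ‘proves’ the model by playing the ‘Customer changes Address’ transaction against it. The class and attribute names are assumptions for illustration only.

  from dataclasses import dataclass, replace

  @dataclass(frozen=True)
  class Customer:
      customer_id: int      # logically unique key
      name: str
      address: str          # the Customer-Address connection

  @dataclass(frozen=True)
  class Policy:
      policy_number: str    # logically unique key

  # The many-to-many is resolved into an associative entity
  # holding the keys of both parents.
  @dataclass(frozen=True)
  class PolicyOwnership:
      customer_id: int
      policy_number: str

  # 'Prove' the model with a business transaction: a Customer
  # changes Address without any Policy being touched.
  c = Customer(1, "Acme Pty Ltd", "1 Old Street")
  c = replace(c, address="2 New Street")
  print(c)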

Physical Modelling

The Physical Data Model (PDM) should be the representation of the underlying database implementation. As a model, the aim is still communication; many tool vendors concentrate on DDL generation at the expense of this communication. In many cases the physical model flows naturally from the logical: each entity is implemented as a table, each relationship as a constraint, and so on. This is often referred to as a Third Normal Form (3NF) implementation.
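
A minimal sketch of this entity-to-table, relationship-to-constraint mapping, using SQLAlchemy schema objects to generate and execute the DDL. The insurance table names continue the earlier example and are assumptions, not a prescribed design.

  from sqlalchemy import (MetaData, Table, Column, Integer, String,
                          ForeignKey, create_engine)
  from sqlalchemy.schema import CreateTable

  metadata = MetaData()

  # Each logical entity becomes a table ...
  customer = Table(
      "customer", metadata,
      Column("customer_id", Integer, primary_key=True),
      Column("name", String(100), nullable=False),
  )
  policy = Table(
      "policy", metadata,
      Column("policy_number", String(20), primary_key=True),
  )

  # ... and each relationship becomes a constraint. The resolved M:M
  # is a table with two foreign keys and a composite primary key.
  policy_ownership = Table(
      "policy_ownership", metadata,
      Column("customer_id", Integer,
             ForeignKey("customer.customer_id"), primary_key=True),
      Column("policy_number", String(20),
             ForeignKey("policy.policy_number"), primary_key=True),
  )

  engine = create_engine("sqlite:///:memory:")
  metadata.create_all(engine)                    # executes the generated DDL
  print(CreateTable(policy_ownership).compile(engine))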

3NF implementations are extremely common for an enterprise Operational Data Store (ODS), as they yield consistent insertion rules; the database itself enforces most of these regardless of the different sources being loaded.

If, however, a generalised database design or packaged implementation is to be used, then the physical model may bear little or no resemblance to the logical. In this case an additional layer of documentation is required that shows how and where each logical entity and relationship is implemented. This is critical if the business perspective is to be carried through to the physical design.

The Physical Model can, and should, be ‘proven’ by mapping the logical entities and relationships to the physical. Special physical / performance requirements can also be proven, but this must remain a secondary objective.

In addition, if a generalised design is used (or referential integrity is turned off in the database), then consistency of insertion can only be guaranteed by providing a common code interface to the data: either a message or ‘CRUD’ (Create, Read, Update, Delete) layer.
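
As a minimal sketch of such a CRUD layer (assuming the insurance tables from the earlier examples already exist, and using Python's standard sqlite3 module for illustration), a single insertion routine applies the checks the database is no longer enforcing:

  import sqlite3

  def insert_policy_ownership(conn: sqlite3.Connection,
                              customer_id: int,
                              policy_number: str) -> None:
      """Common insertion path: every source system calls this,
      so consistency is enforced even without database RI."""
      cur = conn.cursor()
      cur.execute("SELECT 1 FROM customer WHERE customer_id = ?",
                  (customer_id,))
      if cur.fetchone() is None:
          raise ValueError(f"unknown customer {customer_id}")
      cur.execute("SELECT 1 FROM policy WHERE policy_number = ?",
                  (policy_number,))
      if cur.fetchone() is None:
          raise ValueError(f"unknown policy {policy_number}")
      cur.execute(
          "INSERT INTO policy_ownership (customer_id, policy_number) "
          "VALUES (?, ?)", (customer_id, policy_number))
      conn.commit()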

Comparison of Modelling Techniques

Comparing Logical and Physical Data Modelling

Examples of differences between the LDM and PDM are presented below. When the LDM is used as a basis for capturing Data Warehouse requirements, selected items may differ or be added to.

Logical Data Model | Physical Data Model
Contains entities, relationships, attributes, cardinality, domain types and sample values. | Contains tables, columns, keys, data types, validation rules (via DB syntax), stored procedures and triggers, domain values and access/security considerations.
Business-like names are used, as influenced by end users. | Names may be more technical, as influenced by the database (e.g. column-name length, instance, owner) or IT resources.
Unique identifiers are defined. | Primary and foreign keys and indexes are used.
Should be normalised to third normal form (3NF). | Usability and performance implications may lead to different, de-normalised schema designs.
Should not include redundant data. | Redundant data (elements) may be allowed.
Should not include derived data. | Derived data (aggregates, measures) may be allowed.
Subject Matter Experts or business users should drive/influence the model. | The DBA, the database platform and the type of access against PDM elements will influence the design of the PDM schema.
Validation of the model involves a higher level of business end-user involvement, applying requirement examples from a logical perspective (i.e. no data involved) against the LDM content. | Greater influence and involvement of DBA and IT (ETL) resources is required to validate the PDM content and its alignment to requirements. Depending on the PDM schema design (top-down or bottom-up), the type of unit and user acceptance testing may vary in content, complexity and accuracy.
Involves more discussion of business and data requirements and relationships. | Involves more discussion of process and activity than of the detailed data and relationship requirements (i.e. reports and file layouts do not highlight relationships between elements/tables, cardinality, interim elements, etc.).
Changes to the LDM content are less involved, complex and costly to resolve when inaccurate requirements are provided or missing requirements need to be added. | Changes to the PDM and the subsequent deliverables that reference it can be highly involved, complex and costly to resolve when inaccurate requirements are provided or missing requirements need to be added.

If a vendor’s common model (e.g. IBM FDW/BDW, NCR FSLDM, etc.) is leveraged, the approach to designing the LDM may align more towards using it as a reference model to check against, or as a basis to start modelling from. The use of a common model can minimise the level of effort and time compared with producing a similar model without any such input.

There will not always be a conceptual model as input to the logical modelling process, as not all projects define one.

Comparing Relational and Dimensional Modelling

The modelling approach will be different depending on whether relational or dimensional modelling techniques are being employed.

Examples of differences between relational and dimensional modelling, in both use and structure, are presented below. It is assumed that detailed requirements have been identified regardless of which technique is used.

Relational Schema | Dimensional Schema
There are greater variations in how one can access data. | There are fewer variations in how to access data, as the fact table is normally the centre of the query.
Involves multiple selection criteria to obtain data. | Involves fewer selection criteria (dimensions + fact).
Less aligned to how the end user views the data. | More aligned to how the end user views the data.
Greater flexibility is available to answer more varied and complicated business questions. | Less flexibility, as the structures and data content are normally more focused due to pre-defined requirements.
Modelled closer to how the source systems structure the data. | Aside from master (reference) data, the model differs more from the structure of the source systems the data came from.
Source personnel (operations/system/application) may have a better understanding of the tables’ layout and may initially write more queries against it. | As structures and table names are more business-related, greater end-user knowledge and query capability may exist.
ETL to populate the schema is less complicated to design. | ETL to populate the schema can be more complicated, e.g. order of population, hierarchies, measures (facts), etc.
Content of a table (non-key attributes) is driven by its relationship to the key. | Content of a table is driven more by business requirements, performance and usability.
Less space is normally required due to reduced redundancy. | More redundancy (around dimensions) is allowed, to reduce query/response times.
Data is separated into more entities. | Data is placed in fewer tables.

Relational (normalized) Modelling Approach

Examples of activities should include the following (a worked normalisation sketch follows the list):

  • Identify potential entities
  • Define attributes of entities (common groupings related to the entity)
  • Identify primary keys that make an entity unique
  • Remove attributes that have repeating groups (1st NF)
  • Remove attributes dependent on only part of the key (redundant data) (2nd NF)
  • Remove attributes dependent on attributes that are not part of the key. What remains should be dependent on the key, the whole key and nothing but the key (3rd NF).
  • Define each distinct and identifiable relationship (draw lines between related entities’ unique identifiers).
  • Name each distinct and identifiable relationship on the LDM relationship line (i.e. named relationships).
  • Identify relationship cardinality types between entities (1:1, 1:1 or 0, 1:M, 1:M or 0, ...)
  • Identify optionality (mandatory / optional) between related entities (draw solid or dotted lines between entities)
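
As a small worked illustration of the normalisation steps above, the sketch below takes a record containing a repeating group and decomposes it to third normal form. The order/city example is an assumption chosen for familiarity, not drawn from the methodology.

  # Unnormalised: a repeating group of order lines inside one record,
  # and 'region' depends on 'city' rather than on the order key.
  unnormalised = {
      "order_id": 1, "customer": "Acme", "city": "Leeds",
      "region": "Yorkshire",
      "lines": [("Widget", 2), ("Gadget", 5)],   # repeating group
  }

  # 1st NF: move the repeating group into its own relation,
  # keyed on (order_id, product).
  orders = [{"order_id": 1, "customer": "Acme",
             "city": "Leeds", "region": "Yorkshire"}]
  order_lines = [{"order_id": 1, "product": "Widget", "qty": 2},
                 {"order_id": 1, "product": "Gadget", "qty": 5}]

  # 3rd NF: 'region' depends on 'city', a non-key attribute, so it
  # moves to its own relation; what remains depends on the key,
  # the whole key and nothing but the key.
  orders = [{"order_id": 1, "customer": "Acme", "city": "Leeds"}]
  cities = [{"city": "Leeds", "region": "Yorkshire"}]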

Dimensional (de-normalized) Modelling Approach

Examples of tasks should include the following (a star-schema sketch follows the list):

  • Identify Dimensions
  • Identify Measures
  • Design Dimensional Hierarchies
  • Define summarization levels
  • Design Fact Tables, including Attributes and Measures (Star Schema design)
  • Design Dimension Tables (Star Schema design)
  • Design Fact (Summary) Tables (Star Schema design)
  • Define Metadata Layer
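
A minimal star-schema sketch covering several of the tasks above (dimensions, measures, a simple hierarchy and a fact table). The sales subject area and all names are assumptions for illustration, rendered with SQLAlchemy as in the earlier physical example.

  from sqlalchemy import (MetaData, Table, Column, Integer, Numeric,
                          String, Date, ForeignKey, create_engine)

  metadata = MetaData()

  # Dimension tables: de-normalised, business-named attributes,
  # with a simple day > month > year hierarchy in date_dim.
  date_dim = Table(
      "date_dim", metadata,
      Column("date_key", Integer, primary_key=True),
      Column("calendar_date", Date),
      Column("month_name", String(10)),
      Column("year", Integer),
  )
  product_dim = Table(
      "product_dim", metadata,
      Column("product_key", Integer, primary_key=True),
      Column("product_name", String(50)),
      Column("category", String(30)),
  )

  # Fact table: the centre of the query; grain = product per day.
  sales_fact = Table(
      "sales_fact", metadata,
      Column("date_key", Integer,
             ForeignKey("date_dim.date_key"), primary_key=True),
      Column("product_key", Integer,
             ForeignKey("product_dim.product_key"), primary_key=True),
      Column("net_sales_amount", Numeric(12, 2)),   # measure
      Column("units_sold", Integer),                # measure
  )

  metadata.create_all(create_engine("sqlite:///:memory:"))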

Data Warehouse Modelling Considerations

If the model is used to support Data Warehouse rather than operational requirements, additional considerations may need to be applied. While some of the activities listed below deviate from a pure modelling approach, they are appropriate given the dynamics of a Data Warehouse. These considerations may apply to either relational or dimensional modelling, depending on the requirements and on how, where and why the schema is used. Examples of these are:

  • Add aspect of time: This may occur through varying methods; for example, incorporating time into the key of an entity facilitates capturing history (see the sketch after this list).
  • Add derived data: Predefined, standard derived data provides consistency across the enterprise for key business data elements such as net sales amount and profit amount.
  • Adjust granularity level: The level of granularity supports drill up/down capabilities.
  • Add summarized data: The standard summarized data facilitates delivery of data without necessitating redundant calculations.
  • Merge tables: With the strategic focus, certain tables are merged to reduce the joins required to deliver or access the data in the data warehouse.
  • Incorporate arrays: When appropriate, arrays can further facilitate delivery of data.
  • Segregate data based on usage and stability: Data is segregated based on its stability (e.g., how often it changes) and on the usage commonality. This step helps both in the data acquisition and in the data delivery. In addition, the data model supports creating conforming dimensions to further simplify the data delivery process for OLAP data marts.
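
As a sketch of the first consideration above, the table below incorporates an effective date into the primary key so that changes insert new rows rather than overwrite history. The column names and the SQLAlchemy rendering are assumptions for illustration.

  from datetime import date
  from sqlalchemy import (MetaData, Table, Column, Integer,
                          String, Date, create_engine)

  metadata = MetaData()

  # 'effective_date' is part of the key, so each change to a
  # customer's segment is captured as a new row.
  customer_hist = Table(
      "customer_hist", metadata,
      Column("customer_id", Integer, primary_key=True),
      Column("effective_date", Date, primary_key=True),
      Column("segment", String(30)),
  )

  engine = create_engine("sqlite:///:memory:")
  metadata.create_all(engine)

  with engine.begin() as conn:
      conn.execute(customer_hist.insert(), [
          {"customer_id": 1, "effective_date": date(2009, 1, 1),
           "segment": "Retail"},
          {"customer_id": 1, "effective_date": date(2010, 6, 1),
           "segment": "Corporate"},
      ])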

Lessons Learned

The majority of activity related to Data Warehouse modelling is directly applicable to a dimensional model, as the use and method of access help influence the design of the initial model schema. If similar Data Warehouse considerations are applied to a relational model, there needs to be an appropriate reason for doing so (known performance issues, usability, etc.). As the relational model offers the most flexibility in its third normal form, de-normalising it may reduce that flexibility.
