Personal tools

Partners

Guidelines for Using an Off-the-Shelf Data Model

From MIKE2 Methodology

Jump to: navigation, search

This section provides guidelines for using an off-the-shelf data model that is pre-defined by a third party. Insight on various approaches and considerations when leveraging a third party’s . The focus of the discussion aligns more towards the models use to support analytic or data warehousing based requirements.

This asset applies when using a pre-defined third party common data model as a reference model or as a basis to start modelling from as per leveraging select areas of the common data model is the focus of this paper.

Contents

Context within the Overall Methodology

These techniques can be applied when using an off-the-shelf model and are intended to provide insight specifically to data architects and data modellers. Additional insight may benefit those indirectly involved in the development and deployment of the model, such as project managers, power or business end users, source system analysts and ETL team leads.

Input

The following MIKE2 tasks provide input to the data modelling process:

Task 1.3.6 Define High-Level Information Requirements

At this point, the outputs of the prior 5 tasks are used to develop a list of High-Level Requirements that constitute the information needs and constraints that must be satisfied to achieve the business objectives. The information categories along with the information points form the start of the conceptual data model. Of critical importance is that the association between the key information categories and the business drivers and success measures be explicitly documented. The documentation within this task begins to frame the scope of the desired solution and what the overall information model will look like.

Activity 3.4 - Detailed Business Requirements

The purpose of this activity is to Validate, Refine, Categorize and Prioritise Business Requirements for this particular increment. It involves reviewing the existing documentation from Phases 1 and 2, and conducting additional interviews to define the purpose, goals/drivers, objectives, CSFs, KPIs and risks of the increment(s).

Some examples of inputs are:

  • Detailed Information requirements
  • A vendors analytical model which references their common model as its core source of input (i.e. IBM BST/AST models)
  • End users supplied examples to be used to validate the resulting data model

Output

The output of this process should provide direction when pursuing the logical data modelling activity, which can be used to prepare the physical data model.

Value of a Common Data Model

A vendor’s common logical data model (LDM) contains a wealth of intellectual property that would be difficult and time-consuming to create from scratch. Specifically, it provides:

Speed of Delivery

A common LDM can be a starting point to use either as a point of reference to compare against in-house models (i.e. identify gaps, etc) or to uses aspects of it as a basis to start development from. The common model encapsulates core groupings of information common within an industry and its content is cross functional and integrated. From this it can speed the identification of information requirements as per a common structure versus developing an LDM from scratch. This can also help to validate an organisation


’ s current model and provide a roadmap for extending those models into new areas.

Cost Reduction

A common LDM can be extended over time versus all at once, offering a more pragmatic and cost effective approach.

Risk Reduction

Common models represent significant insight and assets from a vendor


’ s past engagements. Best practice approaches have gone into the identification and validation of the models


’ elements and business rules. The helps reduce the risk of basing a system upon an inadequate data model.

Model Design Considerations - Overview

Conceptual Data Model for Financial Services
Conceptual Data Model for Financial Services

When leveraging a generic Logical Data Model (LDM) from a vendor (IBM BDW, NCR FSLDM, etc), the emphasis is less on traditional data modelling design activity (designing from scratch) and more on mapping and/or extending the common model. This involves understanding the LDM (entity/attribute definition, location in the LDM, etc.), mapping the requirements to the entity groupings (classification) in the LDM, confirming business rules and source/target definitions as mapped to the LDM, and identification of gaps that remain. From this analysis, additions to or extensions of the model as per current or future requirements will become apparent.

Understanding the Model

The selected data model must be studied to make sure that the underlying concepts are wholly understood. Without this understanding, the mapping to business requirements can be inadequate and inappropriate.

Requirements Identification

Variations to the general approach would normally occur depending on where the requirements came from, the level of detail of the requirements and how well they are defined, documented and understood.

Requirements can come from:

  • Analytic models (e.g., Basel II, IBM BST/AST) from a vendor, which map directly to their LDM
  • Client specific business requirements (profitability, etc.) yet to be mapped/verified against LDM
  • Source files mapped to LDM, which contain the atomic level data

Examples of requirement details that will need to be interpreted at their lowest level are:

  • Reports, including dimensions (column headings and filters), measures (formulas), aggregates (hierarchy groupings), etc.
  • Third party solutions (products) – Interpreting requirements that are considered out-of-the-box
  • Business query / questions – What are the elements of the query (i.e., noun and verbs) which when broken down identify the business rules and elements?
  • Performance Indicators – What are the measures and elements of the Key Performance Indicators (KPIs)?
  • Source files – Is the attribute at its lowest level or is it a derived attribute (i.e., a formula, lookup and/or complex algorithm was used)? Is it from the system of record (most accurate source) or is it from an interim database or file created which came after the system of record was created?
  • Attribute definitions when synonyms, acronyms or partial words are involved. This is time consuming and one of the most overlooked areas.

LDM Mapping Approach Options

To ensure the various business subject areas (ALM, risk, profitability, etc.) align to the classification groupings (Product, etc.) within the LDM (entities, attributes, etc.), a requirements-driven approach followed by a source file mapping activity is a pragmatic and better practice approach.

Requirements-Driven Approach (Top Down)

The requirements-driven approach involves business end users and the modeller mapping the requirements to the LDM. A validation exercise then tests if the LDM business rules and attributes support the end user requirements. This provides focus of what requirements can be mapped and or what needs to be added to or extended in the LDM to support the business requirements.

Lessons Learned include:

  • When a requirements-driven approach is pursued as a first step, the LDM contents and meanings can be assumed to be accurate and supportive of the business requirements. From this point on the LDM can be considered as the main source of the data requirements without a need to continually reconfirm the requirements with end users.
  • Pursuing the mapping of all user requirements at the onset must be tempered, as it may not be viable (i.e., no data will be available for a long time, too expensive or too time consuming to purse). In addition, it is common for a end user make generalized statements suggesting they want all items in the LDM based on the assumption it may be needed in the future, with out knowing if it is or not. This often causes significant scope creep and accuracy issues later as the expected results are hard to visualise and validate in the LDM when requirements are insufficiently identified.
  • An LDM is intended to reflect logical business requirements without taking into consideration physical constraints, such as data availability or complexity to resolve. If it is obvious that an issue cannot be resolved in an acceptable time frame (no data for a long time, business cannot support requirements, etc.), this fact should be noted, with limited time invested in representing the issue within the LDM at this point.

Source-Driven Approach (Bottom Up Down)

The source-driven approach involves mapping the source file attributes to the LDM based on understanding and gaining agreement between the source subject matter expert (SME) and the modeller. This requires interpretation of both the source and LDM attribute definitions and the value of the attributes from a business perspective. In addition, the business rules within and between the source attributes have to be investigated to ensure they are reflected in the LDM (e.g., a customer can have 0, 1 or many accounts).


Lessons Learned include: A purely source-driven approach is not recommended. There are significant pitfalls to this action.

  • Expediency Issues:

Several hundred-source tables and thousands of attributes may need to be reviewed to determine what maps to the LDM. When pursued in a condensed time frame involving multiple resources with different agendas (source SME, modeller, business analyst, etc.), different interpretations of the results can occur. In addition, when faced with a deadline, there is a high risk that contentious or suspect attributes may be included in the LDM when they should not be.

  • Political Implications:

If the business users have not agreed that the LDM accurately supports their requirements, they are in a position to challenge its value, accuracy and applicability.

  • Validation and Testing issues:

The level of accuracy and method of testing and verification will not be as accurate, because the design of the LDM is likely to be a close reflection of the source systems rather than the cross-functional, integrated perspective required to support the business requirements.

Testing subsequent to a source-driven approach tends to consist mainly of comparison of rows loaded into physical tables, with select control or hash totals being compared against source systems. This does not provide the necessary validation of the LDM, which involves examination of cross functional requirements, followed by a user oriented requirement acceptance testing when data has been loaded.

  • Suspect Attributes:

Introduction of attributes aligned to the use by an operational source application introduces complexity and should generally be avoided. These operational attributes rarely have significant analytic value that warrants their inclusion. A generic model which allows for a consolidate view of information to support business requirements is different in content and structure from an operational application LDM.

  • Quality vs. Quantity issues:

The source-driven approach can result in the LDM being filled with a large quantity of items, which are not very well organised, limited analytical value or not of current value (included as place holders for perceived future requirements). This makes it challenging to validate what truly reflects the business requirements. Also, a data model, which is based upon source system designs, is vulnerable to expensive changes whenever the source system changes.

Hybrid Approach

This is a combination of a requirements-driven approach followed by aspects of a source-driven approach.

In general, the inclusion of a source-driven approach is meant as a follow-on exercise to the requirements-driven approach. It is meant to increase the accuracy of the LDM by highlighting gaps early, and by including extra attributes from the involved source files which may not have been part of the identified requirements but which are felt to add value.

Lessons Learned include:

The hybrid approach helps establishes initial scope and obtains buy-in from end users. It also provides an increased understanding of the data gaps and accuracy of the LDM mapping.

As both business and IT are involved, less resistance should occur between both parties as to the outcome or direction going forward. This approach is often seen as a compromise when the type of mapping approach becomes contentious between involved parties. Each side gets something and has input into the process.

General Lessons Learned

Planning Considerations

  • When planning for the scale of the mapping requirements to the LDM, the common models normally contain a subset of the total numbers of entities and attributes required to address an organisation


’ s requirements. The majority of the entities (master/reference data) in the LDM contain a primary key field and 2 to 3 non-key attributes (type code, description, etc.).

The exception to this is if the vendor's analytical model is linked to its LDM. For example, IBM BST/AST references the IBM BDW. A more attributed model will exist to support these analytic requirements. Where metrics or derived attributes are involved, a narrative is normally provided describing the attributes’meanings, purposes and in some cases domain value examples. Rarely though are the detailed derivations of all attributes provided, as it is viewed that each organisation may handle these formulas differently.

  • Given the size and complexity of a vendor generic LDM, pursuing a complete mapping to the vendor


’ s LDM will be very costly in time and effort with an outcome that may not be worth the time invested. The assumption that end users will benefit from having all data extracted into a central repository has been proven not to work from a time, cost, data availability and complexity issue. Furthermore, the generic LDM contains entities and relationships, which, even if they could be implemented, may have a business value too dependent on sophisticated marketing techniques far from clients


’ current business needs.

Mapping and Design Considerations

  • The creation of a design principles document for the LDM provides a common basis of understanding as to how the LDM was designed, the approaches which were followed, and significant outstanding issues. If there are exceptions to normal modelling practices such as denormalisation or if issues are encountered, they can be documented. This document also serves as an education tool and reduces future debate and confrontation. Examples:
  • The model will capture and represent information only in one place with the LDM.
  • This information will be used to support "known" and ad-hoc requirements and allow extensibility for future initiatives (Basel II, etc.).
  • If an analytical vendor


’ s model is linked to its LDM and it aligns to a large percentage of the end users


’ requirements, the risk of mapping errors to the LDM is dramatically reduced since the link to the LDM offers implied accuracy and a form of initial validation.

  • Until business-related requirements are accurately mapped and validated against the LDM, it cannot be assumed that the source file attributes and rules map correctly to the LDM and are what the business users expect.

If incorrect assumptions are made and follow-on deliverables created (tables, programs, etc.), inaccuracies will not surface until data is populated and testing of live data occurs. Where it may have taken one hour to confirm and correct an error within an LDM, several days to weeks may be required to identify, resolve, roll back, redo, and re-test the post LDM deliverables. The LDM is the most forgiving area in which to resolve mistakes.

Denormalisation Considerations During LDM Design

Denormalising (collapsing or grouping) a sub-type entity such as a reference table into either a higher level parent table or with similar but not identical sub-type entities may be warranted under limited circumstances. The risk to the integrity of the model should be negligible in carefully managed cases. This is not being suggested as an all-or-nothing approach, but should be considered on a case-by-case basis.

Denormalisation may be appropriate if the majority of the below statements are true:

  • The subtype entities are highly used (queried).
  • The subtype attributes are stable (i.e., do not change over time).
  • There is low volume (i.e., some reference tables have less than a dozen rows).
  • There are few attributes (i.e., most reference tables have fewer than 4 attributes).
  • The sub-type and its attributes are well defined.
  • The use and purpose of the entity is fully understood.

Best practice prescribes a normalised LDM, with de-normalisation options addressed when creating the physical data model. However, the pressures and dynamics of a project may necessitate offering a compromise at an earlier stage (lose a battle, but win the war). This can occur if IT and/or the business users challenge the usability or applicability of the LDM. Convincing a person who comes from an application view of data versus a cross-functional background can be very difficult.

Be aware that whenever you denormalise, you have to strike a balance as you may be ruling out some business questions that may no longer be satisfied via a relatively straightforward query. Some level of detail is sacrificed when data is merged or pre-joined with other non-related data. This may require a more complex query to address a business question. Normally through, the LDM validation process discounts such concerns by having the end users test out their requirements on the model.

The denormalisation process must be marketed and managed properly. If this is not done, a general precedent is set, giving the users the impression they can bend the model at will rather than having the LDM modeller control the process.

Maintaining the Integrity of the Model

The following principles are important for maintaining the integrity of the off-the-shelf model:

  • A rigid set of standards and design principles is required and must be approved by interested and/or affected parties. These provide focus and conformity to the approach and mitigate the differences and debates that arise.
  • Authority for the data and modelling standards must be aligned and approved by a data management (governance) body that can and will react, should these standards be circumvented or challenged. There is a high probability that the data modelling team may be initially overridden as it may lack formal authority to enforce data standards. When a governance body is (eventually) in place this would not be the case
  • Use of a data modelling tool with relevant reporting and versioning considerations is required. A simple diagramming tool such as Visio cannot manage the complexity of an enterprise LDM as compared to a fit-for-purpose modelling tool such as Erwin.
  • If parallel streams of development activity by different modellers will occur against the same subject area within the LDM, strict adherence to model publishing standards, version control and impact analysis with other downstream deliverables (tables, ETL processes, etc.) must be enforced.
  • A centralised data architecture team is required and it must take responsibility for ensuring the integrity of the model.

Depending on the model size, the number of parallel development streams involved and the downstream deliverables impacted, a lead time of several days may be required between each update and roll out of the master copy for others to use (for example, making changes to existing tables which are referenced by the ETL deliverables).

Gaining Understanding and Acceptance of the LDM

Resistance to understanding a generic model, its content and its use generally occurs more from a technical perspective than from a business perspective. The model can be seen as large and unwieldy though the message may come across denoting something different (complicated, confusing, etc.). Such views propagate negativism, which over time can be hard to combat unless one is proactive.

By focusing on a subset of the model at any one time, the model


’ s interpretation becomes quite simplified. The area under discussion is no longer a needle in a haystack to be found within hundreds of entities. Instead it is in a select classification of the model (e.g., product), consisting of 10


+ core entities (accounts), with 30


+ sub-type reference entities (account codes, dates, amounts, etc.).

The knowledge required to align to and/or evolve an industry model is less concerned with technical implications of designing a model and more concerned with:

  • Knowing where things are, combined with industry knowledge
  • "Concepts" on why and how to model v. explicit modelling techniques such as normalisation
  • Understanding of data management principles (standards, data quality, etc.)
  • Ability to act as a facilitator and have capabilities to influence others
  • Capabilities and interest to think from a logical perspective – keeping the LDM simple and logical, excluding technical hardware and software concepts, parameters and product syntax that may change with a new release or product revamp

The initial, core users of the model are those who develop or extend it, such as the data modeller and the business users involved in the LDM discussions. Those who fall outside this category (technical architect, ETL programmers, source system SME, end users, etc.) need not be concerned about its details. Most need only a high-level view or insight to specific areas of the model. Until such resources are more familiar with the areas they are involved with, the data modeller can help direct them to the details.

As a result, competency needs to be built up over time within a few resources not many. Later, a secondary layer of resources may need to gain additional knowledge of the model to act as the liaison between Business and IT.

After the model is developed, and follow-on deliverables are produced (tables, programs, etc.) there is a limited need for technical resources to know much about the enterprise model. This is due to the LDM capturing the business perspective of how information is related rather than the technical perspective.

Introductory education in the form of a one- to two-hour presentation is recommended to display and discuss general examples of why and how a model is built and where things may be captured. Such education is normally a prelude to pursing requirements gathering and mapping sessions for those involved in a new engagement. The benefit from this education will be that the same message, language and rational is shared to all and can be pointed back to when debates occur or there is uncertainty.

Providing a high level mapping of relevant business areas (Target or Source) and or application systems (Source) as aligned to the vendors LDM classification and or subject areas will present a view of common areas and infer where reusability benefits can be obtained. This can be in the form of a matrix, or similar. An example is presented in the below figure.

Powered by omCollab