Open Framework, Information Management Strategy & Collaborative Governance | Data & Social Methodology - MIKE2.0 Methodology
Wiki Home
Collapse Expand Close

Members
Collapse Expand Close

To join, please contact us.

Improve MIKE 2.0
Collapse Expand Close
Need somewhere to start? How about the most wanted pages; or the pages we know need more work; or even the stub that somebody else has started, but hasn't been able to finish. Or create a ticket for any issues you have found.

Data Modelling Standards and Guidelines

From MIKE2.0 Methodology

Share/Save/Bookmark
Jump to: navigation, search

Contents

Conceptual Modelling Standards And Guidelines

The purpose of Conceptual Data Model is to communicate the Conceptual Data Architecture to a business or non-technical audience. To ensure the Conceptual Data Model is easily understood, the Conceptual Data Model should be represented pictorially or as an Entity Relation Diagram (entity level only with keys and attributes ignored).


The following Conceptual Modelling standards apply:

  • Each Entity (Box/Picture) must be labelled with a singular Noun.
  • Each Relationship (Line) must be labelled with a singular Verb.
  • One to Many and Many to Many relationships are acceptable.
  • Cardinality is optional.
  • Not every entity needs to be included, just the key or backbone entities. Whole subject areas may be represented by a single 'box'
  • Conceptual data model(s) may be created using graphical tools or using Data Modelling tools.

Logical Modelling Standards And Guidelines

The purpose of the logical model is to enable communication with clients – once agreed the logical model is used as the basis of the physical model.

The following section describes the naming standards and diagramming conventions used for presenting logical data models.

Entities

An entity is something of significance about which the business wishes to store information, such as CUSTOMER or STOCK.

The naming conventions for an entity include using a singular noun that is as meaningful as possible to reflect the information it is maintaining. Additionally, the suffix of type should be added to the entity name if the entity represents a classification of information such as a PARTY_ROLE_TYPE rather than a specific instance of a real thing such as a PARTY_ROLE.

Attributes

An attribute holds a particular piece of information about an entity, such as CUSTOMER NAME or BIRTH DATE.

Attributes may be either part of the unique identifier of an entity (also referred to as the primary key), mandatory, or optional.

Each attribute must end in a class suffix that describes the type of data the item holds. For example STOCK ITEM QUANTITY or SALE PRICE AMOUNT.

Each attribute must inherit its format from a data domain.

Conventions used in the naming of attributes mean that certain suffixes included in the attribute’s name imply certain meanings to the names. All attributes must be defined using a class word suffix as specified in Appendix A – List of suffix codes for attributes.

Relationships & Cardinality

Relationships define how two entities are associated with one another. When relationships are used in the text, they are usually shown in lower case as a normal part of the text.

The relationship must be identified with a verb or verb phrase.

Relationships may be one-to-one, one-to-many, many-to-one or many-to-many.

Relationships may be either optional or mandatory. A dotted relationship means that the relationship is optional and a continuous line means that that the relationship is mandatory (the relationship is present in all occurrences of each entity.

Domains

Domains provide a classifying scheme to ensure like data attributes are sized the same. It is a useful technique for data model maintenance if, for example, all descriptions have to be changed from 60 to 80 bytes.

This concept is usually restricted to the Data Models and data management portion of a project.

Attributes must be defined a meaningful domain.

A starter set of domains is shown below. This should be augmented for specific projects by adding appropriate new domains and / or altering the sizes of the sample set.

Domain Name Physical Domain Name Datatype Definition
Date Date_Dte DATE Generic Date(time) domain. Suffix is _DTE
Day Day_Dy DECIMAL(3) Contains the day portion of a partial date, a number between 1 and 31. Suffix is _DY.
Month Month_Mth DECIMAL(2) Contains the month portion of a partial date as a number between 1 and 12. Suffix is _MTH
Time Time_Tme TIME Contains time only in 24 hr format, i.e. HH:MM:SS
Timestamp Timestamp_Tsp TIMESTAMP Timestamp value to whatever level the DBMS supports. Suffix is _TSP.
Week Week_Wk DECIMAL(2) Contains the week number within a year. Suffix is _WK.
Year Year_Yr DECIMAL(4) Contains the year portion of a partial date, a 4 digit number, including the century, must be provided. Suffix is _YR.
ID Id_Id DECIMAL(10,0) Internal Numeric Identifiers
Number Number_Num DECIMAL(10,0) Numeric IDs not arbitrarily allocated e.g. PART_NUM. Suffix is _NUM
Sequence Sequence_Seq DECIMAL(4,0) A defined sequence number which when conmbined with an ID forms a composite key. Suffix is _SEQ.
Code Code_Cde VARCHAR(20) For all coded values ie text codes. Class suffix is _CDE
Txt_ID Txt_Id VARCHAR(18) An identifier that needs to be a VARCHAR2 instead of NUMBER. Use this mostly for alpha keys from other systems. Within our own system types of CoDE or NaME are more appropriate. Use the field Suffix _TXT (not ID)
Comment Comment_Txt CHAR(18) Long Free Text type
Description Description_Desc VARCHAR(200) Used for Description in Code Description sense
Indicator Indicator_Ind CHAR(1) Used for True / False fields. Note there is no third (maybe) value. Attributes of this type should be mandatory. Suffix is _IND.
Indicator Aug Indicator_Aug CHAR(1) Will contain either a Y or N to identify whether the record entry was inserted by the owning process or a foreign process.
Name Name_Nme VARCHAR(100) Contains a proper name eg. Place, Person, Company. Suffix is _NME
Telephone Telephone_Phone VARCHAR(40) Comtains a telephone number, up to 40 characters. SUFFIX is _PHONE
Text Large Text_Large_Txt VARCHAR(160) Large text (VC(240)). Std suffix is _TXT
Text Medium Text_Med_Txt VARCHAR(80) Standard Text (80) single line things. Std Suffix is _TXT
Text Small Text_Small_Txt VARCHAR(40) Small text (VC(20)). Std suffix is _TXT
Amount Amount_Amt DECIMAL(18,5) Monetary amounts (Dollars and cents) - up to 5 decimal places. Standard suffix is _AMT.
Percent Percent_Pct DECIMAL(4,4) Percentage amount, expressed as a number and up to 2 decimal places.

Suffix is _PCT.

Quantity Quantity_Qty DECIMAL(18,6) Countable or measurable quantities (not money amounts), accepts up to 6 decimal places.

Class suffix is _QTY

Sub-types And Super-types

A sub-type, sometimes referred to as a sub-entity, is a classification of an entity which has characteristics such as attributes or relationships in common with the more general entity. Sub-types are represented in entity relation diagrams by entities inside other entities (Oracle Convention), or as connected entities (ER-Win convention). The common attributes and relationships between sub-types are shown in the outside entity which is known as the super-type. The attributes and relationships of the super-type are therefore inherited by the sub-type. The figure below shows the super-type TRUSTEE with a sub-type of APPROVED_TRUSTEE

< Insert Sub-type And Super-type diagrams >

Physical Modelling Standards And Guidelines

This section describes best practice standards and guidelines for physical data modelling, including common naming conventions, common use of data types, common formats for text fields, data presentation standards, use of common domains and class words.

Tables

Each entity in the logical model should map to one or more tables in the physical model. The degree of normalisation or denormalisation should be based on factors such as: - data loading patterns - query access paths & data usage requirements - database performance - storage & capacity requirements

Columns

Each attribute in the logical model will map to a column in the physical model.

Relationships

Each relationship in the logical model will map to a column in the physical model (the foreign key).

Primary Key

The use of a primary key is mandatory for all tables except in temporary or other run-time tables. The primary key must be non-volatile.

Use natural keys that have a meaning to the user if these exist and are guaranteed to be unique.

Surrogate Key

Surrogate keys should be used where a natural key is not appropriate or use of a composite key complicates use. Natural keys are not source system keys: they are pervasive and common codes found, in fact, quite rarely in organisations. Examples would include 'state code' = 'VIC', 'TAS', 'WA' and so on.

A Source system key could be a natural key for the organisation but this quite rare and will not occur when there are two or more data sources contributing to an entity. To be considered as a natural key the source system key must be unique and should not itself be a composite key.

If there is any doubt, assign a surrogate key.

Surrogate keys must be Numeric Identifiers and must not provide any contextual meaning. An effective random allocation can be achieved by having one data base sequence furnish identifiers for several tables. Allocating this way discourages personnel from assigning meaning where none should exist. Both of these examples highlight legitimate data requirements but these should not be solved by misuse of the identifier

Surrogate keys facilitate more efficient joins within the warehouse database. Some source systems use varchar values as primary keys. These keys can be quite lengthy, and become very inefficient during join processing. By replacing these varchar keys with integer surrogate keys, a significant increase in join performance can occur.

Default And Missing Values

Missing Parents from Parent-Child Relationships

If source system cannot supply values that are needed for primary keys then, in general, the entries should not be loaded to the database. This is a design issue where the structure of the database is in question. If the source system can supply values for primary key of the parent (but not the detail) then augment processing must be initiated to populate the missing parent value and other attribute values must be taken from the defaults listed below are to be used.

Missing parent keys can occur in a star schema where one source does not contain a value for a certain dimension. In this situation the defaults listed below are to be used unless another value is universally more appropriate. For example it may be appropriate to assume country = Australia or currency = AUD for many source systems.

Missing Mandatory Foreign Key Values

If a source system cannot supply values that are needed for mandatory foreign keys then the following defaults listed below are to be used and 'augment' processing must be initiated to populate the missing parent value.

For certain cases we wish to avoid empty fields where these could be used by end user tools to show values eg lists etc. Typically these fields are description fields, but they don’t have to be. The decision of which fields are applicable will be made on a case by case basis by the Design Authority. For these fields a default must be provided from the set provided in paragraph 38, except for text which should be more appropriate if possible, e.g for text you might want to use “(Unknown)” or “Unknown description for code=xxxx”.

Wiki Contributors
Collapse Expand Close

View more contributors