From MIKE2 Methodology
| This deliverable template is used to describe a sample of the MIKE2.0 Methodology (typically at a task level). More templates are now being added to MIKE2.0 as this has been a frequently requested aspect of the methodology. Contributors are strongly encouraged to assist in this effort.
|
| Deliverable templates are illustrative as opposed to fully representative. Please help add examples to this template that are representative of the proposed output.
|
The Perform Table Profiling task involves analysing data across rows of a single table to establish dependencies between attributes within each table. Each task is done on a per-system (or subset of a system) basis.
Key steps to table profiling include:
- Run table and primary key analysis on the selected entities using the profiling tool
- Review results with the Source System SME
- Perform further analysis on exceptions and anomalies, eg broken primary keys
- Update relevant sections of the Data Quality Assessment Report with findings
On completion, further analysis may be planned based on results
Example of Table Profiling
Table Profiling is used to determine the key information and also relationship and dependency patterns between the data in each field within the table. It is done across every field within each table in the source system. Table Analysis is important as there are some situations where it is absolutely essential to examine values in two or more columns. A blank value in Column A might be acceptable if there is a valid value in Column B, but unacceptable if Column B is also blank. Similarly, a valid value in Column A (e.g., product code "HL") and a valid value in Column B (e.g., sub-product code "521") might be an invalid code combination. In such situations you must look at value combinations across columns rather than just examine values within a single column. A tools-based approach is undoubtedly the best choice for examining these cross-column or "table based" dependencies.
Typical Investigations for Table Profiling
Table Profiling is most typically conducted to verify the validity/uniqueness, format accuracy and completeness. An example would be looking at Customer Records:
| Field
| Test
| Description
|
| Customer ID
| Completeness
| Is a unique value populated within this field to uniquely identify each customer record? If so is there a high population ratio on this field, ie: 100% populated.
|
|
| Uniqueness
| Presence of a Primary Key or Foreign Key in relational databases. Are these keys unique to each record?
|
|
| Validity
| Are these key fields populated with valid value, ie: unique values?
|
|
| Format
| Are these values in a consistent format, ie: numeric or does it contain leading alphas etc.
|
Steps in the Process
This section provides a process-driven approach for the executing Data Profiling. It should be noted that this process if often iterative and may start with a preliminary profiling of key systems before doing end-to-end profiling.
| Step 1 Table Profiling
|
| Objective:
| This task involves analysing data across rows of a single table to establish dependencies between attributes within each table. Each task is done on a per-system (or subset of a system) basis.
|
| Input:
| Completion of Column Profiling
|
| Process:
| 1. Run table and primary key analysis on the selected entities using the profiling tool. 2. Review results with the Source System SME 3. Perform further analysis on exceptions and anomalies, e.g. broken primary keys 4. Update relevant sections of the Data Quality Assessment doc with findings 5. Plan further analysis based on results
|
| Output:
| On completion of this step, the following has been identified:
- Dependencies between fields in each table
- Primary keys for each table
Deliverables include:
- Completion of the relevant sections of the Data Quality Report
|
Example Output Results