Personal tools

Partners

Perform Table Profiling Deliverable Template

From MIKE2 Methodology

Jump to: navigation, search
This deliverable template is used to describe a sample of the MIKE2.0 Methodology (typically at a task level). More templates are now being added to MIKE2.0 as this has been a frequently requested aspect of the methodology. Contributors are strongly encouraged to assist in this effort.
Deliverable templates are illustrative as opposed to fully representative. Please help add examples to this template that are representative of the proposed output.

The Perform Table Profiling task involves analysing data across rows of a single table to establish dependencies between attributes within each table. Each task is done on a per-system (or subset of a system) basis.

Key steps to table profiling include:

  • Run table and primary key analysis on the selected entities using the profiling tool
  • Review results with the Source System SME
  • Perform further analysis on exceptions and anomalies, eg broken primary keys
  • Update relevant sections of the Data Quality Assessment Report with findings

On completion, further analysis may be planned based on results

Contents

Example of Table Profiling

Table Profiling is used to determine the key information and also relationship and dependency patterns between the data in each field within the table. It is done across every field within each table in the source system. Table Analysis is important as there are some situations where it is absolutely essential to examine values in two or more columns. A blank value in Column A might be acceptable if there is a valid value in Column B, but unacceptable if Column B is also blank. Similarly, a valid value in Column A (e.g., product code "HL") and a valid value in Column B (e.g., sub-product code "521") might be an invalid code combination. In such situations you must look at value combinations across columns rather than just examine values within a single column. A tools-based approach is undoubtedly the best choice for examining these cross-column or "table based" dependencies.

Typical Investigations for Table Profiling

Table Profiling is most typically conducted to verify the validity/uniqueness, format accuracy and completeness. An example would be looking at Customer Records:

Field Test Description
Customer ID Completeness Is a unique value populated within this field to uniquely identify each customer record? If so is there a high population ratio on this field, ie: 100% populated.
  Uniqueness Presence of a Primary Key or Foreign Key in relational databases. Are these keys unique to each record?
  Validity Are these key fields populated with valid value, ie: unique values?
  Format Are these values in a consistent format, ie: numeric or does it contain leading alphas etc.

Steps in the Process

This section provides a process-driven approach for the executing Data Profiling. It should be noted that this process if often iterative and may start with a preliminary profiling of key systems before doing end-to-end profiling.

Step 1 Table Profiling
Objective: This task involves analysing data across rows of a single table to establish dependencies between attributes within each table. Each task is done on a per-system (or subset of a system) basis.
Input: Completion of Column Profiling
Process: 1. Run table and primary key analysis on the selected entities using the profiling tool.
2. Review results with the Source System SME
3. Perform further analysis on exceptions and anomalies, e.g. broken primary keys
4. Update relevant sections of the Data Quality Assessment doc with findings
5. Plan further analysis based on results
Output: On completion of this step, the following has been identified:
  • Dependencies between fields in each table
  • Primary keys for each table
    Deliverables include:
  • Completion of the relevant sections of the Data Quality Report

Example Output Results

Powered by omCollab