Open Framework, Information Management Strategy & Collaborative Governance | Data & Social Methodology - MIKE2.0 Methodology
Wiki Home
Collapse Expand Close

Members
Collapse Expand Close

To join, please contact us.

Improve MIKE 2.0
Collapse Expand Close
Need somewhere to start? How about the most wanted pages; or the pages we know need more work; or even the stub that somebody else has started, but hasn't been able to finish. Or create a ticket for any issues you have found.

Data Profiling

From MIKE2.0 Methodology

Share/Save/Bookmark
Jump to: navigation, search
Activities in Phase 3
Phase 3 - Information Management Roadmap and Foundation Activities
Content Model Relationship

Contents

Activity: Data Profiling

Objective. The Data Profiling activity focuses on conducting an assessment of actual data and data structures. It helps provide the following:

  • Identifies data quality issues - measurements are taken against a number of dimensions, to help identify issues at the individual attribute level, at the table-level and between tables.
  • Captures metadata as a by-product of the process – Metadata is key to the success of subsequent phases of the data load including the ongoing use and maintenance of the data. Useful metadata is identified as part of the data profiling exercise and the tool-based approach will ensure it is captured in a reusable format.
  • Identifies business rules – The next step is to perform the data mapping. Data profiling will assist in gaining an understanding of the data held in the system and in identifying business rules for handling the data. This will feed into the future data mapping exercise.
  • Assesses the ’fitness for purpose’ of the source system data to satisfy the business requirements. This assessment is often done in the context of a Data Warehouse, CRM, or ERP system initiative. Therefore, the focus is on gaining a very detailed understanding of the source data that will feed these target systems, to ensure that the quality level is sufficient to meet the requirements of the target system.

The purpose of this phase is to provide objective, information-based results around information analysis.

Major Deliverables
  • Data Quality Assessment Report (per Source System)
  • Data Quality Metrics updated to Metadata Repository
  • Mapping Rules and Business Rules updated to Metadata Repository


Common assets:

Tasks

Task: Prepare for Assessment

Objective: In this task, it ensured that the profiling environment is ready and the scope of information to be investigated is agreed-upon within the team and signed off by the client. This may be challenging as at this stage some of the other requirements may still be somewhat undefined. Minimising any gaps in time in getting the necessary extract files is a critical area dependency and risk area during this task. As profiling requires production extracts, the timelines for procurement of data may be significant.


Input:

  • Detailed Business Requirements for Increment
  • Upgraded Development Environment
  • Configuration Management Baseline
  • Metric Categories and Measurement Techniques


Output:

Task: Perform Column Profiling

Objective: This task involves profiling the data found in a single column/field in either a table or a flat file. It involves analysis of simple and complex fields. Each task is done on a per-system (or subset of a system) basis.

Key steps to column profiling include:

  • Understand all the fields and document their descriptions in the profiling tool
  • Now that fields are understood, analyse the fields that have been identified as being of interest during the interviewing-based assessment
  • Update relevant sections of the Data Quality Assessment Report with findings
  • Determine additional tables which may be required to provide the desired data

Simple and Complex field profiling may be split into separate tasks.


Input:

  • Information Requirements for column-level data analysis
  • Relevant data extracts


Output:

Task: Perform Table Profiling

Objective: This task involves analysing data across rows of a single table to establish dependencies between attributes within each table. Each task is done on a per-system (or subset of a system) basis.

Key steps to table profiling include:

  • Run table and primary key analysis on the selected entities using the profiling tool
  • Review results with the Source System SME
  • Perform further analysis on exceptions and anomalies, eg broken primary keys
  • Update relevant sections of the Data Quality Assessment Report with findings

On completion, further analysis may be planned based on results.


Input:

  • Completion of Column Profiling
  • Information Requirements for table-level data analysis
  • Relevant data extracts


Output:

Task: Perform Multi-Table Profiling

Objective: This task involves analysing data across tables to look for redundancy and referential integrity issues. Tasks may done on a per-systems basis, sub-system basis or across systems. Key steps to table profiling include:

  • Using the profiling tool, perform relationship analysis on the selected tables to confirm uniqueness, referential integrity and to identify redundant data across tables
  • Update relevant sections of the Data Quality Assessment Report with findings
  • Discuss results with Source System SME
  • Plan further analysis based on results


Input:

  • Completion of Table Profiling
  • Information Requirements for multi-table level data analysis
  • Relevant data extracts


Output:

Task: Finalise Data Quality Report

Objective: This step will complete and issue for signoff the Data Quality Assessment Report. Sections of the Data Quality Assessment Report and metadata repository should be populated throughout the End-to-End profiling exercise. This step is to complete remaining sections and to make a final recommendation on whether this data should be loaded into the target system. On completion, there should be a formal walkthrough, review and final signoff.


Input:

  • Completion of Column Profiling
  • Completion of Table Profiling
  • Completion of Multi-Table Profiling


Output:

  • Completion of the Data Quality Assessment Report Deliverable Template

Role:Information Architect

Role:Business Analyst

Role:Technical Analyst

Role:Data Quality Manager

Role:Data Owners

Role:Data Stewards

Yellow Flags

  • Profiling team not provided with representative production data
  • Profiling result-set doesn't directly flow into the following activities:
    • ETL Integration
    • Metadata related to data quality
    • Data Re-Engineering approach
Wiki Contributors
Collapse Expand Close

View more contributors