Perform Multi-Table Profiling Deliverable Template
From MIKE2.0 Methodology
-> You are here: Perform Multi-Table Profiling Deliverable Template
The Perform Multi-Table Profiling task involves analysing data across tables to look for redundancy and referential integrity issues. Tasks may done on a per-systems basis, sub-system basis or across systems. Key steps to table profiling include:
Using the profiling tool, perform relationship analysis on the selected tables to confirm uniqueness, referential integrity and to identify redundant data across tables
Tools-Based Assessment (Data Profiling) Execution
Whereas the IM QuickScan-based assessment is used to uncover the insight and perceptions of key staff that are closest to and most affected by information quality issues, the tools-based assessment phase focuses on conducting an assessment of actual the data and data structures. The purpose of this phase is to complement the survey-based assessment feedback with objective, information-based results. It also provides a quality check against the results of the interviewing process.
This section is meant to provide guidelines as to the tests that are commonly used and the approach taken to measure data quality. Each project should consider these to be a starting point only and should ensure that the specific tests that will uncover greatest value for a client are uncovered as part of the discovery process for this client.
Data sources are profiled in multiple dimensions: down columns (column profiling); across rows (Table Profiling); and across tables (Multi-Table Profiling). The output from Data Profiling can then be used in a number of fashions, including data cleansing and mapping to a target environment.
Multi-Table Analysis looks for relationships among columns existing in different tables, rather than just within a single table. Multi-Table profiling shows the relationships between tables using key fields as links to bridge the multiple tables. The aim is to analyse and determine the existence of referential integrity issues between each table analysed, e.g. orphans and clients without address and contact information. If the profiling analysis reveals two columns in different tables share the same set of values and one is not a foreign key and the other a primary key, than redundant data exists between the two tables. For example, an employee table and an employee demographics table might both contain a "home phone number" column and the same values within these columns. Thus, an employee ’ s home phone number might be stored redundantly in two separate tables.
Typical Investigations for Multi-Table Profiling
Multi-Table Profiling is conducted to verify the validity/uniqueness and accuracy of the key fields to ensure referential integrity. An example would be to assess the Customer Table vs. a Customer Loan Table
Customer Loan Table:
Some vendors have multiple profiling products that are used to fulfil these functions. The attached documents provide further detail on vendor capabilities.
The Iterative Profiling Process
It is important to note that there is an iterative approach to profiling within each of the analysis steps below. Each step involves:
Each of these sub-steps are discussed in more detail in the task routes below.
Figure 1: Profiling is an Iterative Process
The data investigation process should verify what the source system owners say about the data against the actual data; it should also verify what the data says against what the source system owners say. This process is iterative – for example, a source system owner may say that all customer names must have a full first name and full surname. However, when this rule is checked against the data, this shows that 10% of the records have only a first initial. In this case, this must be discussed with the source system owner. This type of anomaly may be explained by a business rule that was applied to new data that was not applied to historical data. Further analysis is performed in this case to verify that all anomalous records were created before the expected data.
Steps in the Process
This section provides a process-driven approach for the executing Data Profiling. It should be noted that this process if often iterative and may start with a preliminary profiling of key systems before doing end-to-end profiling.
Wiki asset search