From MIKE2 Methodology
|
| This article is currently Under Construction. It is undergoing major changes as it is in the early stages of development. Users should help contribute to this article to get it to the point where it is ready for a Peer Review.
|
| This deliverable template is used to describe a sample of the MIKE2.0 Methodology (typically at a task level). More templates are now being added to MIKE2.0 as this has been a frequently requested aspect of the methodology. Contributors are strongly encouraged to assist in this effort.
|
| Deliverable templates are illustrative as opposed to fully representative. Please help add examples to this template that are representative of the proposed output.
|
Overview
In the Data Matching and Consolidation task, records are compared with one another to identify matching sets. Matching records can then either be consolidated to remove duplicates or linked to one another to form new associations.
Key Deliverables for Data Matching include:
- Overall match criteria with business impact
- Matched/consolidated data
- Metadata mapping rules for data matching/consolidation
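The idea of a "matching set" can be made concrete with a short sketch. Assuming the matching step produces a list of record pairs judged to match (the record IDs below are hypothetical), the pairs can be grouped transitively with a union-find structure, so that each resulting group is a candidate for consolidation or linkage. Transitive grouping can chain records together (A matches B, B matches C) even when A and C are not true duplicates, which is one way over-matching shows up.

```python
from collections import defaultdict


def group_matches(record_ids, matched_pairs):
    """Group pairwise matches into matching sets using union-find.

    If A matches B and B matches C, all three end up in one set,
    even though A and C were never compared directly.
    """
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in matched_pairs:
        union(a, b)

    groups = defaultdict(list)
    for rid in record_ids:
        groups[find(rid)].append(rid)
    # Sort for a stable, readable result.
    return sorted(sorted(g) for g in groups.values())


records = ["C1", "C2", "C3", "C4", "C5"]
pairs = [("C1", "C2"), ("C2", "C4")]
print(group_matches(records, pairs))
# [['C1', 'C2', 'C4'], ['C3'], ['C5']]
```

Records with no match (C3 and C5 here) remain as single-member sets and pass through consolidation unchanged.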
Steps in the Process
| Step 1 Design Match Process
|
| Objective:
| In this task, the overall match process is designed. The criteria to be used for matching, and their weightings, are reviewed with the information management team, and the team ensures that the business understands the impacts of matching (including the impacts of over-matching).
|
| Input:
| Corrected Data
|
| Process:
| Key Steps in the Process include:
- Determine potential scenarios for matching, based on investigation results and business requirements
- Establish match criteria (these may be provided in a tool). When using a tool, there will oftentimes be pre-defined match criteria modules.
- Weight the match criteria (pre-set weights may be provided in a tool). When using a tool, there will oftentimes be pre-configured weights for matching.
- Assess the risks and impacts of incorrect matching. In a team review session, present, document and discuss the impacts of the proposed matches from a business perspective, including the impacts of over-matching.
- Re-assess the relative weight of each matching attribute. Weight levels are revised based on the impact of over-matching.
- Determine rules for Data Mastering (if relevant) regarding which data source is the primary authority, a secondary authority, or slave to any data changes
- Finalise the design process by defining the inventory of matching techniques to be used for each scenario
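The criteria, weights and over-matching review in the steps above can be illustrated with a simple weighted scoring sketch. The attributes, weights and thresholds here are hypothetical and chosen only for illustration; matching tools generally provide richer comparison functions (fuzzy or phonetic matching, for example) and their own pre-configured weights. In this sketch, pairs scoring between the two thresholds are routed to manual review rather than merged automatically.

```python
# Hypothetical weights: an attribute contributes its weight when both
# values are present and agree after normalisation.
WEIGHTS = {"surname": 4.0, "given_name": 2.0, "date_of_birth": 5.0, "postcode": 3.0}
MATCH_THRESHOLD = 10.0   # at or above: treat as a match
REVIEW_THRESHOLD = 7.0   # between the two thresholds: send to clerical review


def normalise(value):
    """Lower-case and collapse whitespace; missing values become ''."""
    return " ".join(str(value).lower().split()) if value is not None else ""


def match_score(rec_a, rec_b, weights=WEIGHTS):
    """Sum the weights of attributes whose normalised values agree."""
    score = 0.0
    for attr, weight in weights.items():
        a, b = normalise(rec_a.get(attr)), normalise(rec_b.get(attr))
        if a and b and a == b:
            score += weight
    return score


def classify(score, match_at=MATCH_THRESHOLD, review_at=REVIEW_THRESHOLD):
    if score >= match_at:
        return "match"
    if score >= review_at:
        return "review"
    return "non-match"


john = {"surname": "Smith", "given_name": "John", "date_of_birth": "1970-01-01", "postcode": "2000"}
jon = {"surname": "smith ", "given_name": "Jon", "date_of_birth": "1970-01-01", "postcode": "2000"}
jane = {"surname": "Smith", "given_name": "Jane", "postcode": "2000"}

print(match_score(john, jon), classify(match_score(john, jon)))    # 12.0 match
print(match_score(john, jane), classify(match_score(john, jane)))  # 7.0 review
```

Lowering the match threshold to 7.0 would automatically merge John and Jane Smith at the same postcode. That is the kind of over-matching impact the team review session is meant to surface before the weights are finalised.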
|
| Output:
| Match process high level design
|
| Step 2 Build Match Prototype
|
| Objective:
| An initial prototype is built and tested on a subset of records.
|
| Input:
| Completion of matching high level design
|
| Process:
| To test the prototype, matching should be done against a subset of data.
Matches should be evaluated to ensure they are in fact duplicates or valid linkages.
Matching results should be driven by the relationships defined in the data ownership model.
The design assets will ideally be stored in a metadata repository. Some tools support this more strongly than others.
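One way to evaluate the prototype, assuming a reviewer has confirmed which pairs in the subset are true matches, is to compare the prototype's output with those reviewed pairs. The sketch below (function names and record IDs are hypothetical) draws a reproducible sample and reports precision, where a low value suggests over-matching, and recall, where a low value suggests missed matches.

```python
import random


def sample_subset(records, size, seed=42):
    """Draw a reproducible subset of records for the prototype run."""
    rng = random.Random(seed)
    return rng.sample(records, min(size, len(records)))


def evaluate(predicted_pairs, reviewed_pairs):
    """Compare prototype matches with pairs a reviewer confirmed as true matches.

    Pairs are treated as unordered, so (A, B) and (B, A) are the same.
    """
    predicted = {frozenset(p) for p in predicted_pairs}
    actual = {frozenset(p) for p in reviewed_pairs}
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 1.0
    recall = true_pos / len(actual) if actual else 1.0
    return {
        "precision": precision,  # low precision suggests over-matching
        "recall": recall,        # low recall suggests under-matching
        "false_matches": sorted(tuple(sorted(p)) for p in predicted - actual),
        "missed_matches": sorted(tuple(sorted(p)) for p in actual - predicted),
    }


predicted = [("C1", "C2"), ("C3", "C4"), ("C5", "C6")]
reviewed = [("C2", "C1"), ("C5", "C6"), ("C7", "C8")]
result = evaluate(predicted, reviewed)
print(result["precision"], result["recall"])
print(result["false_matches"], result["missed_matches"])
```

Listing the false and missed matches, not just the ratios, gives the team concrete record pairs to discuss when deciding whether weights need revising.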
|
| Output:
| A working matching prototype that runs on at least a subset of the data
|
| Step 3 Align Matching Metadata
|
| Objective:
| Ensure that the matching design and output are stored in the metadata repository.
|
| Input:
| Completion of high level design
| Completion of match prototype
|
| Process:
| As with other aspects of Data Re-Engineering, it should be ensured that matching rules and result sets are stored in a metadata repository.
In this step, the source non-standardised data is mapped to the common model. Mapping rules would include:
- Merge rules for producer to consumer file-entity mapping
- Transformation rules for producer to consumer field mapping
- Cross-reference of any key between systems that would need to be changed
Ideally, these mapping rules are stored in a metadata repository. For some vendor tools, this will be a by-product of the development process.
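As an illustration only, the three kinds of mapping rule listed above could be captured as structured records and serialised to JSON for loading into a metadata repository. The class names, field names, example systems and JSON layout are all hypothetical; a real repository or vendor tool will define its own schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FieldMapping:
    """Transformation rule: producer field -> consumer field."""
    source_field: str
    target_field: str
    transformation: str  # description or expression agreed by the team/tool


@dataclass
class EntityMapping:
    """Merge rule: producer file/entity -> consumer entity."""
    source_entity: str
    target_entity: str
    field_mappings: List[FieldMapping] = field(default_factory=list)


@dataclass
class KeyCrossReference:
    """Which source-system key maps to which consolidated key."""
    source_system: str
    source_key: str
    target_key: str


def to_repository_json(entities, xrefs):
    """Serialise mapping rules into one JSON document (asdict recurses into nested dataclasses)."""
    return json.dumps(
        {
            "entity_mappings": [asdict(e) for e in entities],
            "key_xref": [asdict(x) for x in xrefs],
        },
        indent=2,
    )


customer = EntityMapping(
    "CRM.CUST_MASTER",
    "Customer",
    [
        FieldMapping("CUST_NM", "full_name", "trim; title case"),
        FieldMapping("DOB", "date_of_birth", "parse DD/MM/YYYY to ISO 8601"),
    ],
)
xrefs = [
    KeyCrossReference("CRM", "100234", "CUST-000017"),
    KeyCrossReference("BILLING", "B-88", "CUST-000017"),
]
doc = to_repository_json([customer], xrefs)
print(doc)
```

Keeping the rules as data rather than embedding them in job code is what makes it possible to load them into a repository and review them alongside the match design.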
|
| Output:
| If a metadata repository is being used, this information should be loaded into it as part of the design and implementation process.
|
| Step 4 Execute Overall Match Process
|
| Objective:
| In this step, the process is executed for the full set of records. Any changed information can be stored as metadata.
|
| Input:
| Completion of high level design
| Completion of match prototype
|
| Process:
| Key Steps in the process include:
- Matching process is executed
- Results are reviewed with team
- Revisions can be made based on match results and design expectations
- Initial source files are archived in case "rollback" is required
- Final sign-off on the match process
The implementation assets will ideally be stored in a metadata repository. Some tools support this more strongly than others.
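The archiving step can be sketched as follows, assuming the source data sits in flat files on a local file system (the file names and folder layout are hypothetical). Source files are copied into a timestamped folder so that, if the match results are rejected at review, the originals can be restored. For a rollback to be possible, the archive has to be taken before any matched output replaces the source files.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def archive_sources(source_files, archive_root):
    """Copy source files into a new timestamped archive folder."""
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S%f")
    target = Path(archive_root) / stamp
    target.mkdir(parents=True, exist_ok=False)
    for src in source_files:
        shutil.copy2(src, target / Path(src).name)  # copy2 keeps file metadata
    return target


def rollback(archive_dir, restore_dir):
    """Restore archived source files if the match results are rejected."""
    restored = []
    for archived in Path(archive_dir).iterdir():
        destination = Path(restore_dir) / archived.name
        shutil.copy2(archived, destination)
        restored.append(destination)
    return restored


def demo():
    """Round trip in a throwaway directory: archive, overwrite, roll back."""
    with tempfile.TemporaryDirectory() as work:
        src = Path(work) / "customers.csv"
        src.write_text("id,name\n1,Smith\n")
        archive_dir = archive_sources([src], Path(work) / "archive")
        src.write_text("overwritten by a rejected match run\n")
        rollback(archive_dir, work)
        return src.read_text()


print(demo())
```

A database-backed implementation would more likely snapshot or back up tables, but the principle of keeping an untouched copy until final sign-off is the same.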
|
| Output:
| Matched data – this may involve de-duplication or new linkages
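De-duplication can be sketched by combining each matching set into a single surviving record using the Data Mastering rules from Step 1. This minimal sketch assumes those rules reduce to a simple ranking of sources (the source names, priorities and records are hypothetical): the most authoritative source supplies each attribute, and lower-ranked sources only fill gaps. The original source keys are kept on the surviving record so they can feed the key cross-reference described in Step 3.

```python
# Hypothetical mastering rules: a lower number means a more authoritative source.
SOURCE_PRIORITY = {"CRM": 1, "BILLING": 2, "WEB": 3}


def consolidate(matching_set, priority=SOURCE_PRIORITY):
    """Build one surviving record from a set of matched records.

    For each attribute, keep the first non-empty value found when records
    are visited from most to least authoritative source.
    """
    unknown_rank = len(priority) + 1  # sources without a rule rank last
    ranked = sorted(matching_set, key=lambda r: priority.get(r["source"], unknown_rank))
    survivor = {}
    for record in ranked:
        for attr, value in record.items():
            if attr in ("source", "id"):
                continue
            if value not in (None, "") and attr not in survivor:
                survivor[attr] = value
    # Retain every original key for cross-referencing.
    survivor["source_ids"] = sorted(f'{r["source"]}:{r["id"]}' for r in matching_set)
    return survivor


matched = [
    {"source": "WEB", "id": "w9", "name": "J. Smith", "email": "js@example.com", "phone": ""},
    {"source": "CRM", "id": "100234", "name": "John Smith", "email": "", "phone": "555-0101"},
    {"source": "BILLING", "id": "B-88", "name": "SMITH J", "email": "john.smith@example.com", "phone": None},
]
print(consolidate(matched))
```

Here the name and phone survive from CRM, while the email comes from BILLING because CRM has none. Attribute-level rules (for example, trusting BILLING for addresses) would replace the single ranking with one ranking per attribute.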
|
Examples