Open Framework, Information Management Strategy & Collaborative Governance | Data & Social Methodology - MIKE2.0 Methodology
Perform Data Matching and Consolidation Deliverable Template

This article is currently Under Construction. It is undergoing major changes as it is in the early stages of development. Users should help contribute to this article to get it to the point where it is ready for a Peer Review.
This deliverable template is used to describe a sample of the MIKE2.0 Methodology (typically at a task level). More templates are now being added to MIKE2.0 as this has been a frequently requested aspect of the methodology. Contributors are strongly encouraged to assist in this effort.
Deliverable templates are illustrative as opposed to fully representative. Please help add examples to this template that are representative of the proposed output.

Overview

In the Data Matching and Consolidation task, records are compared with one another to identify matching sets. Matching records can then either be consolidated to remove duplicates or linked to one another to form new associations.


Key Deliverables for Data Matching include:

  • Overall match criteria with business impact
  • Matched/consolidated data
  • Metadata mapping rules for data matching/consolidation

Steps in the Process

Step 1 Design Match process
Objective: In this task, the overall match process is designed. The criteria to be used for matching, and their weightings, are reviewed with the information management team, and the team confirms that the business understands the impacts of matching (including the impacts of over-matching).
Input: Corrected Data
Process: Key Steps in the Process include:
  • Determine potential scenarios for matching, based on investigation results and business requirements
  • Establish match criteria. When using a tool, there will often be pre-defined match criteria modules.
  • Weight the match criteria. When using a tool, there will often be pre-configured weights for matching.
  • Assess the risks and impacts of incorrect matching. In a team review session, document and discuss the business impacts of the proposed matches, including the impacts of over-matching.
  • Re-assess the relative weight of each matching attribute. Weights are revised based on the impact of over-matching.
  • Determine rules for Data Mastering (if relevant) regarding which data source is the primary authority, a secondary authority, or slave to any data changes
  • Finalise the design process by defining the inventory of matching techniques to be used for each scenario
Output: Match process high level design
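
The weighting idea in this step can be illustrated with a minimal Python sketch of a weighted match score. The attribute names, the two comparators (an exact match, and a fuzzy match based on the standard library's `difflib.SequenceMatcher`) and the weights are hypothetical; matching tools usually supply their own pre-defined criteria and weights. The score here is simply a weighted average of per-attribute similarities, which is one common approach rather than a method prescribed by MIKE2.0.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List


def exact(a: str, b: str) -> float:
    """1.0 if the normalised values are identical, else 0.0; missing values never agree."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0
    return 1.0 if a == b else 0.0


def fuzzy(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] from difflib; missing values never agree."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


@dataclass(frozen=True)
class MatchCriterion:
    attribute: str                        # attribute compared across the two records
    compare: Callable[[str, str], float]  # comparator returning a similarity in [0, 1]
    weight: float                         # relative importance of this attribute


def match_score(rec_a: Dict[str, str], rec_b: Dict[str, str],
                criteria: List[MatchCriterion]) -> float:
    """Weighted average of per-attribute similarities, so the result lies in [0, 1]."""
    total = sum(c.weight for c in criteria)
    weighted = sum(
        c.weight * c.compare(rec_a.get(c.attribute, ""), rec_b.get(c.attribute, ""))
        for c in criteria
    )
    return weighted / total


# Hypothetical criteria and weights for a person-matching scenario.
CRITERIA = [
    MatchCriterion("name", fuzzy, 0.5),
    MatchCriterion("date_of_birth", exact, 0.3),
    MatchCriterion("postcode", exact, 0.2),
]

a = {"name": "Jon Smith", "date_of_birth": "1980-01-02", "postcode": "2000"}
b = {"name": "John Smith", "date_of_birth": "1980-01-02", "postcode": "2000"}
c = {"name": "Jane Smyth", "date_of_birth": "1975-06-30", "postcode": "2000"}
print(round(match_score(a, b, CRITERIA), 3))  # high: likely the same person
print(round(match_score(a, c, CRITERIA), 3))  # lower: mainly the shared postcode
```

Raising the weight of a weak identifier such as postcode pushes unrelated records that happen to share it closer to a match, which is one concrete way over-matching shows up when weights are re-assessed.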



Step 2 Build Match Prototype
Objective: An initial prototype is built and tested on a subset of records
Input: Completion of matching high level design
Process: To test the prototype, matching should be run against a subset of the data. This may be supplemented by a full-scale match run in order to estimate the overall outcome, such as the expected number of matches.

Matches should be evaluated to confirm that they are in fact duplicates (true positives) or valid linkages. Unmatched entities may also be inspected to estimate the number of false negatives; this can be done by temporarily setting a low match threshold so that near-misses are surfaced for review.
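
One way to quantify this evaluation is sketched below, assuming a reviewer has labelled a sample of scored pairs as true duplicates or not. The sample data and the `evaluate` helper are illustrative only. Recall can only be measured relative to the pairs that were surfaced for review, which is why a low threshold is used when collecting them.

```python
from typing import Dict, List, Tuple

# (match score, reviewer verdict) where the verdict is True if the pair really is a duplicate.
ReviewedPair = Tuple[float, bool]


def evaluate(pairs: List[ReviewedPair], threshold: float) -> Dict[str, float]:
    """Confusion counts, precision and recall at a threshold, relative to the reviewed sample only."""
    tp = sum(1 for score, dup in pairs if score >= threshold and dup)
    fp = sum(1 for score, dup in pairs if score >= threshold and not dup)
    fn = sum(1 for score, dup in pairs if score < threshold and dup)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"true_positives": tp, "false_positives": fp,
            "false_negatives": fn, "precision": precision, "recall": recall}


# Made-up sample: pairs scored by the prototype (run with a deliberately low threshold
# so that near-misses are captured) and then labelled by a reviewer.
sample = [(0.97, True), (0.91, True), (0.88, False), (0.72, True), (0.40, False)]

for t in (0.85, 0.70):
    print(t, evaluate(sample, t))
```

Comparing the two thresholds shows the usual trade-off: lowering the threshold recovers the missed duplicate but lets more false positives through.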

Evaluate whether manual inspection will be needed, and calculate the costs and benefits of doing so. If manual inspection is included, two thresholds must be determined: the score above which pairs are matched automatically, and the lower bound of the band of dubious matches that are routed to manual review.
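
The two-threshold arrangement can be sketched as follows. The threshold values and the review time per pair are hypothetical parameters that would be tuned from the prototype results; the sketch only shows how the review band translates into a workload estimate that can be weighed against the cost of leaving dubious pairs unresolved.

```python
from enum import Enum
from typing import Iterable, Tuple


class Decision(Enum):
    MATCH = "automatic match"
    REVIEW = "manual review"
    NON_MATCH = "non-match"


def classify(score: float, upper: float, lower: float) -> Decision:
    """Auto-match at or above `upper`, manual review between the thresholds, otherwise non-match."""
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= lower <= upper <= 1")
    if score >= upper:
        return Decision.MATCH
    if score >= lower:
        return Decision.REVIEW
    return Decision.NON_MATCH


def review_workload(scores: Iterable[float], upper: float, lower: float,
                    minutes_per_pair: float) -> Tuple[int, float]:
    """Number of pairs sent to manual review and the estimated effort in hours."""
    n = sum(1 for s in scores if classify(s, upper, lower) is Decision.REVIEW)
    return n, n * minutes_per_pair / 60.0


scores = [0.95, 0.82, 0.78, 0.60, 0.30]
print([classify(s, upper=0.90, lower=0.75).value for s in scores])
print(review_workload(scores, upper=0.90, lower=0.75, minutes_per_pair=3))
```

Widening the review band lowers the risk of wrong automatic decisions but increases the manual effort, so the band is usually sized against the reviewers actually available.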

Matching results should be driven by the relationships defined within the data ownership model.

The design assets will ideally be stored into a metadata repository. Some tools support this more strongly than others.
Output: A working matching prototype that runs on at least a subset of the data, and an estimate of the final outcome.



Step 3 Align Matching Metadata
Objective: Ensure the matching design and output is stored into the metadata repository.
Input: Completion of high level design
Completion of match prototype
Process: As with other aspects of Data Re-Engineering, it should be ensured that matching rules and result sets are stored in a metadata repository.

In this step, the source non-standardised data is mapped to the common model. Mapping rules would include:
  • Merge rules for producer to consumer file-entity mapping
  • Transformation rules for producer to consumer field mapping
  • Cross-reference of any key between systems that would need to be changed

Ideally, these mapping rules are stored in a metadata repository. For some vendor tools, this will be a by-product of the development process.
Output: If a metadata repository is being used, it should be ensured that this information is loaded into it as part of the design and implementation process
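
The mapping rules listed in this step could be captured as structured records before being loaded into a repository, as in the minimal sketch below. The system, entity and field names, the master-key format and the JSON layout are all assumptions for illustration; each metadata repository or vendor tool has its own import format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FieldMapping:
    """Producer-to-consumer field mapping with its transformation rule."""
    source_system: str
    source_entity: str
    source_field: str
    target_entity: str
    target_field: str
    transformation: str  # human-readable rule; any expression syntax is an assumption


@dataclass(frozen=True)
class KeyCrossReference:
    """Links a key in a source system to the key of the consolidated (master) record."""
    source_system: str
    source_key: str
    master_key: str


def build_cross_reference(match_groups: List[List[Tuple[str, str]]]) -> List[KeyCrossReference]:
    """Assign one master key per matched group; the 'M000001' key format is hypothetical."""
    xref = []
    for i, group in enumerate(match_groups, start=1):
        master = f"M{i:06d}"
        for system, key in group:
            xref.append(KeyCrossReference(system, key, master))
    return xref


mappings = [
    FieldMapping("CRM", "CUSTOMER", "CUST_NM", "Party", "full_name", "trim and title-case"),
    FieldMapping("BILLING", "ACCOUNT", "ACCT_HOLDER", "Party", "full_name", "trim and title-case"),
]
groups = [[("CRM", "C-17"), ("BILLING", "B-903")], [("CRM", "C-21")]]
xref = build_cross_reference(groups)

# Serialise both rule sets so they could be loaded into a metadata repository.
payload = json.dumps(
    {"field_mappings": [asdict(m) for m in mappings],
     "key_cross_reference": [asdict(x) for x in xref]},
    indent=2,
)
print(payload)
```

Keeping the key cross-reference alongside the field mappings means that systems still holding the old keys can be redirected to the consolidated record.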



Step 4 Execute Overall Match Process
Objective: In this step, the process is executed for the full set of records. Any changed information can be stored as metadata.
Input: Completion of high level design
Completion of match prototype
Process: Key Steps in the process include:
  • Matching process is executed
  • Results are reviewed with team
  • Revisions can be made based on match results and design expectations
  • Initial source files are archived in case "rollback" is required
  • Final sign-off on the match process
The implementation assets will ideally be stored in a metadata repository. Some tools support this more strongly than others.
Output: Matched data – this may involve de-duplication or new linkages
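
Where matched records are consolidated rather than linked, the Data Mastering rules from Step 1 decide which source's values survive. The sketch below assumes a simple field-level rule (take the first non-empty value from the most authoritative source) and a hypothetical `SOURCE_PRIORITY` ranking. It keeps an in-memory copy of the inputs to stand in for the archived source files; a real run would archive the source files to durable storage.

```python
import copy
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical mastering rule: lower number = more authoritative source.
SOURCE_PRIORITY = {"CRM": 1, "BILLING": 2, "WEB": 3}


def consolidate(group: List[Record], priority: Dict[str, int]) -> Record:
    """Build one surviving record: each field takes the first non-empty value from the most authoritative source."""
    unknown_rank = len(priority) + 1  # sources without a rule rank last
    ordered = sorted(group, key=lambda r: priority.get(r.get("source", ""), unknown_rank))
    survivor: Record = {}
    for rec in ordered:
        for field, value in rec.items():
            if field != "source" and value and field not in survivor:
                survivor[field] = value
    return survivor


def run_consolidation(groups: List[List[Record]],
                      priority: Dict[str, int]) -> Tuple[List[Record], List[List[Record]]]:
    """Consolidate every matched group, returning the results plus an untouched copy of the inputs for rollback."""
    archive = copy.deepcopy(groups)
    return [consolidate(g, priority) for g in groups], archive


group = [
    {"source": "WEB", "name": "Jon Smith", "email": "jon@example.com", "phone": ""},
    {"source": "CRM", "name": "John Smith", "email": "", "phone": "555-0100"},
]
results, archive = run_consolidation([group], SOURCE_PRIORITY)
print(results[0])
```

Field-level survivorship like this can combine values from several sources into one record; a stricter rule that takes whole records from the primary authority is equally possible and is a design decision for Step 1.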



Examples
