Open Framework, Information Management Strategy & Collaborative Governance | Data & Social Methodology - MIKE2.0 Methodology

Data Investigation and Re-Engineering High Level Solution Architecture Options

From MIKE2.0 Methodology


This section lists a number of High Level Solution Architecture options for Data Investigation and Re-Engineering. The architecture options are part of the overall SAFE Architecture framework and can be used as Supporting Assets for the Activity that covers definition of the Future State Vision for Information Management.


Data Profiling as part of the SDLC process

Data Profiling can be done as an early step in the SDLC process to gain a quantitative assessment of data that will flow between producers and consumers. Data Profiling helps to remove the uncertainty and assumptions related to Producer Systems. It will also involve an ongoing monitoring process that is put in place once the solution has been implemented. A model for the Data Profiling process is as follows:

The Design-Time Data Profiling Process
  1. Flat File Source System Data Extract. May be Iteratively Done
  2. Metadata, Including Transformation Rules, for Use in Analysis
  3. Includes the Tests and Reference Data for Analysis
  4. Review Metadata for Completeness, Unique Ids, Descriptions, etc
  5. If Appropriate, External Data can be Used for Matching / Enrichment
  6. Profile for Timeliness, Duplication, Accessibility, Completeness, Integrity, and Validity
  7. Results of Metadata Profiling
  8. Formal Quantification of Data Quality Profile Results
  9. Formal Quantification of Metadata Quality Profile Results
  10. Based on Profiling Issues and Gaps Identified, Recommend Areas for Improvement

This approach to Data Profiling is fairly fundamental and can be incorporated into the other High Level Solution Architecture options listed below.
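As a rough illustration of the profiling step, the sketch below computes completeness, duplication and validity measures for one column of a flat-file extract. The function name `profile_column`, the sample column names and the four-digit postcode rule are assumptions made for this example; they are not part of MIKE2.0 or of any particular profiling tool.

```python
import csv
import io
from collections import Counter


def profile_column(rows, column, is_valid=None):
    """Return simple completeness, duplication and validity measures for one column."""
    values = [row.get(column) for row in rows]
    total = len(values)
    # Treat None and whitespace-only strings as missing values.
    non_empty = [v for v in values if v is not None and v.strip() != ""]
    counts = Counter(non_empty)
    result = {
        "rows": total,
        "completeness": len(non_empty) / total if total else 0.0,
        "distinct": len(counts),
        # Number of rows whose value occurs more than once in the column.
        "duplicated_rows": sum(c for c in counts.values() if c > 1),
    }
    if is_valid is not None:
        valid = sum(1 for v in non_empty if is_valid(v))
        result["validity"] = valid / len(non_empty) if non_empty else 0.0
    return result


# Hypothetical flat-file extract standing in for a source system export.
SAMPLE_EXTRACT = """customer_id,postcode
C001,2000
C002,
C003,20A0
C001,3000
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_EXTRACT)))
id_profile = profile_column(rows, "customer_id")
postcode_profile = profile_column(
    rows, "postcode", is_valid=lambda v: v.isdigit() and len(v) == 4
)
print(id_profile)
print(postcode_profile)
```

Numbers of this kind are one possible input to the formal quantification of data quality results; timeliness, accessibility and integrity checks would need additional information, such as load timestamps and related tables, that this sketch does not model.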

A Framework for Ongoing Data Quality

An architecture framework for ongoing data quality management provides an approach where data quality is quantitatively understood during the design process, metadata artifacts are used to drive ETL design, and data quality issues are addressed both in batch and in an ongoing fashion. It is an approach that can be re-used for varying data sets and is therefore particularly important for significant efforts such as IT Transformation. The key aspects of this approach are highlighted below.

High Level Solution Architecture for Ongoing Data Quality Management

Data Profiling and Definition of Metadata Model

The data is extracted from the source and staged on a profiling platform. Using a profiling tool set, the data is then examined and assessed, and its current state is documented. Relevant metadata is assembled and the information is rationalized with an emerging enterprise attribute standard and a current understanding of the key business rules. The current state of data quality (DQ) is documented and a DQ plan is created for the data. Some attributes must be fixed before movement to the target environment, while others may be fixed after the move. These judgments are made by looking at the profiling results. The goal is to assess whether the current data values and granularity can support the functions proposed for the target environment. Transformations are designed within an ETL tool to migrate the attributes into the Metadata Standards for movement to the Data Quality platform. Many of these same transformations will be reused by the Legacy Interfaces to transform information to the Enterprise Standard for use in ongoing data synchronization.
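One way to picture metadata driving the transformation design is a small table of attribute mappings that records, for each source attribute, its enterprise-standard name, the transformation to apply, and whether the DQ plan requires a fix before the move. The sketch below is a minimal illustration only: `AttributeMapping`, the field names `CUST_NM` and `CTRY`, and the transformations are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AttributeMapping:
    """One metadata entry: how a source attribute maps to the enterprise standard."""
    source_name: str
    standard_name: str
    transform: Callable[[str], str]
    fix_before_move: bool  # DQ plan decision taken from the profiling results


# Hypothetical mappings; a real project would hold these in a metadata repository.
MAPPINGS = [
    AttributeMapping("CUST_NM", "customer_name", lambda v: v.strip().title(), True),
    AttributeMapping("CTRY", "country_code", lambda v: v.strip().upper(), False),
]


def to_enterprise_standard(record, mappings):
    """Apply every mapping to a source record and return the standardised record."""
    return {m.standard_name: m.transform(record[m.source_name]) for m in mappings}


def attributes_to_fix_first(mappings):
    """List the standard attributes that must be fixed before the move."""
    return [m.standard_name for m in mappings if m.fix_before_move]


legacy_record = {"CUST_NM": "  jane DOE ", "CTRY": "au"}
standard_record = to_enterprise_standard(legacy_record, MAPPINGS)
print(standard_record)
print(attributes_to_fix_first(MAPPINGS))
```

Because the mappings are data rather than hard-coded logic, the same entries could in principle be reused by a legacy interface for ongoing synchronization, which is the reuse the paragraph above describes.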

Producer to Consumer Mapping using a Metadata-Driven Approach

At this point the attributes have been mapped to a common Enterprise standard. An ETL-like tool is used to migrate the data to the enterprise standard implemented on the Data Quality platform. Those aspects of the Data Quality program that are to be implemented before migration to the production environments are executed. All data quality functions (e.g., validity checks, handling of missing values, de-duping) are performed after the data has been transformed to the attribute standard. Transformations and migration capabilities are constructed to migrate the data to the production data structures. All the business rules and data rules are captured as metadata because this knowledge and these capabilities are needed to maintain data synchronization and quality on an ongoing basis in the production environment (i.e., many conversion requirements are ongoing). Also at this point, any stored procedures in the source database are targeted for replacement by new database procedures in the target system, functions in the target application, or a common service available to all.
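The sketch below illustrates data rules held as metadata and applied after records are already in the attribute standard, covering missing values, validity and de-duplication. The rule vocabulary (`required`, `pattern`, `allowed`), the attribute names and the sample records are assumptions for this example, not a format defined by the methodology.

```python
import re

# Hypothetical data rules captured as metadata rather than as code.
RULES = [
    {"attribute": "email", "check": "required"},
    {"attribute": "email", "check": "pattern", "pattern": r"[^@\s]+@[^@\s]+\.[^@\s]+"},
    {"attribute": "country_code", "check": "allowed", "values": ["AU", "NZ"]},
]


def violations(record, rules):
    """Return (attribute, check) pairs for every rule the record breaks."""
    found = []
    for rule in rules:
        value = record.get(rule["attribute"]) or ""
        check = rule["check"]
        if check == "required":
            ok = value != ""
        elif check == "pattern":
            # Empty values are left to the "required" rule.
            ok = value == "" or re.fullmatch(rule["pattern"], value) is not None
        elif check == "allowed":
            ok = value in rule["values"]
        else:
            raise ValueError(f"unknown check: {check}")
        if not ok:
            found.append((rule["attribute"], check))
    return found


def de_duplicate(records, key):
    """Keep the first record seen for each key value."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


records = [
    {"customer_id": "C001", "email": "a@example.com", "country_code": "AU"},
    {"customer_id": "C001", "email": "a@example.com", "country_code": "AU"},
    {"customer_id": "C002", "email": "", "country_code": "US"},
]
clean = de_duplicate(records, "customer_id")
report = {r["customer_id"]: violations(r, RULES) for r in clean}
print(report)
```

Keeping the first record per key is the simplest possible de-duplication policy; real matching usually involves fuzzy comparison and survivorship rules that are outside the scope of this sketch.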

Testing the Conversion Capabilities

Step 3 focuses on the messaging, object and interface standards. The data is migrated to a staging area which has the same structure as the target production environments. The interfaces, messages and objects are all validated for correctness, performance and compliance with the DQ plan. The knowledge of the business rules, objects and XML definitions is positioned for re-use in the production environments. Much of the metadata repository (or repositories) is populated during these activities. The formulation of an ongoing production data mediation platform is a key outcome of this step.
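A very small part of this validation, checking that a staged record matches the target structure, might look like the sketch below. The `TARGET_STRUCTURE` definition and the field names are hypothetical, and the check covers structural correctness only; performance and full DQ plan compliance would need separate tests.

```python
# Hypothetical description of the target production structure.
TARGET_STRUCTURE = {"customer_id": str, "customer_name": str, "credit_limit": int}


def structure_errors(record, structure):
    """Return a list of structural problems found in one staged record."""
    errors = []
    for name, expected_type in structure.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            errors.append(f"wrong type for {name}: expected {expected_type.__name__}")
    for name in record:
        if name not in structure:
            errors.append(f"unexpected field: {name}")
    return errors


good = {"customer_id": "C001", "customer_name": "Jane Doe", "credit_limit": 5000}
bad = {"customer_id": "C002", "credit_limit": "5000", "legacy_flag": "Y"}

print(structure_errors(good, TARGET_STRUCTURE))
print(structure_errors(bad, TARGET_STRUCTURE))
```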

A Re-Usable Platform for Data Quality Management

Nearly all of the knowledge and functionality acquired in the conversion process becomes reusable in the production environment on an ongoing basis. In this environment, rules and standards may be invoked on a record-at-a-time basis (as opposed to batch migration) as a data service for use by any application.
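The reuse described here can be sketched as a single rule pipeline with two entry points: one for batch migration and one for record-at-a-time calls. The rule functions and entry-point names below are assumptions for illustration; in practice the record-level entry point would sit behind whatever service interface the organisation uses.

```python
def normalise_phone(record):
    """Keep only the digits of the phone number."""
    digits = "".join(ch for ch in record.get("phone", "") if ch.isdigit())
    return {**record, "phone": digits}


def normalise_name(record):
    """Collapse repeated whitespace and apply title case to the name."""
    return {**record, "name": " ".join(record.get("name", "").split()).title()}


# Hypothetical rule set built up during the conversion project.
RULES = [normalise_phone, normalise_name]


def apply_rules(record, rules=RULES):
    """Run every rule over one record, in order."""
    for rule in rules:
        record = rule(record)
    return record


def migrate_batch(records):
    """Batch entry point, as used during the one-off conversion."""
    return [apply_rules(r) for r in records]


def standardise_record(record):
    """Record-at-a-time entry point, as an application might call it in production."""
    return apply_rules(record)


sample = {"name": "  ann   lee", "phone": "(02) 5550-0100"}
batch_result = migrate_batch([sample])
single_result = standardise_record(sample)
print(batch_result[0] == single_result)
```

Both entry points call the same `apply_rules` function, so a rule changed once takes effect in batch and in record-level use alike.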

Targets of Opportunity for New Services

The data is migrated to the actual target environments. This step may also include a set of activities for ‘user acceptance testing’, with subsequent migration to production. Some of the DQ and data mediation work will become ongoing processes and not just a ‘one time’ move. Because aspects of the new and old systems are used concurrently, there is an ongoing requirement for data synchronization, which is met by the ongoing data mediation platform. Concurrent use of aspects of the new and old systems is addressed in the transformation plan. This provides an iterative capability to perform lower-risk transformation projects.

New targets of opportunity then emerge as candidates for common services enabled by the Data Mediation platform that was formulated during the conversion process. This becomes a ‘Forever Re-Usable’ platform.
