From MIKE2 Methodology
Activity: Prototype the Solution Architecture
Objective
The purpose of this Activity is to test some of the major technology risk areas for the proposed Solution Architecture and gain a better understanding of how the solution will work before moving into a more formalized design process. Prototyping the proposed solution should provide an end-to-end approach that includes each of the major components of the architecture.
The approach outlined in this Activity progressively builds on the functionality of the prior steps to produce a more robust solution, although the end result is still a "thin-thread" approach. It is most relevant for complex integration projects, for projects where there are many unknowns, and for initial increments of an implementation where concepts have not yet been tested.
The tasks explained below are examples of how this approach may be used to build a solution that progressively handles greater degrees of sophistication in a complex data integration environment. The example solution integrates data from multiple producer systems into an integrated data store and makes this data available to consuming systems in a simple fashion. Data Quality issues are monitored and flagged in the integration process. Metadata Management is an ongoing task that is tested at each step of the prototype.
Other tasks that may be incorporated in a prototype include:
- Specific testing of complex interfaces
- Operationalisation of data re-engineering processes
- Reconciliation processes within the integration layer
Software development processes around metadata management, the use of common services or automated testing are also good candidate areas to test in a prototype. In summary, any area where the design team cannot quickly reach a resolution on the approach is a candidate for prototyping.
Deliverables
- Working prototype of the proposed solution
Tasks
Build Initial Prototype
Objective:
Test a very simple process that takes some of the existing extract files from a single producing system, loads the data into an integrated model and exposes this data through a canonical model. This task would involve the following steps:
- Define staging areas for loading selected extract files from producer.
- Conduct profiling of data as it arrives in staging to prove the basic concept of data validation and flag records with data quality issues in the staging area.
- Define the logical and physical model for the integrated data store that applies to the producer data that will be tested in this phase. Implement this model.
- Define transformations from staging to the integrated data store.
- Build a sample CMM that demonstrates how data from the producer can be used in a reusable fashion by consumers.
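The five steps above can be illustrated as a single thin thread. The sketch below is a minimal in-memory illustration, not a real implementation: it assumes a hypothetical producer extract of customer rows with `cust_id`, `name` and `birth_date` fields, and the validation rules, the integrated "party" layout and the `to_canonical` view are all hypothetical stand-ins for the real staging tables, integrated model and sample CMM.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical extract rows from a single producer system.
EXTRACT = [
    {"cust_id": "C1", "name": "Ann Lee", "birth_date": "1980-04-02"},
    {"cust_id": "C2", "name": "", "birth_date": "1975-13-40"},
]

@dataclass
class StagedRecord:
    raw: dict
    dq_flags: list = field(default_factory=list)  # issues found during profiling

def stage(rows):
    """Load extract rows into the staging area unchanged."""
    return [StagedRecord(raw=dict(r)) for r in rows]

def profile(staged):
    """Apply simple (hypothetical) validation rules; flag, rather than reject, failures."""
    for rec in staged:
        if not rec.raw.get("name"):
            rec.dq_flags.append("MISSING_NAME")
        try:
            date.fromisoformat(rec.raw.get("birth_date", ""))
        except ValueError:
            rec.dq_flags.append("INVALID_BIRTH_DATE")
    return staged

def transform(staged):
    """Map staging records onto a hypothetical integrated 'party' entity."""
    store = {}
    for rec in staged:
        store[rec.raw["cust_id"]] = {
            "party_id": rec.raw["cust_id"],
            "full_name": rec.raw["name"] or None,
            "birth_date": rec.raw["birth_date"],
        }
    return store

def to_canonical(store):
    """Expose integrated data in a consumer-neutral shape (stand-in for the CMM)."""
    return [{"id": p["party_id"], "name": p["full_name"]} for p in store.values()]

staged = profile(stage(EXTRACT))
store = transform(staged)
canonical = to_canonical(store)
```

Flagging in staging, instead of discarding records, keeps every producer row visible so the later data quality tasks have something to build on.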
Input:
Output:
Extend Prototype to Identify Single-Producer Data Quality Issues
Objective:
This task builds on the prior task and is the starting point for handling exceptions that we identify in the system. It is focused on how issues identified by data profiling will be loaded into the integrated data store. This task would involve the following areas:
- Extend the integrated data store and/or meta-model to be able to handle data quality issues identified during data profiling.
- Define transformations from staging to the integrated data store, including the established data quality attributes that record the validation results for data from the producer system.
- Extend the CMM built in the prior task so that it also contains data quality attributes.
The purpose of this step is to establish the best approach for tracking data quality issues identified in data from the producer system.
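One candidate approach for carrying validation results through to consumers is sketched below. It assumes that profiling produces a list of pass/fail results per record; the `ValidationResult` type, the `dq_status` and `dq_failed_rules` attributes and the nested `quality` element in the canonical view are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ValidationResult:
    rule: str      # hypothetical rule name, e.g. "MISSING_NAME"
    passed: bool

def integrate_with_dq(key, attrs, results):
    """Build an integrated-store row whose DQ attributes record validation outcomes."""
    failed = sorted(r.rule for r in results if not r.passed)
    return {
        "party_id": key,
        **attrs,
        "dq_status": "FLAGGED" if failed else "PASS",
        "dq_failed_rules": failed,
    }

def to_canonical(row):
    """Extend the canonical view so consumers see DQ attributes alongside the data."""
    return {
        "id": row["party_id"],
        "name": row.get("full_name"),
        "quality": {"status": row["dq_status"], "issues": row["dq_failed_rules"]},
    }

row = integrate_with_dq(
    "C2",
    {"full_name": None},
    [ValidationResult("MISSING_NAME", False), ValidationResult("VALID_ID", True)],
)
canonical = to_canonical(row)
clean = to_canonical(
    integrate_with_dq("C1", {"full_name": "Ann Lee"}, [ValidationResult("MISSING_NAME", True)])
)
```

Storing failed rule names directly on the row is only one option; a separate issue table keyed by record is another, and comparing the two is the kind of decision this step is meant to settle.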
Input:
Output:
Extend Prototype to Incorporate Multiple Data Producers
Objective:
This task builds on the prior task and is focused on showing how data will be brought together from multiple systems into the integrated data store. This task would involve the following areas:
- Extend staging areas for loading selected extract files from an additional producer system.
- Extend the logical and physical model for the integrated data store to provide coverage of the new producer system that will be tested in this step of the prototype. Implement this model.
- Conduct profiling of data as it arrives in staging areas and flag data quality issues using pre-defined data quality validation levels established by the data quality team.
- Define transformations from staging to the integrated data store.
- Build a sample CMM that exposes data from both producer systems and can be used in a reusable fashion by consumers.
This will be the first example of bringing data together from producer systems in an integrated fashion and providing this data to consumers through a reusable CMM.
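A minimal sketch of the multi-producer step follows. It assumes two hypothetical producers ("billing" and "crm") that use different field names for the same concepts, a hypothetical shared match key (`tax_id`), and a hypothetical table of pre-defined validation levels standing in for the levels the data quality team would supply. Records with an ERROR-level issue are held out of integration; WARNING-level issues are flagged but still integrated.

```python
# Hypothetical pre-defined validation levels from the data quality team.
VALIDATION_LEVELS = {"MISSING_EMAIL": "WARNING", "MISSING_TAX_ID": "ERROR"}

# Hypothetical extracts: each producer uses its own field names.
BILLING = [{"acct": "B-7", "tax_no": "T100", "mail": "ann@example.com"}]
CRM = [
    {"crm_id": "X-1", "tax_id": "T100", "email": ""},
    {"crm_id": "X-2", "tax_id": "", "email": "bob@example.com"},
]

def normalise(row, source):
    """Map producer-specific fields onto a common staging layout."""
    if source == "billing":
        return {"source": source, "source_key": row["acct"],
                "tax_id": row["tax_no"], "email": row["mail"]}
    return {"source": source, "source_key": row["crm_id"],
            "tax_id": row["tax_id"], "email": row["email"]}

def flag(rec):
    """Attach issues together with their pre-defined validation level."""
    issues = []
    if not rec["email"]:
        issues.append(("MISSING_EMAIL", VALIDATION_LEVELS["MISSING_EMAIL"]))
    if not rec["tax_id"]:
        issues.append(("MISSING_TAX_ID", VALIDATION_LEVELS["MISSING_TAX_ID"]))
    rec["issues"] = issues
    return rec

def integrate(records):
    """Group records from all producers on the shared match key."""
    store, unmatched = {}, []
    for rec in records:
        if any(level == "ERROR" for _, level in rec["issues"]):
            unmatched.append(rec)  # cannot be integrated without a match key
            continue
        party = store.setdefault(rec["tax_id"], {"tax_id": rec["tax_id"], "sources": {}, "emails": set()})
        party["sources"][rec["source"]] = rec["source_key"]
        if rec["email"]:
            party["emails"].add(rec["email"])
    return store, unmatched

staged = [flag(normalise(r, "billing")) for r in BILLING] + [flag(normalise(r, "crm")) for r in CRM]
store, unmatched = integrate(staged)
# Canonical view spanning both producers.
canonical = [
    {"tax_id": p["tax_id"], "source_keys": p["sources"], "emails": sorted(p["emails"])}
    for p in store.values()
]
```

Keeping each producer's own key in `source_keys` lets consumers trace an integrated record back to every contributing system.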
Input:
Output:
Extend Prototype to Identify Multi-Producer Data Quality Issues
Objective:
This task builds on the prior task, but now focuses on tracing the resolution of data quality issues, whether they are resolved manually or in an automated fashion. It will test how data quality issues identified in the integrated data store are subsequently resolved within the system, prototyping areas such as:
- A scenario where an orphan record was initially identified, but it is now possible to link the child record to the proper parent
- Extending the integrated data store with an associative entity to be able to resolve data quality issues.
- A scenario where an audit table is created to track changes to attributes that are critical to the integrated data store, to the degree required to guarantee delivery of data to consumers.
- A scenario where a producer system has changed and there are major data problems from the source system. In this event, data will not be propagated into the integrated data store.
- Testing how changes in data quality result-sets will be communicated out to consumers.
In summary, this task will test complex issues around identification and resolution of data quality issues that arise during integration.
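Several of these scenarios can be exercised together in a small model. The sketch below assumes a hypothetical `IntegratedStore` in which a simple child-to-parent link table stands in for the associative entity, an in-memory list plays the role of the audit table, and a hypothetical `max_error_rate` threshold decides when a producer batch has "major data problems" and is not propagated. `changes_for_consumers` illustrates one possible way to publish changes in data quality result sets.

```python
from datetime import datetime, timezone

class IntegratedStore:
    """Hypothetical integrated store with an audit trail of DQ events."""

    def __init__(self, max_error_rate=0.2):
        self.parents = set()
        self.children = {}   # child_id -> parent_id (None while orphaned)
        self.orphans = {}    # child_id -> parent_id the child claims
        self.audit = []      # (timestamp, child_id, event, detail)
        self.max_error_rate = max_error_rate  # hypothetical rejection threshold

    def _log(self, child_id, event, detail):
        self.audit.append((datetime.now(timezone.utc), child_id, event, detail))

    def load_batch(self, parents, children):
        """Load a producer batch; refuse to propagate it if too many records are invalid."""
        bad = [c for c in children if c.get("parent_id") is None]
        if children and len(bad) / len(children) > self.max_error_rate:
            self._log(None, "BATCH_REJECTED", f"{len(bad)}/{len(children)} records invalid")
            return False
        self.parents.update(parents)
        for c in children:
            cid, pid = c["id"], c.get("parent_id")
            if pid in self.parents:
                self.children[cid] = pid
            else:
                self.children[cid] = None
                self.orphans[cid] = pid
                self._log(cid, "ORPHAN_FLAGGED", f"parent {pid} not found")
        self._resolve_orphans()
        return True

    def _resolve_orphans(self):
        """Link previously orphaned children whose parent has now arrived."""
        for cid, pid in list(self.orphans.items()):
            if pid in self.parents:
                self.children[cid] = pid
                del self.orphans[cid]
                self._log(cid, "ORPHAN_RESOLVED", f"linked to parent {pid}")

    def changes_for_consumers(self, since):
        """Return DQ events after `since`, e.g. to notify consumers of changed result sets."""
        return [event for event in self.audit if event[0] > since]

ids = IntegratedStore()
ids.load_batch(parents=[], children=[{"id": "O1", "parent_id": "P9"}])   # orphan flagged
ids.load_batch(parents=["P9"], children=[])                              # parent arrives, orphan resolved
accepted = ids.load_batch(parents=[], children=[{"id": "O2", "parent_id": None}])  # batch rejected
```

Because every flag and resolution is written to the audit trail, the prototype can show both the automated path (the parent arriving later) and where a manual resolution step would record its outcome.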
Input:
Output:
Extend Prototype to Include Automation and Monitoring
Objective:
This task builds on the prior task, and is focused on automation of the overall system and basic steps around ongoing monitoring. In this task automation steps typically include:
- Bring together all integration steps for loading data from producers out to consumers in an automated fashion.
- Operationalise data profiling jobs so that monitoring is conducted in an ongoing fashion.
- Provide a process for basic scheduling and job control.
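Basic job control for the automated flow can be prototyped before committing to a scheduling product. The sketch below is a stand-in, not a recommendation of any tool: it uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to run hypothetical steps in dependency order, records a status per step, and skips anything downstream of a failure. The step names and the "callable returns True on success" contract are assumptions.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def run_pipeline(steps, dependencies):
    """Run steps in dependency order and skip work downstream of a failure.

    steps: name -> zero-argument callable returning True on success (hypothetical contract)
    dependencies: name -> set of step names that must succeed first
    Returns name -> status, a minimal job-control log.
    """
    status = {}
    for name in TopologicalSorter(dependencies).static_order():
        if any(status.get(dep) != "OK" for dep in dependencies.get(name, ())):
            status[name] = "SKIPPED"
            continue
        try:
            status[name] = "OK" if steps[name]() else "FAILED"
        except Exception as exc:  # record the failure instead of crashing the run
            status[name] = f"FAILED: {exc}"
    return status

# Hypothetical end-to-end flow; operationalised profiling is simply another step.
steps = {
    "stage_billing": lambda: True,
    "stage_crm": lambda: True,
    "profile": lambda: True,
    "integrate": lambda: False,   # simulate a failed load
    "publish_cmm": lambda: True,
}
deps = {
    "profile": {"stage_billing", "stage_crm"},
    "integrate": {"profile"},
    "publish_cmm": {"integrate"},
}
result = run_pipeline(steps, deps)
```

In the simulated run, the failed integration step prevents the CMM from being published with stale or partial data.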
Input:
Output:
Extend Prototype for Performance Tuning and Sizing
Objective:
This task builds on the prior task and is focused on tuning and sizing of the system to deal with performance issues. It will include the following steps:
- Engagement with DBAs to tune the data store and CMMs to improve performance.
- Optimisation of the ETL tool to improve performance.
- Timing exercises on data flows through the integration environment.
This is only very basic performance tuning, intended to highlight any major risk areas or to test performance-optimisation capabilities of the technology that have not yet been tested.
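A timing exercise can be as simple as wrapping each data-flow step and reporting elapsed time and throughput. The sketch below uses synthetic rows and trivial hypothetical "transform" and "load" steps purely to show the measurement pattern with `time.perf_counter`; the figures it prints depend entirely on the machine and say nothing about a real ETL tool or data store.

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(step):
    """Accumulate wall-clock seconds spent in a data-flow step."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step] = timings.get(step, 0.0) + time.perf_counter() - start

def throughput(rows, step):
    """Rows per second for a timed step (illustrative metric)."""
    elapsed = timings[step]
    return rows / elapsed if elapsed > 0 else float("inf")

# Synthetic data standing in for a producer extract.
rows = [{"id": i, "amount": i * 1.5} for i in range(100_000)]

with timed("transform"):
    transformed = [{"id": r["id"], "amount_cents": round(r["amount"] * 100)} for r in rows]
with timed("load"):
    store = {r["id"]: r for r in transformed}

for step in ("transform", "load"):
    print(f"{step}: {timings[step]:.3f}s, {throughput(len(rows), step):,.0f} rows/s")
```

Recording per-step timings, rather than a single end-to-end figure, points the DBA and ETL tuning effort at the slowest stage first.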
Input:
Output:
Prototype Metadata Management Processes
Objective:
This task is an ongoing process that is reviewed and modified after the completion of each task in the overall process of building the prototype. The purpose of this task is to understand how metadata is managed across the environment through a central repository.
Design-time and run-time metadata are managed for each new component in the architecture as it is added to the prototype. Metadata sources include:
- Models from source systems
- Data models
- Data profiling results
- Design-time ETL metadata
- Run-time ETL operational metadata
The prototype can also test metadata reporting that shows areas such as impact analysis and data lineage.
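Impact analysis and data lineage are both traversals of the same mapping graph, read in opposite directions. The sketch below assumes a hypothetical `MetadataRepository` that stores design-time ETL mappings as directed edges between data elements named `system.entity.attribute`; the element names are invented for illustration, and a real central repository would also hold models, profiling results and run-time operational metadata.

```python
from collections import defaultdict, deque

class MetadataRepository:
    """Hypothetical central repository holding lineage edges between data elements."""

    def __init__(self):
        self.downstream = defaultdict(set)
        self.upstream = defaultdict(set)
        self.attributes = {}  # element -> metadata (source model, profiling stats, ...)

    def register(self, element, **metadata):
        self.attributes.setdefault(element, {}).update(metadata)

    def add_mapping(self, source, target):
        """Record design-time ETL metadata: `source` feeds `target`."""
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    def _walk(self, start, edges):
        """Breadth-first traversal returning every element reachable from `start`."""
        seen, queue = set(), deque([start])
        while queue:
            for nxt in edges[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def impact_analysis(self, element):
        """Everything that could be affected if `element` changes."""
        return self._walk(element, self.downstream)

    def lineage(self, element):
        """Everything `element` is derived from."""
        return self._walk(element, self.upstream)

repo = MetadataRepository()
repo.register("billing.acct.tax_no", kind="source column", null_rate=0.02)
repo.add_mapping("billing.acct.tax_no", "staging.party.tax_id")
repo.add_mapping("crm.contact.tax_id", "staging.party.tax_id")
repo.add_mapping("staging.party.tax_id", "ids.party.tax_id")
repo.add_mapping("ids.party.tax_id", "cmm.Party.taxId")
```

Adding each new prototype component's mappings as it is built keeps these reports current without a separate documentation effort.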
Input:
- Solution Architecture
- Working prototype
Output:
Core Supporting Assets
Yellow Flags
- The prototype identifies major issues with the design
- Problems getting connectivity into systems
- Implementation team has significant challenges working with the technology
Key Resource Requirements
- Solution Architect
- A subset of the implementation team, involving representation from each area
Potential Changes to this Activity
This activity should be expanded to cover the prototyping of a solution architecture that also applies to unstructured content. It may also be required for BI architecture tasks, and it should be flexible enough to cover specific types of architecture such as EII, SOA and MDA. To do this, the activity will likely need to be generalised and to reference a number of detailed supporting assets for specific solution offerings. The general theme of testing function, quality, automation and performance still applies, and usability may also need to feature in the approach. Alternatively, this activity may remain backplane-focused, with other activities covering user-facing prototypes.