From MIKE2.0 Methodology
Architecture Patterns for Data Synchronisation describe how different components in the SAFE Architecture interact to solve a general set of problems for Data Synchronisation.
The Problem and its Symptoms
Redundant silos of data typically exist across the enterprise for a number of business systems. These systems not only represent and interpret data in their own fashion, but also store it at varying levels of quality. The proliferation of distributed systems across the organization mandates greater communications between applications in order to fulfill business processes. Therefore, the integration requirements between systems are increasing in significance and the timeliness of data has become more critical.
A superior approach to Data Synchronisation is of considerable importance to the business – it is, in fact, one of the most significant capabilities that IT can deliver. If done poorly, Data Synchronisation issues can cause a number of problems, such as:
- Customer Management becomes less effective and more fragmented, and call centre reps are encumbered by the lack of reliable data.
- Data Synchronization issues (due to both latency and poor data quality) cause significant problems in inventory management by introducing inefficiencies and inaccuracies into the supply chain.
- Data Synchronization issues often result in errors in customer invoices. This causes not only customer dissatisfaction, but has a substantial cost for reconciliation.
- The cornerstone to effective Sales and Marketing is insight on customer data. The inability to establish a view of customer data from multiple systems across the enterprise limits companies’ ability to do intelligent marketing.
- The lack of automation of synchronization processes forces ‘swivel-chair integration’ (employees manually re-entering information between systems) and increases data entry errors and latency.
Design Pattern for Event-Based Data Synchronisation
Event-Based Synchronisation through the SAFE Architecture
Event-Based Data Synchronisation applies to those events classified as operational and transactional within the BusinessTime Model. The design pattern below illustrates a best-practices approach to implementing event-based data synchronization.
- An event occurs within an application, causing the application to invoke the Service Requester adapter interface.
- Within the Service Requester, data from the application is transformed from the application-specific format and validated against a CMM Message structure.
- The Service Requester publishes out a CMM Message, which includes both the data payload and header information.
- A Mediator Business Service subscribing to the CMM message receives the message.
- The Mediator looks at the rules for Data Ownership, and invokes a process that is dependent on the sender of the message.
  - If the sender was the Primary Master of the CMM-defined business event, the general approach is that it will act to propagate the data to consuming systems without any “authentication” from additional systems.
  - If the sender was a Secondary Master of the CMM-defined business event, the mediation rules will generally be more complex. An additional mediation process must be invoked that receives “authentication” from any Primary Master or Secondary Master systems that may exist. The process of Secondary Master updates is dependent on the (often complex) business rules that have been defined.
- The mediator may also invoke additional Business Services for data processing or invocation of business rules. In particular, Data Management Business Services may be invoked to perform functions such as calling a database of reference or providing a matching function.
- The Mediator will then publish out the message to consuming systems. The nature of this publication (e.g. all systems at once through a publish/subscribe approach or on an individual basis to each system) will depend on variables such as:
  - If the sequence of events to update applications has any bearing on the solution
  - If there are limitations within a consuming system in performing rollback functions
  - If there are inherent time latencies in some of the operational systems
- Service Providers subscribing for the CMM message receive the message.
- Service Providers transform the message from the CMM format into the application-specific format and invoke the adapter interface into the application to submit the data.
- Service Providers pass status back up to the Mediator Business Service.
- The Mediator Business Service owns the overall process and is responsible for its “completion”. The Mediator Business Service controls any actions that are to be taken as a result of the statuses passed back up by the Service Providers.
- A Message Auditor plays a passive role in consuming status messages from both Interface and Business Services. Data consumed by the Message Auditor can be used to provide end-to-end traceability of an initiating event and its lifecycle of updates across applications.
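The flow described in the steps above can be illustrated with a minimal Python sketch. It is not part of the SAFE Architecture itself: the event name `CustomerUpdated`, the systems `crm` and `billing`, the field names `CUST_NO`, `CUST_NM` and `ACCT_NAME`, and the `approve_secondary` callback are all hypothetical, and the in-memory `audit_log` list merely stands in for a separate Message Auditor component.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class CmmMessage:
    event: str      # CMM-defined business event (hypothetical name)
    sender: str     # header information: system that raised the event
    payload: dict   # data payload in the common (CMM) format


def service_requester(app_record: dict) -> CmmMessage:
    """Transform an application-specific record and validate it against the CMM."""
    payload = {"customer_id": app_record["CUST_NO"],
               "name": app_record["CUST_NM"].strip()}
    if not payload["customer_id"]:
        raise ValueError("customer_id is required by the CMM structure")
    return CmmMessage(event="CustomerUpdated", sender="crm", payload=payload)


@dataclass
class Mediator:
    primary_master: Dict[str, str]  # event -> system that owns it
    providers: Dict[str, Callable[[CmmMessage], str]] = field(default_factory=dict)
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def handle(self, msg, approve_secondary=lambda m: False):
        """Apply the data ownership rule, then propagate to consuming systems."""
        is_primary = self.primary_master.get(msg.event) == msg.sender
        # A Secondary Master update needs approval (assumed to be a callback here).
        if not is_primary and not approve_secondary(msg):
            self.audit_log.append((msg.event, msg.sender, "rejected"))
            return {}
        statuses = {}
        for name, provider in self.providers.items():
            if name == msg.sender:
                continue  # do not send the update back to its originator
            statuses[name] = provider(msg)  # provider passes status back up
            self.audit_log.append((msg.event, name, statuses[name]))
        return statuses


billing_db = {}  # stands in for the consuming application


def billing_provider(msg: CmmMessage) -> str:
    """Service Provider: convert CMM format to the application format and submit."""
    billing_db[msg.payload["customer_id"]] = {"ACCT_NAME": msg.payload["name"]}
    return "ok"


mediator = Mediator(primary_master={"CustomerUpdated": "crm"},
                    providers={"billing": billing_provider})
statuses = mediator.handle(service_requester({"CUST_NO": "42", "CUST_NM": " Ada "}))
print(statuses)
```

In this sketch the Mediator calls each provider in turn, which corresponds to the "individual basis" publication option; a publish/subscribe broker would replace the loop in a real deployment.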
Design Pattern for Batch Data Synchronisation
Batch-Oriented Synchronisation through the SAFE Architecture
Batch Data Synchronisation applies to cases where synchronization is generally driven by temporal events and often involves large volumes of data. The design pattern below illustrates a best-practices approach to implementing Batch Data Synchronisation.
- A batch process is invoked that triggers an extraction of data from one or more source systems. This batch process is often triggered by a temporal event, e.g. a set time at which a job should run. Data extraction processes range from reading flat files and using direct database connections through to receiving data through an API or message queue.
- The data is transformed from its native format. This transformation process may include the following:
- Transformation of the source data into a CMM format. Although standardising data into CMM message formats has typically been reserved for event-based integration, CMM transformation provides the same benefits to batch integration. Resource impacts due to large data extract sizes may place some limitations on this approach, though ETL technologies have advanced significantly in the past 2 years (e.g. support for XML message schemas) to facilitate the use of this model.
- A Data Management Service may be invoked to perform functions such as cleansing of data. This may involve loading data into a temporary staging area. Within a services-oriented architecture, this process is fundamentally the same for both batch and event-based synchronization.
- The extracted data may be run against a set of business rules (e.g. checking for data integrity) and an action taken based on the result (e.g. additional lookup to source system, a decision not to load data into a target system). This may involve (and is often best-practice) first loading data into a separate staging area.
- A transformation process will convert data into the specific format for loading into the target system. Additional validation rules may be performed against the data set in this form.
- Data will be loaded into the target system, either through an interface or direct database access.
- The controller component will control the flow of the process and may receive status from the load process. There is a high degree of variance across ETL and custom solutions in defining their capabilities in this area.
- Status of the ETL process will be available through auditing/operations components.
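As a rough illustration of the batch steps above, the following Python sketch extracts rows from a flat file, transforms them into a common format, applies a business rule, stages the accepted records, and loads them into a target while a controller function collects status. The column names `CUST_NO` and `CUST_NM`, the rule itself, and the dictionary used as the target system are assumptions made for the example, not part of the pattern.

```python
import csv
import io


def extract(flat_file_text):
    """Extraction step: read rows from a flat file (CSV text in this sketch)."""
    return list(csv.DictReader(io.StringIO(flat_file_text)))


def to_cmm(row):
    """Transform a source row into the common (CMM) format."""
    return {"customer_id": row["CUST_NO"].strip(),
            "name": row["CUST_NM"].strip().title()}


def passes_rules(record):
    """Business rule check: the id must be numeric and the name non-empty."""
    return record["customer_id"].isdigit() and bool(record["name"])


def to_target(record):
    """Convert a CMM record into the target system's specific format."""
    return int(record["customer_id"]), record["name"]


def run_batch(flat_file_text, target):
    """Controller: drive the flow and return status for auditing/operations."""
    status = {"extracted": 0, "loaded": 0, "rejected": 0}
    staging = []  # separate staging area for records that pass the rules
    for row in extract(flat_file_text):
        status["extracted"] += 1
        record = to_cmm(row)
        if passes_rules(record):
            staging.append(record)
        else:
            status["rejected"] += 1  # decision not to load this record
    for record in staging:
        key, name = to_target(record)
        target[key] = name  # load step (direct write to the target store)
        status["loaded"] += 1
    return status


source = "CUST_NO,CUST_NM\n1,ada lovelace\n,missing id\n2,alan turing\n"
target_system = {}
print(run_batch(source, target_system))
```

A temporal trigger such as a scheduler would call `run_batch` at the set time; that part is left out of the sketch.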
Design Pattern for Batch Data Migration
Data Migration through the SAFE Architecture
The model below is a variation of Batch Synchronisation and is not strictly an architecture pattern, in that it also conveys an extended process of software delivery. It is not a fully automated process of Data Synchronization but instead shows the steps involved in a large-scale data migration process. The migration process is done within the framework of the Services-Oriented Architecture and illustrates how metadata standards are developed as a by-product of this approach.
- Data is extracted from the source system(s), generally either through extract tables or flattened extracts.
- The data is loaded into a staging area where it will be profiled, both for the volume of rows and for the characteristics of the content at the attribute level. This information will be used to determine which business rules and transformations need to be invoked early in the process. This step of “Data Investigation” generates pattern reports, which display the unique patterns and frequency counts of the values in selected fields.
- Metadata will begin to be established at this time (e.g., data mapping and the seed attributes for the CMM). Data standards will be agreed to and invoked at this stage in preparation for data movement. All source attributes will be mapped into the target attributes in the target complex. Mapping the attributes into the enterprise attribute standard is an optional by-product of this step.
- All agreed-to transformations and standardizations required to move the data into the staging area for testing and production are implemented. The data is moved into the staging area.
- Data profiling is done again and measured against the agreed-upon move success criteria for all steps up to this point. Additional data standardizations are performed in this staging area to assist in data matching and, more generally, to measure data quality against agreed-upon criteria. After the standardizations, the rules for which records cannot or should not be moved are applied. This step often requires considerable analysis.
- This step involves the actual move of the data into either the testing environment or the production environment. It could be thought of as 6-T for testing and 6-P for production. The processes and capabilities used will be as identical as possible, varying only in the target destination.
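The pattern reports produced during Data Investigation (the profiling step in the staging area) can be sketched in a few lines of Python. The encoding used here, with every digit shown as `9` and every letter as `A`, is a common profiling convention assumed for the example; the source does not prescribe a particular encoding, and the `phone` field is hypothetical.

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Reduce a value to its character pattern: digits -> 9, letters -> A."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)  # keep punctuation and spaces as they are
    return "".join(out)


def pattern_report(rows, field_name):
    """Return (pattern, frequency) pairs for one field, most frequent first."""
    patterns = (value_pattern(str(row.get(field_name, ""))) for row in rows)
    return Counter(patterns).most_common()


staged_rows = [
    {"phone": "555-1234"},
    {"phone": "555-9876"},
    {"phone": "n/a"},
    {"phone": "5551234"},
]
for pattern, count in pattern_report(staged_rows, "phone"):
    print(pattern, count)
```

A report like this makes it easy to see which formats dominate a field and which outliers (such as `A/A` for "n/a") need a standardization rule or should be excluded from the move.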