MDM ETL Processing SAD
From MIKE2.0 Methodology

Title page
ETL Processing - Software Architecture Document

Introduction
This document provides a comprehensive overview of the software architecture components that support ETL processing within the Client Hub environment. Extract, transform and load (ETL) is a major component of the Client Hub. It is responsible for extracting data from source systems, transforming the data into the required formats and loading the transformed data into the databases within the Client Hub environment.

Purpose
This document provides an architectural overview of the ETL processing system, using a number of use cases and architectural views to depict different aspects of ETL processing within the Client Hub. It is intended to capture and convey the significant architectural decisions that have been made on the system.

Scope
The document focuses on the ETL software components that process and transform data from the source systems and load the processed data into the database of each stage within the Client Hub.
Definitions, Acronyms and Abbreviations
Please refer to the Glossary section in the Appendix, which lists the main definitions, acronyms and abbreviations used in this document.

References
The documents referenced in this document can be found in the Appendix A – References section. Please note that some of the referenced documents are still in draft version.

Overview
The document begins with a high-level architectural representation of the overall ETL process and then describes each architecture component in more detail.

Architectural Representation
The high-level ETL processing architecture shows the major system and software components that are involved in moving source data into the Client Hub. The system components include the source systems that provide data for the Client Hub, and the Client Hub itself, which is the target destination of the ETL processing. The software components, in contrast, are the programs used for extracting, transforming and loading data into the databases within the Client Hub environment. They comprise the source extract programs, the ETL processes and other supporting software components that complement the ETL processes.

System Components
Client Hub’s Databases:
Software Components
Architectural Goals and Constraints
List the key high-level architectural goals and constraints related to the ETL processes, the databases that will be involved, etc.

Use-Case View
Examples of the use cases within the Client Hub.
Business Scenarios (Examples)
There are two primary business scenarios related to the MDM Repository ETL Processing:

Initial Load Scenario
This business scenario takes place when the data store within the Client Hub is first loaded with data from the source systems. The scenario may also occur when the database within the Client Hub has to be reloaded because the data has been wiped out or the database has been re-instantiated.

Delta (Incremental) Load Scenario
The Delta Load scenario entails the ETL processing of delta data from the source systems, which normally occurs at a regular interval (daily, weekly or monthly). Delta data is defined as data records that have been added, updated or deleted in the source systems. The Delta Load ETL Processing scenario processes the delta data that results from changes made to the source systems since the previous delta data extraction.

Sequence Diagram
A sequence diagram illustrating the sequence of use case interactions and the system boundaries within which each use case resides.

Functionality
This section describes the functionality of the ETL processing software component.
Sample functionality is described below.

Receive Source Data Functionality
Support for Extracting Required Source Data Elements from Original Extract
Support for Ensuring that No Files are Overwritten Unintentionally
Support for Verifying Completeness of Received Extract Files

Persisting Source Data in the Pre-Landing Database Functionality
Support for persisting or storing the source data records in the Pre-Landing Database
Load Business Data Elements to Pre-Landing Database
Load Technology Data Elements to Pre-Landing Database
Maintain Original Source Data Content
Maintain Original Source Data Format
Support for Tracing Staging Data to its Original in Source Extract

Persisting Source Data in the Landing Database Functionality
Support for persisting or storing the source data records in the Landing Database
Load Business Data Elements to Landing Database
Load Technology Data Elements to Landing Database
Maintain Original Source Data Content
Maintain Original Source Data Format
Support for Tracing Staging Data to its Original in Source Extract

Persist Data in Staging Database Functionality
Transform Source Data to Staging Database’s Required Formats
Translation of Source Codes to Standardized Codes
Support for Surrogate Key Generation

Logical View
This section should provide a logical view of the system and the database components.

System Components
Typically the various landing and staging areas, source systems, etc. Examples:
Source Systems: The in-scope source systems.
’s defined formats.
’s required data elements from the source system.
Software Components
The software components typically include the major ETL processes. Examples are:
The Source-to-Pre-Landing ETL is a data movement process that reads source data files and loads the data into the Pre-Landing database. In principle, no transformation is applied to the raw data during this process. However, data from certain systems (e.g., mainframe) may contain sub-records or repeating values, and some source data types may not be supported by the RDBMS in the distributed environment. In those cases, the Source-to-Pre-Landing ETL may apply data transformation rules, but only to the extent needed to load the source data into the Pre-Landing database.
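The structural handling of repeating values can be illustrated with a minimal Python sketch. It assumes a source record that has already been parsed into a dictionary; the function name `flatten_repeating_group` and the field names `client_id`, `branch_no` and `phone` are hypothetical and are not taken from any Client Hub specification. Only the record structure changes; the values are copied as received.

```python
from typing import Dict, Iterator


def flatten_repeating_group(
    record: Dict[str, object], group_field: str
) -> Iterator[Dict[str, object]]:
    """Yield one flat row per occurrence of a repeating group.

    Scalar fields are copied unchanged, so the raw source values are
    preserved; a sequence number records the original position.
    """
    parent = {k: v for k, v in record.items() if k != group_field}
    # An empty or missing group still produces one row, with a null value.
    occurrences = record.get(group_field) or [None]
    for seq, value in enumerate(occurrences, start=1):
        row = dict(parent)
        row[f"{group_field}_seq"] = seq
        row[group_field] = value
        yield row


# Hypothetical mainframe-style record with a repeating "phone" field.
source_record = {
    "client_id": "000123",
    "branch_no": "0042",
    "phone": ["555-0100", "555-0199"],
}
rows = list(flatten_repeating_group(source_record, "phone"))
```

Each resulting row can be inserted into a relational table as-is, and the sequence column keeps the rows traceable to their position in the original extract.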
The Pre-Landing-to-Landing ETL is a data movement process that applies column filtering rules to remove unnecessary data fields from the Pre-Landing database and loads only the required data fields into the Landing database. In this ETL software component, an additional filtering rule based on a list of branch numbers may be applied to facilitate testing, allowing the test team to work with a smaller, more manageable volume of data.
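The two filtering rules can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual ETL job: rows are modeled as dictionaries, and the function name `filter_for_landing`, the column name `branch_no` and the sample column list are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional, Sequence, Set


def filter_for_landing(
    rows: Iterable[Dict[str, object]],
    required_columns: Sequence[str],
    test_branches: Optional[Set[str]] = None,
) -> Iterator[Dict[str, object]]:
    """Keep only the required columns, optionally limited to test branches.

    When test_branches is None, every row passes the branch rule
    (the assumed production behaviour).
    """
    for row in rows:
        if test_branches is not None and row.get("branch_no") not in test_branches:
            continue
        yield {col: row.get(col) for col in required_columns}


# Hypothetical Pre-Landing rows; "filler" stands for an unneeded field.
pre_landing_rows = [
    {"client_id": "000123", "branch_no": "0042", "name": "A. Smith", "filler": "x"},
    {"client_id": "000456", "branch_no": "0077", "name": "B. Jones", "filler": "y"},
]
landing_rows = list(
    filter_for_landing(
        pre_landing_rows,
        required_columns=["client_id", "branch_no", "name"],
        test_branches={"0042"},
    )
)
```

Keeping the branch list as an optional parameter means the same code path serves both test and full-volume runs.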
The Landing-to-Staging ETL is a data movement process that performs extensive data transformation to populate the database in the Staging Area. It reads source data that resides in the Landing Area’s database. The following classifications of data transformations occur in the Landing-to-Staging ETL:
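As a hedged illustration of two transformations that the Functionality section lists for the Staging database, translation of source codes to standardized codes and surrogate key generation, the sketch below uses an in-memory lookup table and counter. The names `STATUS_CODES`, `translate_code` and `SurrogateKeyGenerator`, the source system labels and the code values are all hypothetical; a real implementation would typically keep both mappings in database tables so that keys stay stable across runs.

```python
from itertools import count
from typing import Dict, Tuple

# Hypothetical mapping: (source system, source code) -> standardized code.
STATUS_CODES: Dict[Tuple[str, str], str] = {
    ("SYS_A", "A"): "ACTIVE",
    ("SYS_A", "C"): "CLOSED",
    ("SYS_B", "01"): "ACTIVE",
    ("SYS_B", "09"): "CLOSED",
}


def translate_code(
    code_map: Dict[Tuple[str, str], str],
    source_system: str,
    source_code: str,
    default: str = "UNKNOWN",
) -> str:
    """Translate a source-specific code to its standardized value."""
    return code_map.get((source_system, source_code), default)


class SurrogateKeyGenerator:
    """Assign one stable integer key per (source system, natural key)."""

    def __init__(self, start: int = 1) -> None:
        self._next = count(start)
        self._assigned: Dict[Tuple[str, str], int] = {}

    def key_for(self, source_system: str, natural_key: str) -> int:
        pair = (source_system, natural_key)
        if pair not in self._assigned:
            self._assigned[pair] = next(self._next)
        return self._assigned[pair]


keys = SurrogateKeyGenerator()
staged = {
    "client_sk": keys.key_for("SYS_B", "000123"),
    "status": translate_code(STATUS_CODES, "SYS_B", "01"),
}
```

Keying the generator on the source system as well as the natural key avoids collisions when two source systems reuse the same identifier.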
Process View
The process view of the ETL processing architecture reveals the ETL processes that support the initial as well as the delta ETL processing requirements.

Example: Source-to-Landing ETL
The logical view of the Source-to-Landing ETL assumes that the source data files from the source systems contain data content and formats that meet the requirements of the Client Hub. The process view of Flow-1, the Source-to-Landing ETL, however, reveals that these requirements may not be met by certain source systems. For example, the files may have different file formats or contain many more data fields than the Client Hub needs. In such cases, the Source-to-Landing ETL can be broken down into two ETL processes:
Major Processing Steps:
Deployment View
The deployment view of the ETL processing architecture details how the ETL architecture components reside in the source systems, the ETL environment and the Client Hub environment.

Implementation View
Overview (Example)
Major Processing Steps and Design Specifications:
FTP Source Data Files to ETL Server
The following exception-handling rules are defined for this program.
Job Sequence and Operational View
Job Sequence Diagram of Source-to-Pre-Landing ETL Process (Illustrative Example)

Data View
This section details each data architecture component relevant to the ETL processing. Examples: source file names, details of the relevant data models, etc.
Size and Performance

Appendix A – References (sample)
Standards, Guidelines and Best Practices
MDM Software Requirements Specification (SRS)
Identification Management SRS
Client Hub Services
Client Hub Infrastructure
Data Model
Source Extract Layouts
Data Mappings

Deployment Information of Software Components
This section is for informational purposes only.
ETL Environment
Services Environment
Database Environment
Key Trigger File Locations
Exception Classification and Exception Codes (example)

