Kettle, now known as Pentaho Data Integration, is one of the better-known open source ETL (Extract, Transform and Load) tools on the market. It is provided by Pentaho, a commercial open source BI vendor.
Relationship to MIKE2.0
- Used to deliver the ETL sub-component
Overall Implementation Guide
ETL Design and Implementation Activities of MIKE2.0, which are:
Alignment with Strategic Requirements for Infrastructure Development
- Platform: runs on Windows, Unix and Linux.
- GUI: Graphical interface with visual transform indicators. Reporting available from the metadata layer.
- License: Mozilla Public License.
- Source: Source code available at Get the code.
- Support: A Pentaho forum, an issue tracker and a Pentaho Community site with deep-dive technical articles that are better than those on some premium ETL vendor sites.
- Connectivity: Supports Oracle, DB2, SQL Server and Sybase. Supports open source MySQL, PostgreSQL, Hypersonic, Firebird SQL and Ingres. Supports connectivity to SAP R/3 for a license fee.
- Scalability: supports a Parallel Processing Architecture by distributing ETL tasks across multiple servers.
- As one of the oldest open source ETL tools, it has a large user community and renewed momentum from Pentaho's support.
- Out-of-the-box integration with other Pentaho open source products such as BI, EII and EAI.
- The GUI Designer interface, the out-of-the-box transformer objects and the support for slowly changing dimensions should enable increased developer productivity.
- Community articles show an enthusiastic sharing of tips and tricks.
- Mozilla Public License allows the embedding of Kettle into another product without license fees.
- Does not have a specialised data quality component or a partnership with a data quality vendor.
- Potential performance overheads on high-volume data joins/lookups where the lookup database is accessed over a network. The streaming lookup works best for small lookup volumes, since the lookup data is held in memory. For very large lookup sources, the data should be stored in a database locally on the ETL server.
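The lookup trade-off noted in the last point can be illustrated with a minimal Python sketch. It is not Kettle code and does not use any Kettle API. The function names (`build_lookup_cache`, `stream_lookup`) and the sample rows are hypothetical, chosen only to show the general idea of a streaming lookup: the lookup source is read once into memory, and each main-stream row is then matched without a network round trip.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def build_lookup_cache(lookup_rows: Iterable[Row], key: str) -> Dict[object, Row]:
    """Read the whole lookup source once and index it by key in memory.

    Memory use grows with the size of the lookup source, which is why
    this approach suits small lookup volumes.
    """
    return {row[key]: row for row in lookup_rows}


def stream_lookup(
    main_rows: Iterable[Row],
    cache: Dict[object, Row],
    key: str,
    fields: List[str],
) -> Iterator[Row]:
    """Enrich each main-stream row from the in-memory cache.

    Rows with no match receive None for every looked-up field.
    """
    for row in main_rows:
        match = cache.get(row[key])
        enriched = dict(row)  # copy so the input row is left unchanged
        for field in fields:
            enriched[field] = match[field] if match is not None else None
        yield enriched


# Hypothetical sample data: a small lookup table and a main stream of orders.
customers = [
    {"customer_id": 1, "region": "EMEA"},
    {"customer_id": 2, "region": "APAC"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},  # no matching customer
]

cache = build_lookup_cache(customers, "customer_id")
result = list(stream_lookup(orders, cache, "customer_id", ["region"]))
```

When the lookup source is too large to hold in memory, the cache has to give way to one query per row (or per batch) against a database. In that case the cost is dominated by round-trip latency, which is the reason for keeping that database local to the ETL server.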
Functionality that users of the MIKE2.0 Methodology would like to see added to this product is as follows:
User Valuation Enhancements
Voting scores from MIKE2.0 Contributors on the value of the asset in the context of the overall methodology
 available from sourceforge.net
Open Source Licensing
Comparable Open Source Products
Reference Implementations through MIKE2.0