Pentaho Kettle Solutions
Building Open Source ETL Solutions with Pentaho Data Integration

1st Edition, September 2010
720 Pages, Softcover
Practical Approach Book
A complete guide to Pentaho Kettle, the Pentaho Data Integration toolset for ETL
This practical book is a complete guide to installing, configuring, and managing Pentaho Kettle. If you're a database administrator or developer, you'll first get up to speed on Kettle basics and how to apply Kettle to create ETL solutions, then progress to specialized concepts such as clustering, extensibility, and data vault models. Learn how to design and build every phase of an ETL solution.
* Shows developers and database administrators how to use the open-source Pentaho Kettle for enterprise-level ETL (extract, transform, and load) processes
* Assumes no prior knowledge of Kettle or ETL, and brings beginners thoroughly up to speed at their own pace
* Explains how to get Kettle solutions up and running, then follows the 34 ETL subsystems model, as created by the Kimball Group, to explore the entire ETL lifecycle, including all aspects of data warehousing with Kettle
* Goes beyond routine tasks to explore how to extend Kettle and scale Kettle solutions using a distributed "cloud"
Get the most out of Pentaho Kettle and your data warehousing with this detailed guide, which covers everything from simple single-table data migration to complex multisystem clustered data integration tasks.
Part I Getting Started.
Chapter 1 ETL Primer.
Chapter 2 Kettle Concepts.
Chapter 3 Installation and Configuration.
Chapter 4 An Example ETL Solution--Sakila.
Part II ETL.
Chapter 5 ETL Subsystems.
Chapter 6 Data Extraction.
Chapter 7 Cleansing and Conforming.
Chapter 8 Handling Dimension Tables.
Chapter 9 Loading Fact Tables.
Chapter 10 Working with OLAP Data.
Part III Management and Deployment.
Chapter 11 ETL Development Lifecycle.
Chapter 12 Scheduling and Monitoring.
Chapter 13 Versioning and Migration.
Chapter 14 Lineage and Auditing.
Part IV Performance and Scalability.
Chapter 15 Performance Tuning.
Chapter 16 Parallelization, Clustering, and Partitioning.
Chapter 17 Dynamic Clustering in the Cloud.
Chapter 18 Real-Time Data Integration.
Part V Advanced Topics.
Chapter 19 Data Vault Management.
Chapter 20 Handling Complex Data Formats.
Chapter 21 Web Services.
Chapter 22 Kettle Integration.
Chapter 23 Extending Kettle.
Appendix A The Kettle Ecosystem.
Appendix B Kettle Enterprise Edition Features.
Appendix C Built-in Variables and Properties Reference.
Index.