"Data Integration with Pentaho Kettle review"
Itamar Steinberg
Updated December 05, 2014


Score 9 out of 10
Vetted Review
Verified User

Software Version
Modules Used
Kettle data integration (PDI)

Overall Satisfaction with Pentaho

I used Pentaho Kettle as a development team manager and later as a CIO. After that I opened a company that consults on and implements business intelligence solutions. We use Pentaho mostly through the data integration module. I think that of all the modules of Pentaho, Kettle is the most complete; it can put up a "fair fight" against closed-source solutions. The problem it addresses, of course, is to extract data from various sources, transform it ("play with the data"), and then load it into the target. I find the transformation functionality the most valuable and feature-rich. I even made a full-scale course about it, which you can find on Udemy.
  • Pentaho Kettle gives you a great graphic user interface to plan your transformation and jobs.
  • Pentaho Kettle makes it easy to handle errors, logging and performance.
  • Pentaho Kettle has dozens of great steps, such as lookup and SCD (slowly changing dimension) functionality.
  • Several steps have performance issues, such as the JSON Input step.
  • The community edition does not include a scheduler or job manager, so you need to work that out yourself, unless of course you buy the Enterprise edition.
  • I think the web service steps should be easier to operate.
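Since the community edition ships without a scheduler, the usual workaround is to invoke the Kitchen command-line runner from cron. A minimal sketch; the installation path, job file, and log location below are assumptions, not real paths:

```shell
# Crontab entry: run a Kettle job every night at 02:00.
# The community edition has no built-in scheduler, so cron fills the gap.
# /opt/pentaho/data-integration, nightly_load.kjb, and the log path are hypothetical.
0 2 * * * /opt/pentaho/data-integration/kitchen.sh -file=/opt/jobs/nightly_load.kjb -level=Basic >> /var/log/kettle/nightly_load.log 2>&1
```

Kitchen runs jobs (.kjb files); its sibling, Pan, does the same for transformations (.ktr files).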
We have experience with Informatica and Talend. I think that between Talend and Pentaho it's a close fight, although I personally prefer Pentaho Kettle (larger community, more resources).
I think you can say Informatica is better than both of them, but it is far more expensive and the differences are small.
Let's just say I haven't found anything I can't do with Pentaho; at most it took a little more creativity or some code (Java / JavaScript).
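To give a flavor of what that extra code looks like: when no built-in step fits, you can drop a few lines of JavaScript into Kettle's Modified Java Script Value step. A minimal sketch; the field names (customer_name, country_code) are invented for illustration, and in the real step each input field is simply available as a variable of the same name:

```javascript
// Sketch of the kind of logic you might put in Kettle's
// "Modified Java Script Value" step when no built-in step fits.
// Field names here are hypothetical, not from any real transformation.
function transformRow(row) {
  var name = (row.customer_name || "").trim();
  // Derive a normalized name and a region flag as new output fields.
  return {
    customer_name: name.toUpperCase(),
    is_domestic: row.country_code === "US"
  };
}

var out = transformRow({ customer_name: "  ada lovelace ", country_code: "GB" });
console.log(out.customer_name); // "ADA LOVELACE"
console.log(out.is_domestic);   // false
```

Inside the step you would mark customer_name and is_domestic as output fields instead of returning an object, but the logic is the same.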
Basically, the cost is very affordable. If you have already built everything and the project is working, you don't want to replace it, because that means starting a whole new project. I also think it's very good and a competitor will not surpass its functionality, so you would find yourself spending a lot of time and money just to get to the same place.
This is not really a relevant question; it runs in the background. But I can say that when we add a new module or department, we need to bring in data from more tables and sources.
Hi, I uploaded a video from my course to YouTube about data sources with Pentaho Kettle.

I will summarize it here. You can connect to relational databases such as MSSQL, MySQL, and Oracle. It also connects to flat files such as TXT, CSV, and XLS (even though I don't recommend the latter) and to more complex file formats: XML and JSON. It can connect to APIs and web services, and of course to big data stores such as Hive, Cassandra, and MongoDB, including bulk loads. I think Pentaho Kettle is versatile and covers 95% of the sources you will ever need.
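To make the JSON side concrete: the JSON Input step takes a path expression and emits one row per matching element. A toy equivalent in plain JavaScript (the payload and path are made up, and this simplified resolver only handles the one "$.field[*].field" shape used here):

```javascript
// Rough equivalent of what Kettle's JSON Input step does: given a document
// and a path like "$.orders[*].total", emit one value per matching element.
// This toy resolver handles only the "$.array[*].leaf" shape shown below.
function extractRows(doc, path) {
  var parts = path.replace(/^\$\./, "").split(".");
  var arrayField = parts[0].replace("[*]", "");
  var leaf = parts[1];
  return doc[arrayField].map(function (item) { return item[leaf]; });
}

var payload = { orders: [{ id: 1, total: 9.5 }, { id: 2, total: 20 }] };
console.log(extractRows(payload, "$.orders[*].total")); // → [9.5, 20]
```

The real step understands full JSONPath expressions and maps each one to an output field; the performance issues mentioned above show up when the input documents get large.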

Again, this is a rating of Pentaho Kettle (not the Pentaho BI suite), so for creating transformations and jobs out of steps, understanding the flow, writing visual notes, and renaming steps for better control, it's all there. There is also a very nice GUI for previewing data while it streams, "seeing" errors as they happen, and monitoring performance with visual data and graphs.
For Pentaho Kettle (data integration) this is not relevant. You can find more materials about it at: Pentaho Kettle Materials.
I find it suited for 90% of data integration projects. It's a very good tool: easy to use, stable, and affordable.
I think the big data connections are still not perfect, so if you have a NoSQL DB / Hadoop / Cassandra, you might consider extracting the data from the source to files using MapReduce. Also, if you need a bulk load, it's sometimes better to run it directly with the target's own tool, for example Redshift or InfiniDB (which is no longer with us).
Apart from that, I think it will suit you well.