Data Integration with Pentaho Kettle review
Updated December 05, 2014

Itamar Steinberg | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User


Modules Used

  • Kettle Data Integration
  • PDI

Overall Satisfaction with Pentaho

I used Pentaho Kettle as a development team manager and later as a CIO. After that, I opened a company that consults on and implements business intelligence solutions. We use Pentaho mostly through the data integration module. Of all the Pentaho modules, I think Kettle is the most complete; it can put up a "fair fight" against closed-source solutions. The problem it addresses, of course, is extracting data from various sources, transforming it ("playing with the data"), and then loading it into the target. I find the transformation capabilities the most valuable and rich in functionality. I even made a full-scale course about it, which you can find on Udemy.
  • Pentaho Kettle gives you a great graphical user interface for designing your transformations and jobs.
  • Pentaho Kettle makes it easy to handle errors, logging, and performance.
  • Pentaho Kettle has dozens of great steps, such as lookup and SCD (slowly changing dimension) functionality.
  • Several steps have performance issues, such as the JSON Input step.
  • The community edition does not include a scheduler or job manager, so you need to figure that out yourself, unless of course you buy the Enterprise edition.
  • I think the web service steps should be easier to operate.
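Since the community edition ships no scheduler, a common workaround (my sketch, not something the review prescribes) is to drive Kettle's command-line job runner, kitchen.sh, from cron. All paths and the job file name below are hypothetical placeholders:

```shell
# Hypothetical crontab entry: run a Kettle job every night at 02:00.
# Paths are placeholders; adjust to your PDI install and .kjb job file.
0 2 * * * /opt/pentaho/data-integration/kitchen.sh \
    -file=/etc/pdi/jobs/nightly_load.kjb \
    -level=Basic >> /var/log/pdi/nightly_load.log 2>&1
```

Kitchen's exit code is non-zero on failure, so the same approach works under any job manager that checks exit statuses.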
We have experience with Informatica and Talend. Between Talend and Pentaho I think it's a close fight, although I personally prefer Pentaho Kettle (larger community, more resources).
You could say Informatica is better than both of them, but it is far more expensive and the differences are small.
Put it this way: I haven't found anything I can't do with Pentaho; sometimes it just takes a bit more creativity or code (Java/JavaScript).
Basically, the cost is very affordable. If you have already built everything and the project is working, you don't want to replace it, because that means starting a whole new project. I also think the tool is very good and competitors will not surpass its functionality, so you would find yourself spending a lot of time and money just to get back to the same place.
I find it suited to 90% of data integration projects; it's a very good tool: easy to use, stable, and affordable.
I think the big data connections are still not perfect, so if you have a NoSQL DB, Hadoop, or Cassandra, you might consider extracting the data to a file at the source using MapReduce. Also, if you need bulk loading, it's sometimes better to use the target database's own loader directly, for example with Redshift or InfiniDB (which is no longer with us).
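For Redshift in particular, "using the loader directly" usually means staging a file in S3 and issuing its native COPY command instead of row-by-row inserts from the ETL tool. A minimal sketch, where the cluster endpoint, table, bucket, and IAM role are all hypothetical placeholders:

```shell
# Hypothetical example: bulk-load a staged CSV from S3 into Redshift
# using the native COPY command via psql. All identifiers are placeholders.
psql "host=my-cluster.example.redshift.amazonaws.com port=5439 dbname=dw user=loader" <<'SQL'
COPY sales_staging
FROM 's3://my-bucket/exports/sales.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
CSV;
SQL
```

A Kettle job can still orchestrate this: transform and write the file, push it to S3, then call the load script as a shell step.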
Apart from that, I think it will suit you well.