We used the Pentaho Data Integration (PDI) tool in the department where I worked to pre-process data and load it into the production database. As a software developer, I decided to use PDI instead of writing scripts in Python. The main advantage of PDI was that it supported scaling out of the box, including the ability to run on multiple machines. It fits well when you need something smaller than a Hadoop cluster but bigger than quick homemade scripts.