Building a data warehouse from the ground up with Matillion
Updated July 02, 2021


Anonymous | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User

Overall Satisfaction with Matillion

Matillion is being used by the BI Team to orchestrate the population of the Data Warehouse. The Warehouse, in turn, is used to produce a range of reports across all business functions in the organisation. We use Matillion to ingest data from an application database (hosted in AWS), Salesforce, Netsuite and Google Analytics.
  • Matillion's UI makes it easier to understand the flow of data in your data pipeline.
  • Custom Python scripts make it easier to manage and manipulate variables and also to create custom functions (e.g. we use one to post messages to Slack when jobs have failed/succeeded).
  • Handling failures in processes is straightforward.
  • Passing variables between jobs (orchestration or transformation) feels a bit clunky. It can also be frustrating that you can't pass a variable back up to the calling orchestration job, you can only pass it down to child jobs.
  • It would be great to have some kind of debug mode, through which you're able to 'step through' the various tasks in an orchestration/transformation job.
  • Matillion's generic API functionality is difficult to understand; things like handling pagination and rate limiting are complex, although I understand improvements have been made in recent versions.
  • Matillion has been essential in the successful implementation of a data warehouse in our organisation.
  • It is unlocking client opportunities for us to provide a "warehouse as a service" as part of our company's offering.
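The Slack notification script mentioned above might look something like the following minimal sketch. This is an illustration, not our actual script: `SLACK_WEBHOOK_URL` is a hypothetical environment variable, and the message format is invented.

```python
import json
import os
import urllib.request

def build_payload(job_name, status):
    """Build the Slack message body for a job status notification."""
    return {"text": f"Matillion job '{job_name}' finished with status: {status}"}

def notify_slack(job_name, status):
    """Post a job status message to a Slack incoming webhook.

    SLACK_WEBHOOK_URL is a hypothetical environment variable; in a
    Matillion Python component it could come from a job variable instead.
    """
    req = urllib.request.Request(
        os.environ["SLACK_WEBHOOK_URL"],
        data=json.dumps(build_payload(job_name, status)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Build (but don't send) an example failure message
payload = build_payload("nightly_load", "FAILED")
```

In practice you'd wire `notify_slack` into the failure/success branches of an orchestration job, so the team is alerted without anyone watching the console.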
I find Matillion very intuitive and quickly got to grips with it. I have found that developers with a more purist coding background have struggled more to get to grips with it, particularly in relation to managing variables. For me personally, this has not been an issue, and the online documentation has been excellent for digging deeper into specific functional areas.
Within a month of getting Matillion we had an established pipeline that was extracting data from the company's application database and updating the corresponding data in the warehouse. Additional data sources (e.g. Salesforce and Netsuite) were quick to implement (c. 2 weeks per data source), including any relevant testing.
We've not had a need to scale past the smallest version of Matillion and so I can't comment on this.
Matillion provided much more flexibility than the other products we tested, at a much lower price point. Other products, in my view, had a cleaner/simpler UI but I also felt that they offered much less functionality. A key design pattern we had to deliver was to perform delta loads on the warehouse (i.e. updating only those records updated in the database since the last execution). Some other tools seemed to assume that you were accessing data from transactional databases where records were not updated, only ever created. This meant that we could not use them to deliver our Warehouse.

Other tools that did have equivalent flexibility/functionality were a lot more expensive than Matillion.
Matillion is great for handling bulk and 'delta' loads to your warehouse, i.e. updating the warehouse with only those records updated since the last execution of the pipeline. The components and Python scripting provide a huge amount of flexibility in what the pipeline does, but you do need to have the expertise to know how to implement it properly.
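The delta-load pattern described above boils down to keeping a high-water mark from the previous run and loading only rows modified after it. A minimal sketch, with invented row data and column names (`updated_at` is an assumption, not our actual schema):

```python
from datetime import datetime

def delta_rows(source_rows, last_run):
    """Keep only rows modified since the previous pipeline execution.

    In a real pipeline this filter would be pushed down to the source
    query (e.g. WHERE updated_at > :last_run) rather than done in Python.
    """
    return [row for row in source_rows if row["updated_at"] > last_run]

rows = [
    {"id": 1, "updated_at": datetime(2021, 6, 1)},
    {"id": 2, "updated_at": datetime(2021, 6, 20)},
]
# Only rows changed since the last run (10 June) are picked up
changed = delta_rows(rows, last_run=datetime(2021, 6, 10))
```

The changed rows are then merged into the warehouse (insert new keys, update existing ones) and the high-water mark is advanced to the current run's timestamp.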

Matillion has been less good at extracting data from APIs. The functionality was found to be complex and it was unclear how to manage things like pagination and rate limiting in API calls.
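For context, the kind of pagination and rate-limiting logic that had to be handled looks roughly like this. The `fetch_page` callable and its `(items, has_more)` return shape are hypothetical stand-ins for a real API client:

```python
import time

def fetch_all(fetch_page, delay_seconds=0.0):
    """Iterate through every item of a paginated API.

    fetch_page(page) is a hypothetical callable returning (items, has_more).
    delay_seconds spaces out successive calls as a crude rate limit.
    """
    page = 1
    while True:
        items, has_more = fetch_page(page)
        yield from items
        if not has_more:
            break
        page += 1
        if delay_seconds:
            time.sleep(delay_seconds)

# Demo against a fake three-page API
pages = {1: ([1, 2], True), 2: ([3], True), 3: ([4], False)}
results = list(fetch_all(lambda p: pages[p]))
```

Expressing this kind of loop-with-state through generic API components is what we found awkward; writing it directly in Python was clearer for us.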

Evaluating Matillion's Business Outcomes

Yes, we did. It was possible to update the data warehouse every 3-4 hours instead of every 24 hours. In addition, and perhaps more importantly, it provided a more stable ETL process: the previous one was prone to breaking/failing, whereas Matillion gave us a more reliable process that was also easier to fix when issues arose.
In terms of infrastructure costs, our costs have gone up using Matillion. Historically we used AWS DMS to clone the production database into a Postgres database. That process, however, was prone to issues and failures; for example, when the production database was updated with new columns or tables it could cause DMS to fail. This would often mean completely refreshing the Postgres database, which could take 1-2 working days, during which time reporting was unavailable.

The overall savings are difficult to quantify, but the reduced manual effort required to support the Matillion ETL processes far outweighs the additional infrastructure costs.
We were able to connect to our main data sources using Matillion (a MySQL database, Salesforce and Netsuite). Other data sources that are API-based and didn't have specific components in Matillion were more of a challenge at the time. Although API functionality has been improved since we implemented our solution, we made a decision to handle bespoke API development in Python, outside of Matillion.
Onboarding was quick and relatively pain-free. The team did have experience using Matillion prior to implementation, which obviously helped. However, once the server was set up with the appropriate security groups to access data sources, building the Matillion jobs was straightforward, and the UI makes it easier to understand the flow of data vs. a purely code-based solution. We also noted great performance transferring data from external databases to Snowflake 'out of the box'. Approaching the same level of performance in Python code took a great deal more effort and optimisation.