Building a data warehouse from the ground up with Matillion
Updated July 02, 2021
Building a data warehouse from the ground up with Matillion
Score 9 out of 10
Vetted Review
Verified User
Overall Satisfaction with Matillion
Matillion is being used by the BI Team to orchestrate the population of the Data Warehouse. The Warehouse, in turn, is used to produce a range of reports across all business functions in the organisation. We use Matillion to ingest data from an application database (hosted in AWS), Salesforce, Netsuite and Google Analytics
- Matillion's UI makes it easier to understand the flow of data in your data pipeline.
- Custom Python scripts make it easier to manage and manipulate variables and also to create custom functions (e.g. we use one to post messages to Slack when jobs have failed/succeeded).
- Handling failures in processes is straightforward.
- Passing variables between jobs (orchestration or transformation) feels a bit clunky. It can also be frustrating that you can't pass a variable back up to the calling orchestration job, you can only pass it down to child jobs.
- It would be great to have some kind of debug mode, through which you're able to 'step through' the various tasks in an orchestration/transformation job.
- Matillion's generic API functionality is difficult to understand. Things like handling pagination and rate limiting are complex. Although I understand improvements have been made in recent versions.
- Matillion has been essential in the successful implementation of a data warehouse in our organisation.
- It is unlocking client opportunities for us to provide a "warehouse as a service" as part of our company's offering.
Within a month of getting Matillion we had an established pipeline that was extracting data from the company's application database and updating the corresponding data in the warehouse. Additional data source (e.g. Salesforce and Netsuite) were quick to implement (c. 2 weeks per data source) including any relevant testing.
Matillion provided much more flexibility than the other products we tested, at a much lower price point. Other products, in my view, had a cleaner/simpler UI but I also felt that they offered much less functionality. A key design pattern we had to deliver was to perform delta loads on the warehouse (i.e. updating only those records updated in the database since the last execution). Some other tools seemed to assume that you were accessing data from transactional databases where records were not updated, only ever created. This meant that we could not use them to deliver our Warehouse.
Other tools that did have equivalent flexibility/functionality were a lot more expensive than Matillion.
Other tools that did have equivalent flexibility/functionality were a lot more expensive than Matillion.
Matillion Feature Ratings
Evaluating Matillion's Business Outcomes
Yes we did. It was possible to update the data warehouse database every 3-4 hours instead of every 24 hours. In addition, and perhaps more importantly, it provided a more stable ETL process. The previous one was prone to breaking/failing and Matillion provided a more reliable process and one that was easier to fix if there were any issues.
In terms of infrastructure costs, our costs have gone up using Matillion. Historically we used AWS DMS to clone the production database into a Postgres database. This old process, however, was prone to issues and failures, e.g. when the production database was updated with new column(s)/table(s) it could cause DMS to fail. This would often lead to needing to completely refresh the postgres db which could take 1-2 working days, during which time reporting would be unavailable.
The overall savings are difficult to quantify but the reduced man-effort required to support the Matillion ETL processes far outweighs the additional infrastructure costs
The overall savings are difficult to quantify but the reduced man-effort required to support the Matillion ETL processes far outweighs the additional infrastructure costs
We were able to connect to our main data sources using Matillion (a MySQL database, Salesforce and Netsuite). Other data sources, that are API based and didn't have specific components in Matiilion were more of a challenge at the time. Although API functionality has been improved since we implemented our solution, we made a decision to handle bespoke API development in Python, outside of Matillion.
Onboarding was quick and relatively pain free. The team did have experience using Matillion prior to implementation which obviously helped. However, once the server was setup with the appropriate security groups to access data sources, building the Matillion jobs was straight forward and the UI makes it easier to understand the flow of data vs. that of a purely code based solution. We also noted great performance of transferring data from external databases to Snowflake 'out of the box'. Approaching the same level of performance in Python code took a great deal more effort and optimisation.