Item: Matillion
Rating: 8
Author: Kris Shinn

Overall Satisfaction with Matillion

Use Cases and Deployment Scope

Matillion is used in our Data Analytics and Data Engineering department. At the time we purchased it, we had no dedicated data engineers. We needed something powerful enough to handle more complex jobs, but simple enough for a semi-technical analyst to use. We were using many different tools at the time and were evaluating a range of ETL solutions in the space.
Matillion provided a solution that was simple and easy to connect, and it came with connectors to some of our most crucial cloud-based systems, such as Salesforce and Jira. It has allowed us to consolidate many of our processes onto a single system that is easily understood.

Pros and Cons

Pros

Cloud connectivity: It makes it simple to pull data from cloud services like Salesforce and bring it into a data warehouse.
ETL Orchestration: The drag and drop interface makes it easy to compose new orchestration layers in our ETL. It's something that does not require a Data Engineer to complete.
Enterprise integration: It was really easy to configure into our LDAP system, and that makes administering the box really easy.
Variety of Data sources: It is pretty easy to bring data into Matillion to process into the data warehouse.
The GUI also provides non-functional visual elements for marking up a job. This is great for team members to communicate complicated parts of the ETL or to otherwise label parts of their ETL.

Cons

Matillion has no clustering ability. For particularly large jobs or large data sources, processing can take a long time, and it cannot distribute the work map-reduce style the way Spark does.
The output is limited to Redshift. Oftentimes we would want to drop a Parquet or Avro file into S3 as the output of our ETL.
We often get OOM errors and other server related constraints. We need to be very careful about how our jobs are scheduled in order to make everything work well.
It is not clear from the documentation how to organize work in Matillion. Between environments, projects, and jobs in a project, we've had to organize around Matillion's limitations rather than in a way that makes sense for us.

Return on Investment

Matillion has had a positive impact on our ability to scale up our ETL process without increasing our headcount. We've been able to double the number of jobs we have in the last year.
It has also had a positive impact on the amount of time we need to investigate failures. Coming from other tools that had high rates of failure (say 2 jobs failing a week), we have not seen as many technical failures from Matillion. This reduces the resources we spend on tracking down failures, fixing, and re-running jobs by 25%.

Usability

It is pretty intuitive to put together a job. Once you get into larger organization features like creating environments, projects, and mapping the two together, it gets pretty hairy. The other place where it's not really intuitive is to get an overall picture of health in your ETL jobs, find where failures are happening, and really trace those down. For failures, I often drop down to interface with the database directly to get useful information. Matillion has these capabilities, but I find the processes often hang when trying to access them.

Time to Value

We were able to get up and running with Matillion within a week. When we spun it up, we started putting together our first jobs, and they were super simple to set up. Though the jobs were on the simpler side, it was pretty easy to get started.

Scalability

It's only vertically scalable from the resource point of view. Even though we are on a larger instance, we often get resource limitation failures when trying to process large files. There is no way to Map Reduce this in a cluster, which I think is a large limitation in Data Engineering.
On the complexity side, I think the otherwise simple and intuitive interface becomes really confusing for very complex jobs. For jobs where we need to aggregate multiple data sources into a unified data layer, the layout of the job gets very complex, and I don't think it provides the type of value we are looking for in this respect.

Alternatives Considered

Alooma and Fivetran

There are a number of systems we also looked at that were not available to select here: AWS Data Pipeline, Airflow, Xplenty, etc. We chose Matillion for its balance of features (the ability to connect to cloud data sources like Jira), a simple interface for putting together data pipelines (requiring little to no coding), and a reasonable price point. Matillion is pretty unique in that it hits a balance of all three of those requirements.

Other Software Used

Amazon Redshift, AWS Lambda, Tableau Server

Likelihood to Recommend

Matillion is great at running ETL for cloud-based systems (Jira, Salesforce, Google Analytics, etc.). It reduces (or in some cases eliminates) the need to put together a custom software interface into these systems. It is also great for non-technical users who want to put together some ETL processes for analytics, but do not want to invest in a Data Engineering team. It's also great for landing data for consumption in end datastores like Snowflake or Redshift.
Matillion is not great for large datasets or prepping for data science. As a single vertically scaled solution, it does not have the power of a cluster-oriented ETL technology like Spark. Additionally, to prepare datasets for data science, where you would want to bring in a processed dataset in Parquet or Avro format, it requires you to land the data in Redshift, dump it back out, and then format it in order to get it into a portable format for something like RStudio.
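
For anyone facing the same "land it in Redshift, then dump it back out" step, here is a rough sketch of what the dump can look like, assuming a Redshift version whose UNLOAD command supports FORMAT AS PARQUET (Avro is not covered by this). It is not something Matillion generates for us. The helper name, table, bucket, and IAM role below are all hypothetical placeholders, and the function only builds the SQL text; you would still run it through whatever Redshift client you already use.

```python
def build_unload_sql(select_sql: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift UNLOAD statement that writes query results to S3 as Parquet.

    Hypothetical helper for illustration: it only returns SQL text and does
    not connect to Redshift.
    """
    # UNLOAD takes the query as a single-quoted string literal,
    # so any single quotes inside the query must be doubled.
    escaped_query = select_sql.replace("'", "''")
    return (
        f"UNLOAD ('{escaped_query}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


if __name__ == "__main__":
    # All names below are made-up placeholders.
    print(
        build_unload_sql(
            "SELECT * FROM analytics.jira_issues WHERE status = 'Done'",
            "s3://example-bucket/exports/jira_issues_",
            "arn:aws:iam::123456789012:role/ExampleUnloadRole",
        )
    )
```

This still means the data makes a round trip through the warehouse, which is the limitation described above; it just removes the separate reformatting step for Parquet.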


2 Years with Matillion