A good tool for your ETL needs, but keep an eye on the bill and explore valid alternatives.
May 22, 2024
Score 7 out of 10
Vetted Review
Verified User
Overall Satisfaction with Matillion
We use Matillion mainly as an orchestration tool to stage data from multiple sources and apply some light transformations. We also use it to export data to S3 via Snowflake.
The GUI is intuitive, and the web interface helps you get up and running very quickly.
We have some issues with the resources certain jobs need: there is no visibility into the system resources in use, and no automatic balancing of activity to keep the server from crashing.
We don't like that billing is based on the server's CPUs, because we host the server ourselves and are already paying for it.
We would prefer a billing mechanism decoupled from server resources, for many reasons. One is that the server crashes on some jobs due to memory issues, and upgrading it would instantly double the bill for the same jobs. It doesn't scale naturally.
A few years ago, when we started using it, it was a great player in the cloud ELT world. Today it suffers because, while the interface is web-based, the engine itself is still monolithic and static, and hard to migrate to a new machine.
We will be looking for other tools that decouple the creation of data workflows from their actual scheduling and execution, so that you can use a central hub to plan and create the logic and then decide in which region/server to run it, without having to install a full server in every region.
Docker/Kubernetes comes to mind, but implemented and managed effortlessly behind the scenes. We don't want to deal with it; we just want to use the tool.
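On the missing resource visibility mentioned above, here is a minimal sketch of the kind of check that could run in a Python script step before a memory-heavy job. It is not a Matillion feature. It assumes the server is a Linux host where /proc/meminfo is readable, and the function names and the 4096 MB threshold are hypothetical, chosen only for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Key: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def has_enough_memory(meminfo, required_mb):
    """Return True if MemAvailable (in kB) covers required_mb megabytes."""
    available_kb = meminfo.get("MemAvailable", 0)
    return available_kb >= required_mb * 1024


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the kernel's memory report (Linux only, assumed path)."""
    with open(path) as fh:
        return parse_meminfo(fh.read())


if __name__ == "__main__":
    # Hypothetical threshold: refuse to start a heavy job below 4096 MB free.
    info = read_meminfo()
    if not has_enough_memory(info, required_mb=4096):
        raise SystemExit("Not enough available memory, skipping the heavy job")
```

A check like this only fails a job early instead of letting the server run out of memory; it does not replace a proper admin panel or any automatic balancing.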
- Web interface is good enough
- Set of built-in components available for orchestration/transformation
- Integration with target database (Snowflake for us)
- Static and monolithic; it shows its limits when running multiple concurrent jobs.
- GitHub and versioning implementation is messy and broken. Don't use it.
- There's no way to see/query the system resources; you just wait for a server to crash from running out of memory. An admin panel would be appreciated, plus some environment variables with up-to-date info.
- API implementation is cumbersome and limited.
- There's no concept of hub and worker engine; everything happens on the same server (designing workflows and executing them). Having separate lightweight ETL engines to run jobs could be better (something like Docker/Kubernetes/Lambda functions).
- Handling of variables is limited, especially for values returned from sub-components.
- Some components could return more metadata at the end of their execution instead of just the standard set.
- Billing is badly designed: it doesn't take into account that the server is hosted by the client. Expensive.
- We had several issues with migrations that required starting a new instance and then migrating the content. It was painful and time-consuming, and we also had to deal with the support and engineering teams on Matillion's side.
- CDC doesn't work as expected, or it is not a mature product yet.
- Ability to have embedded analytics covering multiple systems just in one place
- Hassle-free data movement
- CDC doesn't work properly, so real-time data is not an option
- Dedicated team to handle server maintenance
- When something goes wrong on the server side (a server crash), investigating is slow and painful
Learning curve is steep but fair.
Once you understand what the tool was designed for and how much it relies on the target DWH (it is more ELT than ETL), things get easier.
Moving data from a simple DB to the DWH can be achieved after a few days of learning; adding logic or transformations will take some months to master, including the handling of variables and how they behave.
You will be tempted to use the Python scripting component instead of the built-in components; it can be handy, but it also keeps you from using the tool's full potential.
Some components, in trying to remove the complexity of a task, ended up being complex in their own right. This layer means you have to learn how the component works more than the challenges of the underlying task.
Removes most of the complexity around setting up and preparing things.
If you could describe in words what needs to be done to move data from A to B, the implementation in Matillion would probably be the closest match, in terms of how easy it is to understand what you are doing and how.
Do you think Matillion delivers good value for the price?
No
Are you happy with Matillion's feature set?
Yes
Did Matillion live up to sales and marketing promises?
I wasn't involved with the selection/purchase process
Did implementation of Matillion go as expected?
No
Would you buy Matillion again?
No