Matillion is a productivity platform for data teams. Matillion's Data Productivity Cloud helps data teams, coders and non-coders alike, move, transform, and orchestrate data pipelines, with the goal of empowering teams to deliver quality data at a speed and scale that matches the business's data ambitions.
Pricing

Editions & Modules
  Apache Spark: No answers on this topic
  Matillion:    No answers on this topic

Pricing Offerings
                                           Apache Spark   Matillion
  Free Trial                               No             Yes
  Free/Freemium Version                    No             No
  Premium Consulting/Integration Services  No             Yes
  Entry-level Setup Fee                    No setup fee   No setup fee

Additional Details
  Apache Spark: —
  Matillion: Billed directly via cloud marketplace on an hourly basis, with annual subscriptions available depending on the customer's cloud data warehouse provider.
It is much easier to use in terms of GUI capabilities. The only reason we would use an ETL tool rather than our own hand-written SQL scripts is to let other engineers use it without one domain expert being stuck on the inner workings of complex scripts. So …
Well suited: for most local runs of datasets and non-prod systems, scalability is not a problem at all. Ingesting data from multiple types of data sources is an added advantage, and MLlib is a decent built-in library that can be used for most ML tasks. Less appropriate: we had to work on a RecSys where the music dataset we used was around 300+ GB in size, and we faced memory-based issues; a few times we also got memory errors. The MLlib library also lacks support for advanced analytics and deep-learning frameworks. And understanding the internals of how Apache Spark works is very difficult for beginners.
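For context, a recommender along the lines this reviewer describes is only a few lines in MLlib. Below is a minimal, hypothetical sketch using the DataFrame-based ALS implementation; the input path and column names are invented for the example, not details from the review.

```python
# Hypothetical MLlib recommender sketch; path and column names are invented.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("music-recsys").getOrCreate()

ratings = spark.read.parquet("s3://bucket/listens/")  # placeholder path
als = ALS(
    userCol="user_id",
    itemCol="track_id",
    ratingCol="play_count",
    implicitPrefs=True,        # treat play counts as implicit feedback
    coldStartStrategy="drop",  # skip users/items unseen at training time
)
model = als.fit(ratings)
top10 = model.recommendForAllUsers(10)  # top-10 track recommendations per user
top10.show(truncate=False)
```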
Matillion is great any time ETL or ELT is needed. I've now used Matillion at 2 different companies and would have no problem recommending it to others. The ease of setting up schedules to just take care of data imports and manipulation is incredible. It has also been an incredible tool for bringing disparate databases together on a schedule so that I never have to think about it.
I think a non-CLI approach to installing and uninstalling Python libraries would be nice. It isn't difficult to install a Python library via Linux or the CLI, but I imagine most companies don't feel comfortable allowing Matillion users to go onto a virtual server and install it themselves. A requirements.txt file for installing libraries would be simple, and maybe that could also be used to uninstall libraries... or maybe a library gets automatically downloaded if it is imported into a Python script but doesn't exist (a sketch of this idea appears after this list of suggestions).
The Python component is very much lacking in terms of UI. It would be unrealistic for me to suggest Matillion build its own IDE, but it would be nice if the Python component could connect to a local IDE. Right now, if you want to write Python code of any decent length, you are stuck copying and pasting it from your local IDE.
It is also very difficult to debug in Matillion because you can't set breakpoints. Local IDE integration could resolve that.
I would like more templates to copy from for certain simple scenarios: for instance, a template for a failing job that sends an AWS SNS notification with the job name, the component it failed on, and the error message (a sketch of this also follows below). It wasn't as simple as I thought it would be to figure out, and having to use a shared job for such circumstances can be painful because you have to export a bunch of variables.
There should be a drop-down list for global Matillion variables, as they are difficult to remember at times.
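As an illustration of the auto-install idea suggested above, here is a minimal, hypothetical sketch of the kind of code that could run inside a Matillion Python Script component. Nothing here is a Matillion feature; the helper name and example package are invented.

```python
# Hypothetical sketch: import a library, pip-installing it first if missing.
# This mirrors the reviewer's "auto-download on import" idea; it is not a
# Matillion built-in.
import importlib
import subprocess
import sys

def ensure(package):
    """Return the imported module, installing it with pip if absent."""
    try:
        return importlib.import_module(package)
    except ImportError:
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "--user", package]
        )
        return importlib.import_module(package)

requests = ensure("requests")  # example package; any pip-installable name works
```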
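And for the failure-notification template requested above, a plain boto3 call covers the basics. The topic ARN and parameter names below are placeholders; in a real Matillion job they would come from job variables.

```python
# Hypothetical failure notifier: publish the failing job's name, component,
# and error message to an SNS topic. The ARN below is a placeholder.
import boto3

def notify_failure(job_name, component, error_message,
                   topic_arn="arn:aws:sns:us-east-1:123456789012:etl-alerts"):
    sns = boto3.client("sns")
    sns.publish(
        TopicArn=topic_arn,
        Subject=f"Matillion job failed: {job_name}"[:100],  # SNS subject limit
        Message=f"Component: {component}\nError: {error_message}",
    )
```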
With our current experience of Matillion, we are likely to renew with the current feature set, but we will also be looking for improvement in various areas, including scalability and dependability. 1. Connectors: it offers various connector options, but they aren't foolproof, which we will be watching as we grow. 2. Scalability: as usage increases, we want the Matillion system to be more stable.
The only thing I dislike about Spark's usability is the learning curve: there are many actions and transformations to learn. However, its wide range of uses for ETL processing, its ease of integration, and its multi-language support make this library a powerhouse for your data science solutions. It has especially aided us with its lightning-fast processing times.
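For readers new to the transformation/action split behind the learning curve this reviewer mentions, a minimal PySpark sketch of the distinction (the example data is invented):

```python
# Transformations build an execution plan lazily; actions trigger execution.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lazy-demo").getOrCreate()

df = spark.range(1_000_000)                   # a column "id" of 0..999999
doubled = df.selectExpr("id * 2 AS doubled")  # transformation: nothing runs yet
filtered = doubled.filter("doubled % 3 = 0")  # still just a plan

print(filtered.count())                       # action: the plan executes here
```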
Easy-to-follow steps in building a data pipeline. We have brought in people with minimal SQL experience, and they have been able to pick up the skill set quickly and successfully, contributing to a reduced time to market for various data assets. Matillion provides a long array of components to choose from for data transformation, and like any other tool, there is more than one way of doing the same thing. Some ways are more performant than others, but they are easy to follow and easy to tweak to improve.
1. It integrates very well with Scala and Python.
2. Its SQL interoperability is very easy to understand (see the sketch after this list).
3. Apache Spark is much faster than competing technologies.
4. The Apache community's support for Spark is very large.
5. Execution times are faster compared to others.
6. There are a large number of forums available for Apache Spark.
7. Apache Spark's code is simple and easy to gain access to.
8. Many organizations use Apache Spark, so many solutions are available for existing applications.
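As a small illustration of the SQL interoperability noted in point 2, a minimal PySpark sketch (the table name and data are invented):

```python
# Register a DataFrame as a temp view, then mix SQL with the DataFrame API.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-interop").getOrCreate()

people = spark.createDataFrame([("alice", 34), ("bob", 29)], ["name", "age"])
people.createOrReplaceTempView("people")

adults = spark.sql("SELECT name FROM people WHERE age >= 30")  # plain SQL...
adults.filter("name LIKE 'a%'").show()                         # ...chained with the API
```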
Overall, I've found Matillion to be responsive and considerate. I feel like they value us as a customer even when I know they have customers who spend more on the product than we do. That speaks to a motive higher than money. They want to make a good product and a good experience for their customers. If I have any complaint, it's that support sometimes feels community-oriented. It isn't always immediately clear to me that my support requests are going to a support engineer and not to the community at large. Usually, though, after a bit of conversation, it's clear that Matillion is watching and responding. And responses are generally quick in coming.
All the above systems work quite well on big-data transformations, whereas Spark really shines with its broader API support and its ability to read from and write to multiple data sources. Using Spark, one can easily switch between declarative, imperative, and functional styles of programming based on the situation. It also doesn't need special data ingestion or indexing pre-processing like Presto. Combined with Jupyter Notebooks (https://github.com/jupyter-incubator/sparkmagic), one can develop Spark code interactively in Scala or Python.
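To make the style-switching point concrete, here is a minimal sketch computing the same aggregate functionally over an RDD and declaratively in SQL (the data is invented):

```python
# The same aggregate in two styles: functional RDD chaining vs. declarative SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("style-switch").getOrCreate()
nums = list(range(1, 11))

# Functional style: chain filter/map/reduce over an RDD.
total = (spark.sparkContext.parallelize(nums)
         .filter(lambda n: n % 2 == 0)
         .map(lambda n: n * n)
         .reduce(lambda a, b: a + b))

# Declarative style: the same sum of squared evens, expressed in SQL.
spark.createDataFrame([(n,) for n in nums], ["n"]).createOrReplaceTempView("nums")
sql_total = spark.sql("SELECT SUM(n * n) AS t FROM nums WHERE n % 2 = 0").first()["t"]

assert total == sql_total  # both styles agree: 4 + 16 + 36 + 64 + 100 = 220
```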
Overall, Matillion is an excellent choice for businesses that need a powerful and easy-to-use data integration and transformation platform that can handle large volumes of data. While it may be a bit pricey for some organizations, the platform's reliability, scalability, and customer support make it a worthwhile investment for many businesses.
Matillion has nice scalability capacity, and since it is deployed in our AWS cloud it is much easier to scale on demand. The reason I'm not rating it a 10 is that there is no easy way to migrate or move it to another cloud.
Faster turnaround on feature development: we have seen a noticeable improvement in our agile development since using Spark.
Easy adoption: having multiple departments use the same underlying technology, even when the use cases are very different, allows for more commonality among applications, which definitely makes the operations team happy.
Performance: we have been able to make some applications run over 20x faster since switching to Spark. This has saved us time, headaches, and operating costs.