Likelihood to Recommend

Well suited: For most local runs on datasets and non-production systems, scalability is not a problem at all. Being able to include data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that can be used for most ML tasks.

Less appropriate: We had to work on a RecSys where the music dataset we used was 300+ GB in size. We faced memory-based issues, and a few times we also got memory errors. Also, MLlib does not support advanced analytics or deep-learning frameworks. Understanding the internals of how Apache Spark works is very difficult for beginners.
Matillion does a great job of connecting to different data sources and offering relatively easy connection options. Snowflake is our primary data warehouse, and Matillion has made it easy to connect and transform data into any type of data warehouse methodology. It is very easy to schedule and manage jobs, along with using an email client to communicate job status. Determining deltas in data is a little more challenging; however, it can be resolved with extra coding.
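The review does not say what the "extra coding" for determining deltas looks like. As a rough illustration only, one common approach is to compare a previous and a current snapshot by primary key and use a row hash to spot changed rows. The sketch below is plain Python and not Matillion functionality; the names `row_hash` and `compute_delta`, the `id` key column, and the sample rows are all hypothetical.

```python
import hashlib
import json


def row_hash(row):
    """Return a stable hash of a row (a dict of column -> value)."""
    # sort_keys makes the hash independent of column order
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compute_delta(previous, current, key="id"):
    """Compare two snapshots (lists of dicts) and classify rows by key."""
    prev_by_key = {row[key]: row for row in previous}
    curr_by_key = {row[key]: row for row in current}

    # Keys only in the current snapshot are inserts
    inserted = [curr_by_key[k] for k in curr_by_key if k not in prev_by_key]
    # Keys only in the previous snapshot are deletes
    deleted = [prev_by_key[k] for k in prev_by_key if k not in curr_by_key]
    # Keys in both snapshots whose row hash differs are updates
    updated = [
        curr_by_key[k]
        for k in curr_by_key
        if k in prev_by_key
        and row_hash(curr_by_key[k]) != row_hash(prev_by_key[k])
    ]
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


# Hypothetical sample data, for illustration only
previous_snapshot = [
    {"id": 1, "name": "alpha", "amount": 10},
    {"id": 2, "name": "beta", "amount": 20},
    {"id": 3, "name": "gamma", "amount": 30},
]
current_snapshot = [
    {"id": 1, "name": "alpha", "amount": 10},
    {"id": 2, "name": "beta", "amount": 25},
    {"id": 4, "name": "delta", "amount": 40},
]

delta = compute_delta(previous_snapshot, current_snapshot)
```

With a warehouse as the target, the same comparison could also be expressed in SQL rather than application code; the Python version is only meant to show the logic.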

Pros

Rich APIs for data transformation, making it very easy to transform and prepare data in a distributed environment without worrying about memory issues
Faster execution times compared to Hadoop and Pig Latin
Easy SQL interface to the same data set for people who are comfortable exploring data in a declarative manner
Interoperability between SQL and Scala / Python styles of munging data

Matillion has a rich transformation library. It provides multiple functionalities, such as join, group by, pivot, various sources, and sinks.
It provides security capabilities as well. All credentials can be securely stored in Matillion.
Reusable templates can be built, which reduces redundancy.
Time to production is minimal.

Cons

Memory management is very weak.
PySpark is not as robust as Scala with Spark.
Spark master HA is needed. It is not as highly available as it should be.
Locality should not be a necessity, though it does help. I would prefer no locality requirement.

Static and monolithic; it will show its limits when running multiple concurrent jobs.
The GitHub and versioning implementation is messy and broken. Don't use it.
There's no way to see or query the system resources; you just wait for a server to crash due to running out of memory. An admin panel would be appreciated, plus some environment variables with updated info.
The API implementation is cumbersome and limited.
There's no concept of hub and worker engines; everything happens on the same server (designing workflows and executing them). Having separate lightweight ETL engines to run jobs could be better (something like Docker/Kubernetes/Lambda functions).
Handling of variables is limited, especially for values returned from sub-components.
Some components could return more metadata at the end of their execution instead of the standard set.
Billing is badly designed, not taking into account that the server is hosted by the client. Expensive.

We had several issues with migration, where starting a new instance and then migrating the content was required. It was painful and time-consuming, and we also had to deal with the support and engineering teams on the Matillion side.
CDC doesn't work as expected, or it is not a mature product yet.

Likelihood to Renew

Capacity for computing data in a cluster, and fast speed.
Steven Li Senior Software Developer (Consultant)
With our current experience of Matillion, we are likely to renew with the current feature options, but we will also look for improvement in various areas, including scalability and dependability.
1. Connectors: It offers various connector options, but they aren't foolproof, which is something we will be looking for as we grow.
2. Scalability: As usage increases, we want the Matillion system to be more stable.

Usability

The only thing I dislike about Spark's usability is the learning curve: there are many actions and transformations. However, its wide range of uses for ETL processing, its ease of integration, and its multi-language support make this library a powerhouse for your data science solutions. It has especially aided us with its lightning-fast processing times.
Easy tasks are really easy, and complex tasks are still possible. With prior knowledge of general data warehousing principles and experience with other data transformation tools, it's straightforward to get familiar with and use Matillion. I initially used minimal external support from a partner for some more complex tasks but very soon could work entirely independently with Matillion.

Support Rating

1. It integrates very well with Scala or Python.
2. Its SQL interoperability is very easy to understand.
3. Apache Spark is way faster than other competing technologies.
4. The support from the Apache community for Spark is huge.
5. Execution times are faster compared to others.
6. There are a large number of forums available for Apache Spark.
7. The code for Apache Spark is readily available and easy to access.
8. Many organizations use Apache Spark, so many solutions are available for existing applications.
Overall, I've found Matillion to be responsive and considerate. I feel like they value us as a customer even when I know they have customers who spend more on the product than we do. That speaks to a motive higher than money. They want to make a good product and a good experience for their customers. If I have any complaint, it's that support sometimes feels community-oriented. It isn't always immediately clear to me that my support requests are going to a support engineer and not to the community at large. Usually, though, after a bit of conversation, it's clear that Matillion is watching and responding. And responses are generally quick in coming.

Implementation Rating

We were able to control access and build various environments for implementation.

Alternatives Considered

Spark, in comparison to similar technologies, ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch and weave together multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and the ability to write your application in the language of your choosing.
The Matillion selection was not my decision. But I think it's a good enough choice. It is especially valuable that the team can learn Matillion easily and that the project can be understood by the entire team with the visual environment instead of complex ETLs.

Scalability

Functionality scalability is good (there are many connectors and supported systems out of the box). It's also easy to create a custom component to interact with a system that is not covered by out-of-the-box connectors. From a performance point of view, my experience with scalability is not good (and tied to the Matillion business model):
1. The maximum parallelism of the running jobs depends on the number of cores of the machine where Matillion is deployed. AFAIK it's only possible to deploy Matillion on a single machine (EC2-like). The license price depends on the number of cores that the machine has.
2. The scalability of the UI is pretty bad (random crashes/slowness), and the number of concurrent open sessions is limited by design (again, pricing-related), even if the sessions belong to the same user.

Return on Investment

Business leaders are able to make data-driven decisions
Business users are able to access data in near real time now. Before using Spark, they had to wait at least 24 hours for data to be available
The business is able to come up with new product ideas

Our embedded data analysts (data analysts that sit in a team outside of the Data team) all now use Matillion to create proofs of concept (POCs). This allows them to debug logic at a component level and quickly explore ideas without investing lots of time and effort.
Since the soft announcement of ‘Data as a product’ (a beta launch) and demoing Matillion to some of our internal customers, we’ve had a huge number of requests from people to get their hands on this new method of self-serving data.
We’ve yet to release the full product and make a company-wide announcement, but early estimates show we can expect around 10-15% of the company to be onboarded and using Matillion as part of Data as a Product. Given the Data team only accounts for around 2% of the company's employees, that’s a huge increase in the number of people using and manipulating raw data!