Apache Spark vs. Matillion

Overview
Apache Spark
Rating: Score 8.9 out of 10
Most Used By: N/A
Product Summary: N/A
Starting Price: N/A
Matillion
Rating: Score 6.0 out of 10
Most Used By: N/A
Product Summary: Matillion is a productivity platform for data teams. Matillion's Data Productivity Cloud helps data teams, coders and non-coders alike, to move, transform, and orchestrate data pipelines with the goal of empowering teams to deliver quality data at a speed and scale that matches the business's data ambitions. The vendor states enterprises including Cisco, …
Starting Price: N/A
Pricing
Editions & Modules
Apache Spark: No answers on this topic
Matillion: No answers on this topic
Pricing Offerings
Free Trial: Apache Spark: No; Matillion: Yes
Free/Freemium Version: Apache Spark: No; Matillion: No
Premium Consulting/Integration Services: Apache Spark: No; Matillion: Yes
Entry-level Setup Fee: Apache Spark: No setup fee; Matillion: No setup fee
Additional Details (Matillion): Billed directly via cloud marketplace on an hourly basis, with annual subscriptions available depending on the customer's cloud data warehouse provider.
Community Pulse
Considered Both Products
Apache Spark
No answer on this topic
Matillion
Chose Matillion
It is much easier to use in terms of GUI capabilities. The only reason we would use an ETL tool other than our own manually written SQL scripts is to be able to allow other engineers to use it without having one domain expert stuck on the inner workings of complex scripts. So …
Features
Data Source Connection
Apache Spark: no ratings
Matillion: 6.8 (129 ratings), 21% below category average
Connect to traditional data sources: Apache Spark: 0 ratings; Matillion: 7.1 (128 ratings)
Connect to Big Data and NoSQL: Apache Spark: 0 ratings; Matillion: 6.6 (88 ratings)
Data Transformations
Apache Spark: no ratings
Matillion: 5.8 (130 ratings), 35% below category average
Simple transformations: Apache Spark: 0 ratings; Matillion: 6.4 (130 ratings)
Complex transformations: Apache Spark: 0 ratings; Matillion: 5.2 (129 ratings)
Data Modeling
Apache Spark: no ratings
Matillion: 6.8 (122 ratings), 17% below category average
Data model creation: Apache Spark: 0 ratings; Matillion: 9.1 (33 ratings)
Metadata management: Apache Spark: 0 ratings; Matillion: 9.1 (40 ratings)
Business rules and workflow: Apache Spark: 0 ratings; Matillion: 6.3 (114 ratings)
Collaboration: Apache Spark: 0 ratings; Matillion: 3.9 (114 ratings)
Testing and debugging: Apache Spark: 0 ratings; Matillion: 4.5 (115 ratings)
Data Governance
Apache Spark: no ratings
Matillion: 8.2 (23 ratings), 1% below category average
Integration with data quality tools: Apache Spark: 0 ratings; Matillion: 8.2 (22 ratings)
Integration with MDM tools: Apache Spark: 0 ratings; Matillion: 8.2 (20 ratings)
Best Alternatives
Small Businesses
Apache Spark: No answers on this topic
Matillion: Skyvia (Score 9.8 out of 10)
Medium-sized Companies
Apache Spark: Cloudera Manager (Score 9.9 out of 10)
Matillion: IBM InfoSphere Information Server (Score 8.0 out of 10)
Enterprises
Apache Spark: IBM Analytics Engine (Score 7.9 out of 10)
Matillion: IBM InfoSphere Information Server (Score 8.0 out of 10)
User Ratings
Likelihood to Recommend: Apache Spark: 10.0 (23 ratings); Matillion: 5.7 (131 ratings)
Likelihood to Renew: Apache Spark: 10.0 (1 rating); Matillion: 8.6 (6 ratings)
Usability: Apache Spark: 10.0 (3 ratings); Matillion: 5.0 (130 ratings)
Support Rating: Apache Spark: 8.7 (4 ratings); Matillion: 7.4 (7 ratings)
Implementation Rating: Apache Spark: no ratings; Matillion: 8.2 (1 rating)
Product Scalability: Apache Spark: no ratings; Matillion: 4.6 (123 ratings)
Vendor post-sale: Apache Spark: no ratings; Matillion: 9.1 (1 rating)
Vendor pre-sale: Apache Spark: no ratings; Matillion: 9.1 (1 rating)
User Testimonials
Likelihood to Recommend
Apache Spark
Well suited: For most locally run datasets and non-prod systems, scalability is not a problem at all. Being able to include data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that can be used for most ML tasks. Less appropriate: We had to work on a RecSys where the music dataset that we used was around 300+ GB in size, and we faced memory-based issues. A few times we also got memory errors. Also, MLlib does not support advanced analytics or deep-learning frameworks. Understanding the internals of Apache Spark is very difficult for beginners.
Matillion
Matillion does a great job of connecting to different data sources and offering relatively easy connection options. Snowflake is our primary data warehouse, and Matillion has made it easy to connect and transform data into any type of data warehouse methodology. It is very easy to schedule and manage jobs, along with using an email client to communicate job status. Determining deltas in data is a little more challenging; however, it can be resolved with extra coding.
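The reviewer does not say how the deltas were computed. As a generic illustration only (not a Matillion component or API), one common approach is to compare the previous and current extract of a table by primary key. The function name `compute_delta` and the sample rows below are hypothetical.

```python
from typing import Dict, Hashable, Tuple

Row = Dict[str, object]
Snapshot = Dict[Hashable, Row]  # primary key -> row


def compute_delta(
    previous: Snapshot, current: Snapshot
) -> Tuple[Snapshot, Snapshot, Snapshot]:
    """Compare two snapshots and return (inserted, updated, deleted) rows."""
    # Keys present only in the current snapshot are new rows.
    inserted = {key: row for key, row in current.items() if key not in previous}
    # Keys present only in the previous snapshot have been removed.
    deleted = {key: row for key, row in previous.items() if key not in current}
    # Keys present in both snapshots count as updated when the row content differs.
    updated = {
        key: row
        for key, row in current.items()
        if key in previous and previous[key] != row
    }
    return inserted, updated, deleted


# Hypothetical sample data: two consecutive extracts of a customer table.
previous = {
    1: {"name": "Ann", "plan": "basic"},
    2: {"name": "Bob", "plan": "pro"},
    3: {"name": "Cy", "plan": "basic"},
}
current = {
    1: {"name": "Ann", "plan": "basic"},       # unchanged
    2: {"name": "Bob", "plan": "enterprise"},  # plan changed
    4: {"name": "Dee", "plan": "pro"},         # new row
}

inserted, updated, deleted = compute_delta(previous, current)
print("inserted:", inserted)
print("updated:", updated)
print("deleted:", deleted)
```

In a warehouse setting the same comparison is usually expressed in SQL against a staging table; the in-memory version here only shows the logic.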
Pros
Apache Spark
  • Rich APIs for data transformation, making it very easy to transform and prepare data in a distributed environment without worrying about memory issues
  • Faster execution times compared to Hadoop and Pig Latin
  • Easy SQL interface to the same data set for people who are comfortable exploring data in a declarative manner
  • Interoperability between SQL and Scala / Python style of munging data
Matillion
  • Matillion has a rich transformation library. It provides multiple functionalities, such as join, group by, pivot, various sources, and sinks.
  • It provides the security capability as well. All the credentials can be securely stored in Matillion.
  • Reusable templates can be built which reduces the redundancy.
  • Time to production is very minimal.
Cons
Apache Spark
  • Memory management. Very weak on that.
  • PySpark is not as robust as Scala with Spark.
  • Spark master HA is needed. It is not as highly available as it should be.
  • Locality should not be a necessity, though it does help. We would prefer no locality requirement.
Matillion
  • Static and monolithic, it will show its limits when running multiple concurrent jobs.
  • GitHub and versioning implementation is messy and broken. Don't use it.
  • There's no way to see/query the system resources; you just wait for a server to crash due to out of memory. An admin panel would be appreciated, plus some env variables with updated info.
  • API implementation is cumbersome and limited.
  • There's no concept of hub and worker engine; everything happens on the same server (designing workflows and executing them). Having separate light ETL engines to run jobs could be better (sort of Docker/Kubernetes/Lambda functions).
  • Handling of variables is limited, especially for returned values from sub-components.
  • Some components could return more metadata at the end of their execution instead of the standard one.
  • Billing is badly designed, not taking into account that the server is hosted by the client. Expensive.
  • We had several issues with migration, where starting a new instance and then migrating the content was required. It was painful and time-consuming, and we also had to deal with the support and engineering teams on the Matillion side.
  • CDC doesn't work as expected or it is not a mature product yet.
Likelihood to Renew
Apache Spark
Capacity to compute data in a cluster, and fast speed.
Matillion
With the current experience of Matillion, we are likely to renew with the current feature option but will also look for improvement in various areas, including scalability and dependability. 1. Connectors: It offers various connector options, but they aren't foolproof, and we will be looking for improvement here as we grow. 2. Scalability: As usage increases, we want the Matillion system to be more stable.
Usability
Apache Spark
The only thing I dislike about Spark's usability is the learning curve: there are many actions and transformations. However, its wide range of uses for ETL processing, ease of integration, and multi-language support make this library a powerhouse for your data science solutions. It has especially aided us with its lightning-fast processing times.
Matillion
Easy tasks are really easy, and complex tasks are still possible. With prior knowledge of general data warehousing principles and experience with other data transformation tools, it's straightforward to get familiar with and use Matillion. I initially used minimal external support from a partner for some more complex tasks but very soon could work entirely independently with Matillion.
Support Rating
Apache Spark
1. It integrates very well with Scala or Python. 2. SQL interoperability is very easy to understand. 3. Apache Spark is way faster than the other competing technologies. 4. The support from the Apache community is very large for Spark. 5. Execution times are faster as compared to others. 6. There are a large number of forums available for Apache Spark. 7. Code for Apache Spark is readily available and easy to access. 8. Many organizations use Apache Spark, so many solutions are available for existing applications.
Matillion
Overall, I've found Matillion to be responsive and considerate. I feel like they value us as a customer even when I know they have customers who spend more on the product than we do. That speaks to a motive higher than money. They want to make a good product and a good experience for their customers. If I have any complaint, it's that support sometimes feels community-oriented. It isn't always immediately clear to me that my support requests are going to a support engineer and not to the community at large. Usually, though, after a bit of conversation, it's clear that Matillion is watching and responding. And responses are generally quick in coming.
Implementation Rating
Apache Spark
No answers on this topic
Matillion
We were able to control access and build various environments for implementation.
Alternatives Considered
Apache Spark
Spark, in comparison to similar technologies, ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch and weave multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and the ability to write your application in the language of your choosing.
Matillion
The Matillion selection was not my decision. But I think it's a good enough choice. It is especially valuable that the team can learn Matillion easily and that the project can be understood by the entire team with the visual environment instead of complex ETLs.
Scalability
Apache Spark
No answers on this topic
Matillion
Functionality scalability is good (there are many connectors and supported systems out of the box). It's also easy to create a custom component to interact with a system that is not covered by out-of-the-box connectors. From a performance point of view, my experience with scalability is not good (and tied to the Matillion business model): 1. The maximum parallelism of the running jobs depends on the number of cores of the machine where Matillion is deployed. AFAIK it's only possible to deploy Matillion on a single machine (EC2-like). The license price depends on the number of cores that the machine has. 2. The scalability of the UI is pretty bad (random crashes/slowness), and the number of concurrent open sessions is limited by design (again, pricing-related), even if the sessions belong to the same user.
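To make the reviewer's first point concrete, here is a rough model of how total run time changes when the number of concurrently running jobs is capped by a fixed slot count. It assumes one job per slot and equal-priority jobs; it is an illustration of the general constraint, not Matillion's actual scheduler, and the function name `makespan` and the job durations are hypothetical.

```python
import heapq
from typing import List


def makespan(job_minutes: List[int], slots: int) -> int:
    """Total elapsed minutes when jobs run in submission order on `slots` parallel slots."""
    if slots < 1:
        raise ValueError("slots must be at least 1")
    # Min-heap of the time at which each slot becomes free.
    free_at = [0] * slots
    heapq.heapify(free_at)
    for minutes in job_minutes:
        # The next job starts on whichever slot frees up first.
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + minutes)
    return max(free_at)


# Hypothetical workload: eight jobs of ten minutes each.
jobs = [10] * 8
for cores in (2, 4, 8):
    print(f"{cores} slots -> {makespan(jobs, cores)} minutes")
```

Under this model, halving the slot count doubles the elapsed time for a uniform workload, which is why tying both parallelism and license price to core count matters for batch windows.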
Return on Investment
Apache Spark
  • Business leaders are able to make data-driven decisions
  • Business users are able to access data in near real time now. Before using Spark, they had to wait at least 24 hours for data to be available
  • The business is able to come up with new product ideas
Matillion
  • Our embedded data analysts (data analysts that sit in a team outside of the Data team) all now use Matillion to create proof of concepts (POCs). This allows them to debug logic at a component level and quickly explore ideas without investing lots of time and effort.
  • Since the soft-announcement of ‘Data as a product’ (a beta launch) and demoing Matillion to some of our internal customers, we’ve had a huge number of requests from people to get their hands on this new method of self-serving data. We’ve yet to release the full product and make a company-wide announcement, but early estimates show we can expect around 10-15% of the company to be onboarded and using Matillion as part of Data as a Product. Given the Data team only accounts for around 2% of the company's employees, that’s a huge increase in the number of people using and manipulating raw data!
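The size of that increase can be worked out from the two figures the reviewer quotes. The short calculation below uses only those percentages; the variable names are illustrative.

```python
# Figures quoted in the review, as whole percentages of company headcount.
data_team_pct = 2
expected_low_pct, expected_high_pct = 10, 15

# Ratio of expected Matillion users to the current Data team share.
growth_low = expected_low_pct / data_team_pct
growth_high = expected_high_pct / data_team_pct

print(f"Expected users are {growth_low:g}x to {growth_high:g}x the Data team's share")
```

In other words, the early estimate implies roughly five to seven and a half times as many people working with raw data as the Data team alone.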
Screenshots

Matillion Screenshots

Screenshot of Matillion's GUI, used to orchestrate jobs with control data flow functionality, automating the ETL process.
Screenshot of where structured and semi-structured data can be prepared to create clean data sets that can be used with any BI/reporting/visualization tool of choice. Matillion reads and combines data across a target warehouse external storage, such as S3 or Blob.
Screenshot of Matillion's self-validating components, sample and row counts. If a job does fail, the warehouse queue services available with Matillion can be used to send an alert to a connected email or Slack account.
Screenshot of the SQL component used to run custom scripts from within Matillion. With hundreds of pre-built connectors out of the box, Matillion can handle complex transformation needs.