Apache Spark vs. Matillion

Overview
Product: Apache Spark
Rating: Score 8.6 out of 10
Most Used By: N/A
Product Summary: N/A
Starting Price: N/A

Product: Matillion
Rating: Score 6.7 out of 10
Most Used By: N/A
Product Summary: Matillion is a productivity platform for data teams. Matillion's Data Productivity Cloud helps data teams – coders and non-coders alike – to move, transform, and orchestrate data pipelines with the goal of empowering teams to deliver quality data at a speed and scale that matches the business’s data ambitions. The vendor states enterprises including Cisco,
Starting Price: N/A
Pricing
Editions & Modules
Apache Spark: No answers on this topic
Matillion: No answers on this topic
Offerings
Pricing Offerings
  • Free Trial: Apache Spark No, Matillion Yes
  • Free/Freemium Version: Apache Spark No, Matillion No
  • Premium Consulting/Integration Services: Apache Spark No, Matillion Yes
  • Entry-level Setup Fee: Apache Spark No setup fee, Matillion No setup fee
  • Additional Details: Billed directly via cloud marketplace on an hourly basis, with annual subscriptions available depending on the customer's cloud data warehouse provider.
Community Pulse
Considered Both Products
Apache Spark: No answer on this topic
Matillion: Chose Matillion
It is much easier to use in terms of GUI capabilities. The only reason we would use an ETL tool rather than our own manually written SQL scripts is to allow other engineers to use it without having one domain expert stuck on the inner workings of complex scripts. So …
Features
Data Source Connection
Apache Spark: -
Matillion: 7.6 (123 ratings), 8% below category average
  • Connect to traditional data sources: Apache Spark 0 (0 ratings), Matillion 7.7 (122 ratings)
  • Connect to Big Data and NoSQL: Apache Spark 0 (0 ratings), Matillion 7.5 (82 ratings)
Data Transformations
Apache Spark: -
Matillion: 7.3 (124 ratings), 14% below category average
  • Simple transformations: Apache Spark 0 (0 ratings), Matillion 8.1 (124 ratings)
  • Complex transformations: Apache Spark 0 (0 ratings), Matillion 6.4 (123 ratings)
Data Modeling
Apache Spark: -
Matillion: 7.2 (116 ratings), 12% below category average
  • Data model creation: Apache Spark 0 (0 ratings), Matillion 9.1 (33 ratings)
  • Metadata management: Apache Spark 0 (0 ratings), Matillion 9.1 (40 ratings)
  • Business rules and workflow: Apache Spark 0 (0 ratings), Matillion 5.9 (108 ratings)
  • Collaboration: Apache Spark 0 (0 ratings), Matillion 5.4 (108 ratings)
  • Testing and debugging: Apache Spark 0 (0 ratings), Matillion 5.5 (109 ratings)
Data Governance
Apache Spark: -
Matillion: 8.2 (23 ratings), 0% below category average
  • Integration with data quality tools: Apache Spark 0 (0 ratings), Matillion 8.2 (22 ratings)
  • Integration with MDM tools: Apache Spark 0 (0 ratings), Matillion 8.2 (20 ratings)
Best Alternatives
Small Businesses
Apache Spark: No answers on this topic
Matillion: Skyvia (Score 9.6 out of 10)
Medium-sized Companies
Apache Spark: Cloudera Manager (Score 9.7 out of 10)
Matillion: IBM InfoSphere Information Server (Score 8.1 out of 10)
Enterprises
Apache Spark: IBM Analytics Engine (Score 8.8 out of 10)
Matillion: IBM InfoSphere Information Server (Score 8.1 out of 10)
User Ratings
Likelihood to Recommend
Apache Spark: 9.9 (24 ratings)
Matillion: 6.9 (125 ratings)
Likelihood to Renew
Apache Spark: 10.0 (1 rating)
Matillion: 8.2 (6 ratings)
Usability
Apache Spark: 10.0 (3 ratings)
Matillion: 6.5 (124 ratings)
Support Rating
Apache Spark: 8.7 (4 ratings)
Matillion: 7.4 (7 ratings)
Implementation Rating
Apache Spark: - (0 ratings)
Matillion: 8.2 (1 rating)
Product Scalability
Apache Spark: - (0 ratings)
Matillion: 6.6 (117 ratings)
Vendor post-sale
Apache Spark: - (0 ratings)
Matillion: 9.1 (1 rating)
Vendor pre-sale
Apache Spark: - (0 ratings)
Matillion: 9.1 (1 rating)
User Testimonials
Likelihood to Recommend
Apache
Well suited: For most local runs on datasets and for non-production systems, scalability is not a problem at all. Being able to include data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that can be used for most ML tasks. Less appropriate: We worked on a recommender system (RecSys) where the music dataset was over 300 GB in size, and we ran into memory issues, including occasional memory errors. MLlib also lacks support for advanced analytics and deep-learning frameworks. Understanding the internals of Apache Spark is very difficult for beginners.
Matillion
It is very well suited for ETL on the cloud. Whenever there is something that can be accomplished with no code or little code, Matillion is a good tool. However, if your pipeline requires a lot of customizations, Matillion should be avoided.
Pros
Apache
  • Apache Spark makes processing very large data sets possible. It handles these data sets in a fairly quick manner.
  • Apache Spark does a fairly good job implementing machine learning models for larger data sets.
  • Apache Spark seems to be rapidly advancing software, with new features making it ever more straightforward to use.
Matillion
  • We leveraged Matillion’s no-code principles to make data manipulation easy for our internal customers. People who don't know how to use SQL no longer need to. Everything in Matillion is self-explanatory, with little or no coding.
  • We connected Matillion to our data warehouse to allow people to read raw data, transform it, then write results back to their sandbox databases. The drag and drop component design allowed customers to create complex data models at the speed of thought without any risk to production data.
  • With sharing capabilities between projects enabled, everyone was able to help each other when questions arose, which instilled a strong sense of collaboration and community.
Cons
Apache
  • Memory management is very weak.
  • PySpark is not as robust as Scala with Spark.
  • Spark master HA is needed; it is not as highly available as it should be.
  • Locality should not be a necessity, though it does help; we would prefer no locality requirement.
Matillion
  • I think a non-CLI approach to installing and uninstalling Python libraries would be nice. It isn't difficult to install a Python library via Linux or the CLI, but I imagine most companies don't feel comfortable allowing Matillion users to go onto a virtual server and install libraries themselves. A requirements.txt file for installing libraries would be simple, and maybe it could be used to uninstall libraries as well. Alternatively, a library could be downloaded automatically if a Python script imports it and it isn't installed.
  • The Python component is lacking very much in terms of UI. It would be unrealistic for me to suggest Matillion build its own IDE, but it would be nice if a Python component could connect to a local IDE. Right now, if you want to write Python code of any decent length, you are going to be stuck copying and pasting your code from your local IDE.
  • It is also very difficult to debug in Matillion because you can't set breakpoints. Local IDE integration can resolve that.
  • I would like to have more templates to copy from for certain simple scenarios. For instance, a template for a job that, when it fails, sends an AWS SNS notification with the job name, the component it failed on, and the error message. It wasn't as simple as I thought it would be to figure out, and having to use a shared job for such circumstances can be painful because you have to export a bunch of variables.
  • There should be a drop-down list for global Matillion variables, as they are difficult to remember at times.
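The failure-alert template requested in the list above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the `JobFailure` class, both function names, and the way the job name, component, and error message reach the script are hypothetical and not part of Matillion. The only external call assumed is an SNS-style `publish(TopicArn=..., Subject=..., Message=...)`, which is how the boto3 SNS client is called.

```python
from dataclasses import dataclass


@dataclass
class JobFailure:
    """Details of a failed job run (hypothetical field names)."""
    job_name: str
    component: str
    error_message: str


def build_failure_notification(failure: JobFailure) -> dict:
    """Build the Subject/Message pair for a failure alert."""
    # SNS subjects are limited to 100 characters, so truncate defensively.
    subject = f"Job failed: {failure.job_name}"[:100]
    message = (
        f"Job: {failure.job_name}\n"
        f"Failed component: {failure.component}\n"
        f"Error: {failure.error_message}"
    )
    return {"Subject": subject, "Message": message}


def send_failure_notification(sns_client, topic_arn: str, failure: JobFailure) -> dict:
    """Publish the alert via any client that exposes an SNS-style publish()."""
    payload = build_failure_notification(failure)
    sns_client.publish(TopicArn=topic_arn, **payload)
    return payload
```

In practice the client passed in would typically be `boto3.client("sns")`, and the topic ARN would come from the environment's own configuration. Keeping message construction separate from publishing makes the template testable without AWS access.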
Likelihood to Renew
Apache
Capacity for computing data in a cluster, and fast speed.
Matillion
Given our current experience with Matillion, we are likely to renew with the current feature option, but we will also look for improvement in various areas, including scalability and dependability. 1. Connectors: It offers various connector options, but they aren't foolproof, and we will look for improvement here as we grow. 2. Scalability: As usage increases, we want the Matillion system to be more stable.
Usability
Apache
The only thing I dislike about Spark's usability is the learning curve: there are many actions and transformations. However, its wide range of uses for ETL processing, its ease of integration, and its multi-language support make this library a powerhouse for your data science solutions. It has especially aided us with its lightning-fast processing times.
Matillion
It has been easy to train new employees who don't have previous experience with Matillion. It is quite self-explanatory. There are quite a few things that can be done with Python; we have not really looked into this feature much but likely will in the future. Mostly, it is drag and drop of components, and environments can be set up, so it is easy to set up connections as well.
Support Rating
Apache
1. It integrates very well with Scala or Python. 2. Its SQL interoperability is very easy to understand. 3. Apache Spark is much faster than other competing technologies. 4. The support from the Apache community for Spark is very large. 5. Execution times are faster compared to others. 6. There are a large number of forums available for Apache Spark. 7. Code for Apache Spark is readily available and easy to access. 8. Many organizations use Apache Spark, so many solutions are available for existing applications.
Matillion
Overall, I've found Matillion to be responsive and considerate. I feel like they value us as a customer even when I know they have customers who spend more on the product than we do. That speaks to a motive higher than money. They want to make a good product and a good experience for their customers. If I have any complaint, it's that support sometimes feels community-oriented. It isn't always immediately clear to me that my support requests are going to a support engineer and not to the community at large. Usually, though, after a bit of conversation, it's clear that Matillion is watching and responding. And responses are generally quick in coming.
Implementation Rating
Apache
No answers on this topic
Matillion
We were able to control access and build various environments for implementation.
Alternatives Considered
Apache
All the above systems work quite well on big data transformations, whereas Spark really shines with its bigger API support and its ability to read from and write to multiple data sources. Using Spark, one can easily switch between declarative, imperative, and functional programming styles based on the situation. It also doesn't need special data ingestion or indexing pre-processing like Presto. Combining it with Jupyter Notebooks (https://github.com/jupyter-incubator/sparkmagic), one can develop Spark code interactively in Scala or Python.
Matillion
Matillion ran circles around Stitch and Striim in functionality, setup, and performance. There was no real comparison. Fivetran massively outperforms Matillion in pretty much every facet of the production, from setup and maintenance to visibility and usability. It already has the ability to connect any data source to a destination regardless of database type. We chose Matillion over Fivetran because, for our current needs, Matillion provides us with the functionality that we need and a much more competitive price for a smaller company.
Scalability
Apache
No answers on this topic
Matillion
I have been able to connect Matillion to AWS Aurora databases, MySQL databases, REST APIs, files in AWS S3, etc. Being able to load all of that disparate data into one data lake has made data mining and reporting a lot simpler. I wish everything could be implemented as easily as Matillion.
Return on Investment
Apache
  • Faster turnaround on feature development: we have seen a noticeable improvement in our agile development since using Spark.
  • Easy adoption: having multiple departments use the same underlying technology, even if the use cases are very different, allows for more commonality amongst applications, which definitely makes the operations team happy.
  • Performance: we have been able to make some applications run over 20x faster since switching to Spark. This has saved us time, headaches, and operating costs.
Matillion
  • Saving us time reduces our need for headcount
  • Allows us to collaborate on data-eng pipelines in a transparent way with non-technical stakeholders, ensuring accuracy and continuity
  • Allows us a single platform to log and manage all of our pipelines to pin-point where something failed and why
  • Always has a solution to very common data engineering tasks, i.e., real-time data, connectors, pre-built workflows, etc.
Screenshots

Matillion Screenshots

  • Screenshot of Matillion's GUI, used to orchestrate jobs with control data flow functionality, automating the ETL process.
  • Screenshot of where structured and semi-structured data can be prepared to create clean data sets that can be used with any BI/reporting/visualization tool of choice. Matillion reads and combines data across a target warehouse external storage, such as S3 or Blob.
  • Screenshot of Matillion's self-validating components, sample and row counts. If a job does fail, the warehouse queue services available with Matillion can be used to get an alert to a connected email or Slack account.
  • Screenshot of the SQL component used to run custom scripts from within Matillion. With hundreds of pre-built connectors out of the box, Matillion can handle complex transformation needs.