Apache Spark vs. Matillion

Overview
Apache Spark
Rating: 8.6 out of 10
Most Used By: N/A
Product Summary: N/A
Starting Price: N/A

Matillion
Rating: 8.2 out of 10
Most Used By: N/A
Product Summary: Matillion is a productivity platform for data teams. Matillion's Data Productivity Cloud helps data teams – coders and non-coders alike – to move, transform, and orchestrate data pipelines with the goal of empowering teams to deliver quality data at a speed and scale that matches the business’s data ambitions. The vendor states enterprises including …
Starting Price: N/A
Pricing
Editions & Modules
Apache Spark: No answers on this topic
Matillion: No answers on this topic
Offerings
Free Trial: Apache Spark No; Matillion Yes
Free/Freemium Version: Apache Spark No; Matillion No
Premium Consulting/Integration Services: Apache Spark No; Matillion Yes
Entry-level Setup Fee: Apache Spark No setup fee; Matillion No setup fee
Additional Details (Matillion): Billed directly via cloud marketplace on an hourly basis, with annual subscriptions available depending on the customer's cloud data warehouse provider.
Community Pulse
Considered Both Products
Apache Spark

No answer on this topic

Matillion
Chose Matillion
It is much easier to use in terms of GUI capabilities. The only reason we would use an ETL tool instead of our own manually written SQL scripts is to allow other engineers to use it without having one domain expert stuck on the inner workings of complex scripts. So …
Features
Data Source Connection
Overall: Apache Spark no ratings; Matillion 8.6 (109 ratings), 3% above category average
Connect to traditional data sources: Apache Spark 0 ratings; Matillion 8.8 (108 ratings)
Connect to Big Data and NoSQL: Apache Spark 0 ratings; Matillion 8.4 (72 ratings)
Data Transformations
Overall: Apache Spark no ratings; Matillion 8.3 (109 ratings), 1% below category average
Simple transformations: Apache Spark 0 ratings; Matillion 9.2 (109 ratings)
Complex transformations: Apache Spark 0 ratings; Matillion 7.5 (108 ratings)
Data Modeling
Overall: Apache Spark no ratings; Matillion 8.4 (104 ratings), 4% above category average
Data model creation: Apache Spark 0 ratings; Matillion 9.1 (33 ratings)
Metadata management: Apache Spark 0 ratings; Matillion 9.1 (40 ratings)
Business rules and workflow: Apache Spark 0 ratings; Matillion 8.2 (96 ratings)
Collaboration: Apache Spark 0 ratings; Matillion 7.7 (97 ratings)
Testing and debugging: Apache Spark 0 ratings; Matillion 7.8 (97 ratings)
Data Governance
Overall: Apache Spark no ratings; Matillion 8.2 (23 ratings), 0% below category average
Integration with data quality tools: Apache Spark 0 ratings; Matillion 8.2 (22 ratings)
Integration with MDM tools: Apache Spark 0 ratings; Matillion 8.2 (20 ratings)
Best Alternatives
Small Businesses
Apache Spark: No answers on this topic
Matillion: Skyvia (score 9.5 out of 10)
Medium-sized Companies
Apache Spark: Cloudera Manager (score 9.7 out of 10)
Matillion: InfoSphere (score 10.0 out of 10)
Enterprises
Apache Spark: IBM Analytics Engine (score 9.3 out of 10)
Matillion: InfoSphere (score 10.0 out of 10)
User Ratings
Likelihood to Recommend: Apache Spark 9.7 (24 ratings); Matillion 9.2 (110 ratings)
Likelihood to Renew: Apache Spark 10.0 (1 rating); Matillion 6.9 (6 ratings)
Usability: Apache Spark 10.0 (3 ratings); Matillion 8.7 (109 ratings)
Support Rating: Apache Spark 8.6 (6 ratings); Matillion 7.4 (9 ratings)
Implementation Rating: Apache Spark no ratings; Matillion 8.2 (1 rating)
Product Scalability: Apache Spark no ratings; Matillion 8.2 (102 ratings)
Vendor post-sale: Apache Spark no ratings; Matillion 9.1 (1 rating)
Vendor pre-sale: Apache Spark no ratings; Matillion 9.1 (1 rating)
User Testimonials
Likelihood to Recommend
Apache Spark
Well suited: For most local runs on datasets and non-prod systems, scalability is not a problem at all. Including data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that can be used for most ML tasks. Less appropriate: We had to work on a RecSys where the music dataset we used was 300+ GB in size, and we faced memory-based issues. A few times we also got memory errors. Also, MLlib does not support advanced analytics or deep-learning frameworks. Understanding the internals of Apache Spark is very difficult for beginners.
Matillion
Matillion is great any time ETL or ELT is needed. I've now used Matillion in 2 different companies and would have no problem recommending others to use it as well. The ease of setting up schedules to just take care of data imports and manipulation is incredible. It has also been an incredible tool for bringing disparate databases together on a schedule so that I never have to think about it.
Pros
Apache Spark
  • Apache Spark makes processing very large data sets possible. It handles these data sets in a fairly quick manner.
  • Apache Spark does a fairly good job implementing machine learning models for larger data sets.
  • Apache Spark seems to be rapidly advancing software, with new features making it ever more straightforward to use.
Matillion
  • Excellent tool for building ELT data pipelines using Snowflake
  • Easy graphical orchestration to enable complex dependencies
  • Easy to manage user and group permissions
  • Easy to schedule jobs using time dependency and job dependency
  • Provides many APIs to fetch data from multiple vendors
Cons
Apache Spark
  • Memory management. Very weak on that.
  • PySpark is not as robust as Scala with Spark.
  • Spark master HA is needed. It is not as highly available as it should be.
  • Data locality should not be a necessity, although it does help. We would prefer not to depend on locality.
Matillion
  • I think a non-CLI approach to installing and uninstalling Python libraries would be nice. I mean, it isn't difficult to install a Python library via Linux or a CLI, but I imagine most companies don't feel comfortable allowing Matillion users to go onto a virtual server and install it themselves. A requirements.txt file for installing libraries would be simple, and maybe that could also be used to uninstall libraries. Or maybe the library could be downloaded automatically if it is imported into a Python script but doesn't exist.
  • The Python component is lacking very much in terms of UI. It would be unrealistic for me to suggest Matillion build its own IDE, but it would be nice if a Python component could connect to a local IDE. Right now, if you want to write Python code of any decent length, you are going to be stuck copying and pasting your code from your local IDE.
  • It is also very difficult to debug in Matillion because you can't set breakpoints. Local IDE integration could resolve that.
  • I would like to have more templates to copy from for certain simple scenarios. For instance, a template for a job that, on failure, sends an AWS SNS notification with the job name, the component it failed on, and the error message. It wasn't as simple as I thought it would be to figure out, and having to use a shared job for such circumstances can be painful because you have to export a bunch of variables.
  • There should be a drop-down list for global Matillion variables, as they are difficult to remember at times.
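One suggestion in the list above is that a library could be installed automatically when a script imports it but it is missing. A minimal sketch of that idea in plain Python follows. The helper name import_or_install is hypothetical, this is not a Matillion feature or API, and the sketch assumes pip is available to the interpreter running the script and that installing packages is permitted in that environment.

```python
import importlib
import subprocess
import sys


def import_or_install(module_name, package_name=None):
    """Import a module, installing it with pip first if it is missing.

    Hypothetical helper for illustration only. package_name is the name
    to pass to pip when it differs from the import name; it defaults to
    module_name.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError:
        # Assumption: pip is installed for this interpreter and the
        # environment allows package installation.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", package_name or module_name]
        )
        # Make the freshly installed package visible to the import system.
        importlib.invalidate_caches()
        return importlib.import_module(module_name)


# Usage with a standard-library module, so nothing is installed here.
json_module = import_or_install("json")
print(json_module.dumps({"job": "example"}))
```

Whether installing packages at run time is acceptable is a policy question for each team; a pinned requirements.txt file, as the reviewer also suggests, is usually easier to audit.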
Likelihood to Renew
Apache Spark
Its capacity for computing data in a cluster and its speed.
Matillion
With our current experience of Matillion, we are likely to renew with the current feature options, but we will also look for improvement in various areas, including scalability and dependability. 1. Connectors: It offers various connector options, but they aren't foolproof, which is something we will be looking at as we grow. 2. Scalability: As usage increases, we want the Matillion system to be more stable.
Usability
Apache Spark
The only thing I dislike about Spark's usability is the learning curve, as there are many actions and transformations. However, its wide range of uses for ETL processing, its ease of integration, and its multi-language support make this library a powerhouse for your data science solutions. It has especially aided us with its lightning-fast processing times.
Matillion
Easy-to-follow steps for building a data pipeline. We have brought in people with minimal SQL experience, and they have been able to quickly and successfully pick up the skill set and contribute to reducing the time to market for various data assets. Matillion provides a long array of components to select from to perform data transformation. But like any other tool, there is more than one way of doing the same thing. Some are more performant than others, but they are easy to follow and easy to tweak to improve.
Support Rating
Apache Spark
1. It integrates very well with Scala or Python. 2. Its SQL interoperability is very easy to understand. 3. Apache Spark is way faster than other competing technologies. 4. The support from the Apache community for Spark is huge. 5. Execution times are faster compared to others. 6. There are a large number of forums available for Apache Spark. 7. Code for Apache Spark is readily available and easy to access. 8. Many organizations use Apache Spark, so many solutions are available for existing applications.
Matillion
Overall, I've found Matillion to be responsive and considerate. I feel like they value us as a customer even when I know they have customers who spend more on the product than we do. That speaks to a motive higher than money. They want to make a good product and a good experience for their customers. If I have any complaint, it's that support sometimes feels community-oriented. It isn't always immediately clear to me that my support requests are going to a support engineer and not to the community at large. Usually, though, after a bit of conversation, it's clear that Matillion is watching and responding. And responses are generally quick in coming.
Implementation Rating
Apache Spark
No answers on this topic
Matillion
We were able to control access and build various environments for implementation.
Alternatives Considered
Apache Spark
All the above systems work quite well on big data transformations, whereas Spark really shines with its bigger API support and its ability to read from and write to multiple data sources. Using Spark, one can easily switch between declarative, imperative, and functional programming styles based on the situation. Also, it doesn't need special data ingestion or indexing pre-processing like Presto. Combining it with Jupyter Notebooks (https://github.com/jupyter-incubator/sparkmagic), one can develop Spark code interactively in Scala or Python.
Matillion
Overall, Matillion is an excellent choice for businesses that need a powerful and easy-to-use data integration and transformation platform that can handle large volumes of data. While it may be a bit pricey for some organizations, the platform's reliability, scalability, and customer support make it a worthwhile investment for many businesses.
Scalability
Apache Spark
No answers on this topic
Matillion
Matillion has a nice scalability capacity, and once it has been stored in our AWS cloud it makes it much easier to scale it on demand. The reason I'm not rating it as 10 is that there is no way to migrate or move it to another cloud in an easy way.
Return on Investment
Apache Spark
  • Faster turnaround on feature development. We have seen a noticeable improvement in our agile development since using Spark.
  • Easy adoption, having multiple departments use the same underlying technology even if the use cases are very different allows for more commonality amongst applications which definitely makes the operations team happy.
  • Performance, we have been able to make some applications run over 20x faster since switching to Spark. This has saved us time, headaches, and operating costs.
Matillion
  • Saving us time reduces our need for headcount
  • Allows us to collaborate on data-eng pipelines in a transparent way with non-technical stakeholders, ensuring accuracy and continuity
  • Allows us a single platform to log and manage all of our pipelines to pin-point where something failed and why
  • Always has a solution to very common data engineering tasks, i.e., real-time data, connectors, pre-built workflows, etc.
Screenshots

Matillion Screenshots

  • Screenshot of Matillion's GUI, used to orchestrate jobs with control data flow functionality, automating the ETL process.
  • Screenshot of where structured and semi-structured data can be prepared to create clean data sets that can be used with any BI/reporting/visualization tool of choice. Matillion reads and combines data across a target warehouse external storage, such as S3 or Blob.
  • Screenshot of Matillion's self-validating components, sample and row counts. If a job does fail, the warehouse queue services available with Matillion can be used to get an alert to a connected email or Slack account.
  • Screenshot of the SQL component used to run custom scripts from within Matillion. With hundreds of pre-built connectors out of the box, Matillion can handle complex transformation needs.