Apache Spark

Score 8.7 out of 10

Overview

Recent Reviews

Apache Spark in Telco

10 out of 10
July 22, 2021
Apache Spark is being widely used within the company. In the Advanced Analytics department, data engineers and data scientists work closely in …

A powerhouse processing engine.

9 out of 10
September 19, 2020
We use Apache Spark for cluster computing in large-scale data processing, ETL functions, machine learning, as well as for analytics. Its …

Apache Spark Review

7 out of 10
March 16, 2019
We used Apache Spark within our department as a Solution Architecture team. It helped make big data processing more efficient since the …



Pricing

Pricing information for Apache Spark is unavailable.

Entry-level setup fee?

  • No setup fee

Offerings

  • Free Trial
  • Free/Freemium Version
  • Premium Consulting / Integration Services


Alternatives Pricing

What is Databricks Lakehouse Platform?

Databricks in San Francisco offers the Databricks Lakehouse Platform (formerly the Unified Analytics Platform), a data science platform and Apache Spark cluster manager. The Databricks Unified Data Service aims to provide a reliable and scalable platform for data pipelines, data lakes, and data…

Features Scorecard

No scorecards have been submitted for this product yet.

Product Details

What is Apache Spark?

Apache Spark Technical Details

Operating Systems: Unspecified
Mobile Application: No


Reviews and Ratings (147)

Reviews (1-19 of 19)
Thomas Young | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Review Source
How does Apache Spark perform against competing tools? I think Apache Spark does well in processing large volumes of data. The machine learning models also seem to be easier to program and interpret. That said, on the programming side it seems more difficult to implement good models in Apache Spark than in Kinesis or other tools. You really have to have lots of data and very valuable questions to answer to justify the investment in Apache Spark.
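To give a concrete sense of the machine learning point above, here is a minimal PySpark MLlib pipeline sketch; the feature columns and data are invented for illustration and are not taken from the review:

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

    # Hypothetical training data: two numeric features and a binary label.
    df = spark.createDataFrame(
        [(1.0, 2.0, 0.0), (2.0, 1.0, 1.0), (3.0, 4.0, 1.0), (0.5, 3.0, 0.0)],
        ["f1", "f2", "label"],
    )

    # Assemble the raw columns into a feature vector, then fit a classifier.
    assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="label")
    model = Pipeline(stages=[assembler, lr]).fit(df)

    # The fitted coefficients are directly inspectable, which helps interpretation.
    print(model.stages[-1].coefficients)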
Score 9 out of 10
Vetted Review
Verified User
Review Source
There are a few alternatives that can do the same transformations and aggregations as Apache Spark, but most of them cannot perform parallel computation. For example, pandas is a really good tool for this, but it is not parallelized; however, some tools keep the pandas interface and syntax while using Dask or Ray on the backend.
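As a rough illustration of the parallelism difference described above, the same group-by aggregation in pandas and in PySpark looks like this (the data and column names are made up):

    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical sales records used for both versions.
    rows = [("NYC", 10.0), ("NYC", 5.0), ("SF", 7.5)]

    # pandas: single-machine, in-memory.
    pdf = pd.DataFrame(rows, columns=["city", "amount"])
    pandas_result = pdf.groupby("city")["amount"].sum()

    # PySpark: the same logic, but the work is split across cluster executors.
    spark = SparkSession.builder.appName("pandas-vs-spark").getOrCreate()
    sdf = spark.createDataFrame(rows, ["city", "amount"])
    spark_result = sdf.groupBy("city").agg(F.sum("amount").alias("total_amount"))
    spark_result.show()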
Surendranatha Reddy Chappidi | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Review Source
  • Apache Spark works in distributed mode on a cluster
  • Informatica and DataStage cannot scale horizontally
  • We can write custom code in Spark, whereas in DataStage and Informatica we can only choose from the features already provided.
  • Apache Spark is open source and free, whereas we need to buy licenses for DataStage and Informatica.
Score 8 out of 10
Vetted Review
Verified User
Review Source
1. Apache Spark is almost 100% faster than Hadoop.
2. Apache Spark is more stable than Amazon EMR.
3. The end-to-end distributed machine learning library is more robust in Apache Spark.
4. For very large data sets, Apache Spark is more trustworthy than the other two.
5. For data transformations, Apache Spark provides a very rich set of APIs.
6. The interface provided for SQL in Apache Spark is easier to understand than in the others (see the sketch after this list).
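The sketch below illustrates points 5 and 6: the same query expressed through the DataFrame API and through Spark SQL. The table and column names are invented for illustration:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("sql-sketch").getOrCreate()

    # Hypothetical session data: user id and session duration in minutes.
    events = spark.createDataFrame(
        [("u1", 90), ("u1", 30), ("u2", 75)], ["user_id", "duration"]
    )

    # DataFrame API: chained transformations.
    top_users = (
        events.filter(F.col("duration") > 60)
              .groupBy("user_id")
              .agg(F.count("*").alias("long_sessions"))
              .orderBy(F.desc("long_sessions"))
    )

    # The same query through the SQL interface.
    events.createOrReplaceTempView("events")
    top_users_sql = spark.sql(
        "SELECT user_id, COUNT(*) AS long_sessions "
        "FROM events WHERE duration > 60 "
        "GROUP BY user_id ORDER BY long_sessions DESC"
    )
    top_users_sql.show()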
Score 9 out of 10
Vetted Review
Verified User
Review Source
Databricks uses Spark as its foundation and is also a great platform. It brings several add-ons that we did not feel we needed at the time we evaluated it, and haven't needed since. One interesting plus in our opinion was the engineering support, which is valuable depending on the criticality of your platform.
March 16, 2019

Apache Spark Review

Score 7 out of 10
Vetted Review
Verified User
Review Source
It is easy to learn, read, and maintain. It brings the best of the Ruby on Rails framework to Java, which helps create a web service very easily. Communication is one of the most distinctive features of Apache Spark compared to alternative products: you are able to communicate with a colleague on your team who also uses Spark while you are on the phone.
Score 9 out of 10
Vetted Review
Verified User
Review Source
We evaluated SAS alongside Apache Spark, but during the proof of concept we found that Apache Spark supported the Hadoop ecosystem and the Hadoop file system much better. It was much faster at the time, was able to process data quickly for business analytical needs, and also scaled up well.
Carla Borges | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Review Source
I prefer Apache Spark to Hadoop, since in my experience Spark is more usable and comes equipped with simple APIs for Scala, Python, Java, and Spark SQL, and it provides REPL-style feedback on commands. At the same time, Apache Spark seems to have the best performance for processing large data in memory, so more workloads can be offloaded to Spark than to Hadoop, even though Hadoop is also a very useful tool.
Nitin Pasumarthy | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Review Source
All the above systems work quite well on big data transformations, whereas Spark really shines with its broader API support and its ability to read from and write to multiple data sources. Using Spark, one can easily switch between declarative, imperative, and functional styles of programming based on the situation. It also doesn't need the special data ingestion or indexing pre-processing that Presto does. Combining it with Jupyter Notebooks (https://github.com/jupyter-incubator/sparkmagic), one can develop Spark code interactively in Scala or Python.
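As a small illustration of the declarative-versus-functional point, the same count can be written either way in PySpark (the data is made up); the same code also runs interactively in a notebook via sparkmagic, as noted above:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("styles-sketch").getOrCreate()

    # Hypothetical status log.
    df = spark.createDataFrame([("ok",), ("error",), ("ok",)], ["status"])

    # Declarative style: describe the result; the Catalyst optimizer plans the execution.
    counts_df = df.groupBy("status").agg(F.count("*").alias("n"))
    counts_df.show()

    # Functional style: map/reduce over the underlying RDD with plain Python functions.
    counts_rdd = (
        df.rdd.map(lambda row: (row["status"], 1))
              .reduceByKey(lambda a, b: a + b)
    )
    print(counts_rdd.collect())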
Kartik Chavan | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Review Source
Even with Python, MapReduce requires lengthy code. Combining Python with Apache Spark not only shortens the code, it also effectively increases the speed of algorithms. Occasionally I still use MapReduce, but Apache Spark will replace MapReduce very soon; it has many built-in and faster features.
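For a sense of how short the code gets, here is the classic word count in a few lines of PySpark; the input lines are hard-coded here for illustration, whereas in practice they would come from a file on HDFS or elsewhere:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()
    sc = spark.sparkContext

    # A few hypothetical lines of text; in practice this would be sc.textFile("hdfs://...").
    lines = sc.parallelize(["to be or not to be", "that is the question"])

    counts = (
        lines.flatMap(lambda line: line.split())
             .map(lambda word: (word, 1))
             .reduceByKey(lambda a, b: a + b)
    )
    print(counts.collect())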
Anson Abraham | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Review Source
  • MapReduce and Apache Storm
Versus MapReduce, Spark was faster and easier to manage, especially for machine learning, where MapReduce is lacking. Apache Storm was also slower and didn't scale as well as Spark does. Spark's elasticity was easier to apply compared to Storm and MapReduce, and managing resources for Spark was easier compared to Storm as well. MapReduce is slower than Spark.
Score 9 out of 10
Vetted Review
Verified User
Review Source
Compared to similar technologies, Spark ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch together multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and the ability to write your application in the language of your choosing.
Kamesh Emani | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Review Source
Apache Pig and Apache Hive provide most of what Spark provides, but Apache Spark has more features, like actions and transformations, which are easy to code. Spark also applies optimizations: we can configure the driver program, and Spark builds and manipulates a DAG (Directed Acyclic Graph) of the computation (see the sketch below).
Plain Python can also be used for data transformations, but it requires a lot more code compared to Spark and is much slower.
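The sketch below shows the transformation/action split mentioned above: transformations only extend the DAG, and nothing executes until an action is called (the data is made up):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dag-sketch").getOrCreate()
    rdd = spark.sparkContext.parallelize(range(1, 1001))

    # Transformations: nothing is computed yet; Spark just records the lineage.
    evens = rdd.filter(lambda x: x % 2 == 0)
    squared = evens.map(lambda x: x * x)

    # Action: the whole chain is planned and executed across the cluster now.
    total = squared.reduce(lambda a, b: a + b)
    print(total)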
Score 10 out of 10
Vetted Review
Verified User
Review Source
There are a few newer frameworks for general processing, like Flink and Beam, frameworks for streaming, like Samza and Storm, and traditional MapReduce. I think Spark is at a sweet spot where it's clearly better than MapReduce for many workflows, yet has enough community support that there is little risk in deploying it. It also integrates batch and streaming workflows and APIs, making it an all-in-one package for multiple use cases.
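To illustrate the unified batch and streaming APIs, the sketch below applies one aggregation function to both a static DataFrame and Spark's built-in rate streaming source; the window size and row rate are arbitrary choices for illustration:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("batch-stream-sketch").getOrCreate()

    # One transformation, reused for both batch and streaming DataFrames.
    def summarize(df):
        return df.groupBy(F.window("timestamp", "10 seconds")).count()

    # Batch: a static DataFrame built in memory.
    batch_df = spark.range(100).withColumn("timestamp", F.current_timestamp())
    summarize(batch_df).show()

    # Streaming: the built-in "rate" source emits rows continuously; same logic applies.
    stream_df = spark.readStream.format("rate").option("rowsPerSecond", 5).load()
    query = (
        summarize(stream_df)
        .writeStream
        .outputMode("complete")
        .format("console")
        .start()
    )
    # query.awaitTermination()  # uncomment to keep the stream running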
Jordan Moore | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Review Source
Spark has primarily replaced my writing of pure Hadoop MapReduce or Apache Pig jobs for processing data. I like the fact that I can alternate between the main programming languages I know - Java and Python - and use them to learn the Scala API. Spark can also be installed individually on any computer, and one can quickly get started writing applications using just the Spark Shell. I also enjoy that you can easily add community-built packages to a Spark application, such as connectors to different database sources or data processing libraries that aren't included in the language being used.
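As an illustration of the local install and community packages mentioned above, a purely local PySpark session can pull in a packaged connector via Maven coordinates; the coordinate shown here is only an example:

    from pyspark.sql import SparkSession

    # A purely local session; "local[*]" uses all cores on this machine, no cluster needed.
    # The spark.jars.packages coordinate below is just an example of pulling in a
    # community package (the Avro connector); substitute whichever connector you need.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("local-sketch")
        .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.2.0")
        .getOrCreate()
    )

    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    df.show()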