Skip to main content
TrustRadius
Apache Spark

Apache Spark

Overview

Recent Reviews

TrustRadius Insights

Apache Spark is an incredibly versatile tool that has been widely adopted across various departments for processing very large datasets …
Continue reading

Apache Spark in Telco

10 out of 10
July 22, 2021
Incentivized
Apache Spark is being widely used within the company. In Advanced Analytics department data engineers and data scientists work closely in …
Continue reading

Apache Spark Review

7 out of 10
March 16, 2019
Incentivized
We used Apache Spark within our department as a Solution Architecture team. It helped make big data processing more efficient since the …
Continue reading
Read all reviews

Reviewer Pros & Cons

View all pros & cons
Return to navigation

Product Demos

Spark Project | Spark Tutorial | Online Spark Training | Intellipaat

YouTube

Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginners | Simplilearn

YouTube

Apache Spark Full Course | Apache Spark Tutorial For Beginners | Learn Spark In 7 Hours |Simplilearn

YouTube

Apache Spark Architecture | Spark Cluster Architecture Explained | Spark Training | Edureka

YouTube

Introduction to Databricks [New demo linked in description]

YouTube

Apache Spark Tutorial | Spark Tutorial for Beginners | Spark Big Data | Intellipaat

YouTube
Return to navigation

Product Details

What is Apache Spark?

Apache Spark Technical Details

Operating SystemsUnspecified
Mobile ApplicationNo
Return to navigation

Comparisons

View all alternatives
Return to navigation

Reviews and Ratings

(159)

Community Insights

TrustRadius Insights are summaries of user sentiment data from TrustRadius reviews and, when necessary, 3rd-party data sources. Have feedback on this content? Let us know!

Apache Spark is an incredibly versatile tool that has been widely adopted across various departments for processing very large datasets and generating summary statistics. Users have found it particularly useful for creating simple graphics when working with big data, making it a valuable asset for analytics departments. It is also used extensively in the banking industry to calculate risk-weighted assets on a daily and monthly basis for different positions. The integration of Apache Spark with Scala and Apache Spark clusters enables users to load and process large volumes of data, implementing complex formulas and algorithms. Additionally, Apache Spark is often utilized alongside Kafka and Spark Streams to extract data from Kafka queues into HDFS environments, allowing for streamlined data analysis and processing.

One of the key strengths of Apache Spark lies in its ability to handle large volumes of retail and eCommerce data, providing cost and performance benefits over traditional RDBMS solutions. This makes it a preferred choice for companies in these industries. Furthermore, Apache Spark plays a crucial role in supporting data-driven decision-making by digital data teams. Its capabilities allow these teams to build data products, source data from different systems, process and transform it, and store it in data lakes.

Apache Spark is highly regarded for its ability to perform data cleansing and transformation before inserting it into the final target layer in data warehouses. This makes it a vital tool for ensuring the accuracy and reliability of data. Its faster data processing capabilities compared to Hadoop MapReduce have made Apache Spark a go-to choice for tasks such as machine learning, analytics, batch processing, data ingestion, and report development. Moreover, educational institutions rely on Apache Spark to optimize scheduling by assigning classrooms based on student course enrollment and professor schedules.

Overall, Apache Spark proves itself as an indispensable product that meets the needs of various industries by offering efficient distributed data processing, advanced analytics capabilities, and seamless integration with other technologies. Its versatility allows it to support a wide range of use cases, making it an essential tool for anyone working with big data.

Great Computing Engine: Apache Spark is praised by many users for its capabilities in handling complex transformative logic and sophisticated data processing tasks. Several reviewers have mentioned that it is a great computing engine, indicating its effectiveness in solving intricate problems.

Valuable Insights and Analysis: Many reviewers find Apache Spark to be useful for understanding data and performing data analytical work. They appreciate the valuable insights and analysis capabilities provided by the software, suggesting that it helps them gain deeper understanding of their data.

Extensive Set of Libraries and APIs: The extensive set of libraries and APIs offered by Apache Spark has been highly appreciated by users. It provides a wide range of tools and functionalities to solve various day-to-day problems, making it a versatile choice for different data processing needs.

Challenging to Understand and Use: Some users have found Apache Spark to be challenging to understand and use for modeling big data. They struggle with the complexity of the software, leading to a high learning curve.

Lack of User-Friendliness: The software is considered not user-friendly, with a confusing user interface and graphics that are not of high quality. This has resulted in frustration among some users who find it difficult to navigate and work with.

Time-Consuming Processing: Apache Spark can be time-consuming when processing large data sets across multiple nodes. This has been reported by several users who have experienced delays in their data processing tasks, affecting overall efficiency.

When using Spark for big data tasks, users commonly recommend familiarizing yourself with the documentation and gaining experience. They emphasize investing time in reading and understanding the documentation to overcome any initial challenges. As users gain experience, they find working with Spark becomes easier and more efficient.

Users also suggest utilizing Spark specifically for true big data problems, where its capabilities and performance shine. They highlight that Spark is well-suited for tackling large-scale data processing tasks.

Additionally, users find value in leveraging the R and Python APIs in Spark. These APIs allow them to work with Spark using familiar programming languages such as R and Python, making it easier to analyze and process data.

Overall, users advise diving into the documentation, utilizing Spark for big data challenges, and leveraging the R and Python APIs to enhance their experience with Spark.

Attribute Ratings

Reviews

(1-21 of 21)
Companies can't remove reviews or game the system. Here's why
Ananth Gouri | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
We used Surprise Kit for one of the other research works. It is more fine-tuned to Recommendation systems and their algorithms. Apache Spark has MLlib for majority of ML problems. Where as software like Surprse Kit - it suitable for a specific task of Recommendations only.
Riyaz Khan | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Apache Spark is a fast-processing in-memory computing framework. It is 10 times faster than Apache Hadoop. Earlier we were using Apache Hadoop for processing data on the disk but now we are shifted to Apache Spark because of its in-memory computation capability. Also in SAP HANA, we faced many issues when users hit the database very frequently. We added a lot of nodes in the cluster but nothing great happened.
Thomas Young | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
How does Apache Spark perform against competing tools? I think Apache Spark does well in processing large volumes of data. The machine learning models also seem to be easier to program and interpret. With that said, the programming side of Apache Spark seems more difficult to implement good models than Kinesis or other tools. You really have to have lots of data and very valuable questions to answer to justify the investment in Apache Spark.
Score 9 out of 10
Vetted Review
Verified User
Incentivized
There are a few alternatives that can do the same transformation and aggregation like Apache Spark can do but most of them are not able to perform parallel computation. For example, pandas is a really good tool to do that but not parallelized; However, there are some tools that leverage pandas interface and syntax with dask and ray on the backend.
Surendranatha Reddy Chappidi | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
  • Apache Spark works in distributed mode using cluster
  • Informatica and Datastage cannot scale horizontally
  • We can write custom code in spark, whereas in Datastage and Informatica we can only choose the different features proivided already.
  • Apache Spark is open-sourced and free, whereas we need to buy license for Datastage and Informatica
Score 8 out of 10
Vetted Review
Verified User
1. Apache Spark is almost 100 % faster than Hadoop.
2. Apache Spark is more stable than Amazon EMR.
3. The end to end distributed machine library is more robust in Apache Spark.
4. For very large data sets, Apache Spark is more trustworthy than the other two.
5. For data transformations, Apache Spark provides a very rich set of APIs.
6. The interface provided for SQL in Apache Spark is easy to understand as compared to others.
Score 9 out of 10
Vetted Review
Verified User
Incentivized
Databricks uses Spark as a foundation, and is also a great platform. It does bring several add-ons, which we did not feel needed by the time we evaluated - and haven't needed since then. One interesting plus in our opinion was the engineering support, which is great depending on the criticality of your platform.
March 16, 2019

Apache Spark Review

Score 7 out of 10
Vetted Review
Verified User
Incentivized
It is easy to learn, read and to maintain. It brings the best of the Ruby on Rails framework from Java that helps to create a web service so easily. Communication is one of the most distinctive features of Apache Spark compared to alternative products. You are able to communicate with your colleague in your team who also uses Spark while you are on the phone.
Score 9 out of 10
Vetted Review
Verified User
Incentivized
We evaluated SAS alongside with Apache Spark but during the course of proof of concept found that Apache Spark was able to support the hadoop eco-system and hadoop file system much better. It was much faster at that time while having the ability to process data quickly for the business analytical needs and and also scaled up well.
Carla Borges | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
I prefer Apache Spark compared to Hadoop, since in my experience Spark has more usability and comes equipped with simple APIs for Scala, Python, Java and Spark SQL, as well as provides feedback in REPL format on the commands. At the same time, Apache Spark seems to have the best performance in the processing of large data that works in memory and, therefore, more processes can be downloaded on Spark than on Hadoop, despite the fact that Hadoop is also a very useful tool.
Nitin Pasumarthy | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
All the above systems work quite well on big data transformations whereas Spark really shines with its bigger API support and its ability to read from and write to multiple data sources. Using Spark one can easily switch between declarative versus imperative versus functional type programming easily based on the situation. Also it doesn't need special data ingestion or indexing pre-processing like Presto. Combining it with Jupyter Notebooks (https://github.com/jupyter-incubator/sparkmagic), one can develop the Spark code in an interactive manner in Scala or Python.
Kartik Chavan | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
Even with Python, MapReduce is lengthy coding. Combination of Python with Apache Spark will not only shorten the code, but it will effectively increase the speed of algorithms. Occasionally, I use MapReduce, but Apache Spark will replace MapReduce very soon. It has many built-in and faster features.
Anson Abraham | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
  • mapreduce and apache storm
vs MapRedce, it was faster and easier to manage. Especially for Machine Learning, where MapReduce is lacking. Also Apache Storm was slower and didn't scale as much as Spark does. Spark elasticity was easier to apply compared to storm and MapReduce.
managing resources for Spark was easier compared to storm as well. MapReduce is slower than spark.
Score 9 out of 10
Vetted Review
Verified User
Incentivized
Spark in comparison to similar technologies ends up being a one stop shop. You can achieve so much with this one framework instead of having to stitch and weave multiple technologies from the Hadoop stack, all while getting incredibility performance, minimal boilerplate, and getting the ability to write your application in the language of your choosing.
Kamesh Emani | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
Apache Pig and Apache Hive provide most of the things spark provide but apache spark has more features like actions and transformations which are easy to code. Spark uses optimization technique as we can select driver program and manipulate DAG (Directed Acyclic Graph)
Python can be used even for data transformations but it requires lot of coding compared to Spark and it is even so slow.
Score 10 out of 10
Vetted Review
Verified User
Incentivized
There are a few newer frameworks for general processing like Flink, Beam, frameworks for streaming like Samza and Storm, and traditional Map-Reduce. I think Spark is at a sweet spot where its clearly better than Map-Reduce for many workflows yet has gotten a good amount of support in the community that there is little risk in deploying it. It also integrates batch and streaming workflows and APIs, allowing an all in package for multiple use-cases.
Jordan Moore | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
Spark has primarily replaced my use of writing pure Hadoop MapReduce or Apache Pig jobs for processing data. I like the fact that I can alternate between the main programming languages that I know - Java and Python - and use those to learn the Scala API. Spark also can be installed individually on any computer, and one can quickly get started writing applications using just the Spark Shell. I also enjoy the features that you can easily add community built packages into a Spark application such as connectors to different database sources or have various data processing libraries that aren't included in the programming language that is used.
Return to navigation