Apache Spark

Score 8.7 out of 10

Overview

Recent Reviews

Apache Spark in Telco

10 out of 10
July 22, 2021
Apache Spark is widely used within the company. In the Advanced Analytics department, data engineers and data scientists work closely in …

A powerhouse processing engine.

9 out of 10
September 19, 2020
We use Apache Spark for cluster computing in large-scale data processing, ETL functions, machine learning, as well as for analytics. Its …

Apache Spark Review

7 out of 10
March 16, 2019
We used Apache Spark within our department as a Solution Architecture team. It helped make big data processing more efficient since the …

Pricing

Entry-level set up fee?

  • No setup fee

Offerings

  • Free Trial
  • Free/Freemium Version
  • Premium Consulting / Integration Services

Alternatives Pricing

What is Databricks Lakehouse Platform?

Databricks in San Francisco offers the Databricks Lakehouse Platform (formerly the Unified Analytics Platform), a data science platform and Apache Spark cluster manager. The Databricks Unified Data Service aims to provide a reliable and scalable platform for data pipelines, data lakes, and data…

Product Details

What is Apache Spark?

Apache Spark Technical Details

Operating Systems: Unspecified
Mobile Application: No

Reviews and Ratings (147)

Reviews

(1-22 of 22)
Score 10 out of 10
Vetted Review
Verified User
Review Source
Apache Spark is very good for processing large amounts of data, but not that good if you need many joins or low latency. In combination with Delta Engine, performance improved a lot. Especially with ACID support, time travel, and a consistent view for simultaneous reads and writes, it is now ready for the next level.
Thomas Young | TrustRadius Reviewer
Score 9 out of 10
The software appears to run more efficiently than other big data tools, such as Hadoop. Given that, Apache Spark is well suited for querying and making sense of very large data sets. The software offers many advanced machine learning and econometrics tools, although in practice these tools are only partially used because they take too long to run once the data sets get very large. The software is not well suited for projects that are not big data in size. The graphics and analytical output are subpar compared to other tools.

Score 9 out of 10
I would recommend Apache Spark to a colleague who is working with a long but narrow dataset. It would be a great tool to help that person fully utilize the CPU cores and speed up the work process. However, I would not recommend this tool if the dataset is wide but not very large.
Surendranatha Reddy Chappidi | TrustRadius Reviewer
Score 9 out of 10
Specific scenarios where Apache Spark is well suited:
1. Real-time processing of streaming data
2. Processing unstructured, semi-structured, and structured data from multiple sources
3. Avoiding vendor lock-in and cloud platform lock-in while developing products
Score 9 out of 10
Well suited for: large datasets, fault tolerance, parallel processing, ETL, batch processing, streaming, analytics, graphing, or machine learning. Almost any kind of large-scale processing, since it will save you a lot of time (days of processing). Less appropriate for: smaller datasets, where you are better off using pandas or other libraries.
Score 8 out of 10
Spark is suitable:
1. Where the requirement for advanced analytics is prominent.
2. When you want big data to be processed at a very fast pace.
3. For large datasets, where Spark is a viable solution.
4. When you need dependable fault tolerance.

Spark is not suitable:
1. If you want your data to be processed in real time.
2. When you need automatic optimization; Spark falls short there.
Score 9 out of 10
Spark is a one-size-fits-all data processing platform. You can run batch and in-motion streams, and you can use it for ETL, machine learning, or even graphs. You do not need multiple tools, which makes your TCO and management tasks much easier. Like every new platform, it has room to grow: storage and security are the main opportunities we found.
Score 9 out of 10
If your data is very large, I recommend moving the underlying technology to Apache Spark. This will save you a lot of time and effort as your data grows. Apache Spark's scalability also means it can handle your future data processing needs.
March 16, 2019

Apache Spark Review

Score 7 out of 10
It is beneficial to use Apache Spark if:
  • You are working with big data or preprocessing data before machine learning.
  • You are building simple microservices or a PoC. It makes it easier to create REST and simple web APIs.
  • You need great customer service. Apache Spark would be a great choice since they provide it 24/7.
Score 9 out of 10
Apache Spark is very well suited for big data analytics in conjunction with the Hadoop file system. It also does a good job of providing fast access to data in SQL workloads, since its in-memory processing engine can process data very quickly. In addition, it can be used for streaming data processing.
Carla Borges | TrustRadius Reviewer
Score 10 out of 10
It is suitable for processing large amounts of data, as it is very easy to use and its syntax is simple and understandable. I also find it useful in a variety of applications without the need to integrate many other processing technologies. It is very fast and has many machine learning algorithms that can be applied to data problems. I find it less appropriate for data that is not so large, as it uses too many resources.
Nitin Pasumarthy | TrustRadius Reviewer
Score 10 out of 10
Apache Spark has rich APIs for regular data transformations, ML workloads, and graph workloads, whereas other systems may not offer such a wide range of support. Choose it when you need to perform data transformations for big data as offline jobs; use MongoDB-like distributed database systems for more real-time queries.

Kartik Chavan | TrustRadius Reviewer
Score 9 out of 10
When the data is very big and you cannot afford long computation times, such as in a real-time environment, it is advisable to use Apache Spark. There are alternatives to Apache Spark, but it is the most common and robust tool to work with. It is great at batch processing.
Anson Abraham | TrustRadius Reviewer
Score 9 out of 10
Spark is great as a workflow processing and extract, transform, load (ETL) tool. It is really good for machine learning, especially for large datasets that can be processed with split-file parallelization.
Spark Streaming is scalable for close to real-time data workflows.
What it is not good for is processing smaller subsets of data.
Score 9 out of 10
If you are running a distributed environment and are running applications that make use of batch processing, analytics, streaming, machine learning, or graphing then I cannot recommend Spark enough. It is easy to get going, simple to learn (relative to similar technologies), and can be used in a variety of use cases. All while giving you great performance.
Kamesh Emani | TrustRadius Reviewer
Score 10 out of 10
For large data
For best optimization
For parallel processing
For machine learning on huge data, because presently available machine learning software such as RapidMiner is limited in the data size it can handle, whereas Spark is not
Score 10 out of 10
Well suited for batch and near-real-time data processing tasks as well as production deployments of machine learning, especially at large scale. Not well suited for general analytics workflows on small and medium-sized data sets; SQL-based data warehouses like Redshift, Vertica, etc. are better for those use cases.
Jordan Moore | TrustRadius Reviewer
Score 8 out of 10
On the plus side, Spark is a good tool to learn to apply to various data processing problems.

As described in the Cons, Spark may not be needed unless there is truly a large amount of data to operate on. Other libraries may be better suited for the same task.