Apache Spark

Score 8.7 out of 10

Overview

Recent Reviews

Apache Spark in Telco

10 out of 10
July 22, 2021
Apache Spark is being widely used within the company. In Advanced Analytics department data engineers and data scientists work closely in …

A powerhouse processing engine.

9 out of 10
September 19, 2020
We use Apache Spark for cluster computing in large-scale data processing, ETL functions, machine learning, as well as for analytics. Its …

Apache Spark Review

7 out of 10
March 16, 2019
We used Apache Spark within our department as a Solution Architecture team. It helped make big data processing more efficient since the …

Pricing

Entry-level setup fee?

  • No setup fee

Offerings

  • Free Trial
  • Free/Freemium Version
  • Premium Consulting / Integration Services

Alternatives Pricing

What is Databricks Lakehouse Platform?

Databricks in San Francisco offers the Databricks Lakehouse Platform (formerly the Unified Analytics Platform), a data science platform and Apache Spark cluster manager. The Databricks Unified Data Service aims to provide a reliable and scalable platform for data pipelines, data lakes, and data…

Features Scorecard

No scorecards have been submitted for this product yet.

Product Details

What is Apache Spark?

Apache Spark is an open-source, distributed processing engine for large-scale data analytics. It offers high-level APIs in Scala, Java, Python, and R, along with libraries for SQL (Spark SQL), machine learning (MLlib), stream processing, and graph processing.

Apache Spark Technical Details

Operating Systems: Unspecified
Mobile Application: No

Reviews and Ratings (147)

Reviews (1-22 of 22)
Steven Li | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Review Source
We need to calculate risk-weighted assets (RWA) daily and monthly for the different positions the bank holds, on a T+1 basis. The volume of calculations is large: millions of records per day with very complicated formulas and algorithms. In our applications/projects, we used Scala and Apache Spark clusters to load all the data we needed for the calculations, and implemented the complicated formulas and algorithms via the DataFrame and Dataset APIs of the Apache Spark platform.

Without adopting an Apache Spark cluster, it would be pretty hard for us to implement such a big system handling a large volume of calculations daily. After this system was successfully deployed into PROD, we've been able to provide capital risk control reports to regulation/compliance controllers in different regions around the world.
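
For illustration only, here is a minimal Scala sketch of the kind of DataFrame aggregation described above. The path and column names (exposure, riskWeight, region, assetClass) are hypothetical stand-ins; the reviewer's actual formulas are far more involved.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().appName("daily-rwa").getOrCreate()

    // Load one day's positions (hypothetical path and schema).
    val positions = spark.read.parquet("/data/positions/t_minus_1")

    // Toy stand-in for an RWA formula: exposure weighted by risk weight,
    // aggregated per region and asset class.
    val rwa = positions
      .withColumn("rwa", col("exposure") * col("riskWeight"))
      .groupBy("region", "assetClass")
      .agg(sum("rwa").as("totalRwa"))

    rwa.write.mode("overwrite").parquet("/reports/rwa/daily")
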
Score 10 out of 10
Vetted Review
Verified User
Review Source
Apache Spark is being widely used within the company. In the Advanced Analytics department, data engineers and data scientists work closely on machine learning projects to generate value. Spark provides a unified big data analytics engine that helps us easily process huge amounts of data. We are using Spark in projects like churn prediction and network analytics.
Thomas Young | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Review Source
Apache Spark is used by certain departments to produce summary statistics. The software is used for data sets that are very, very large in size and require immense processing power. The software is also used for simple graphics. When the data are small enough, Apache Spark is not the preferred analytical tool. It's the big data that makes Spark useful.
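
As a small illustration of the "summary statistics on very large data" use case: Spark's built-in describe() and summary() compute count, mean, stddev, min, max, and percentiles per column. The path and column names below are hypothetical.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("summary-stats").getOrCreate()

    // describe() returns count, mean, stddev, min, and max per column.
    val df = spark.read.parquet("/data/large_dataset")
    df.describe("revenue", "units_sold").show()

    // summary() adds percentiles on top of describe().
    df.select("revenue").summary("25%", "50%", "75%").show()
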
Score 9 out of 10
Vetted Review
Verified User
Review Source
We are building a model and, due to the size of the data, we chose to use Apache Spark for the feature generation. Usage of the tool is limited to my department and one other department; these two departments need to deal with long datasets, and the other departments do not.
Surendranatha Reddy Chappidi | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Review Source
  • We are using Apache Spark in Digital - Data teams to build data products and help business teams make data-driven decisions.
  • We use Apache Spark to source data from different source systems, process it, and store it in the data lake.
  • Once the data is in the data lake, we use Spark for data cleansing and data transformation as per business requirements.
  • Once the data is transformed, we insert it into the final target layer in the data warehouse (a sketch of this flow appears after the list).
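
A minimal sketch of the source-to-warehouse flow in the list above, assuming hypothetical paths, a CSV source, and an orders table; the real business rules would be project-specific.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().appName("lake-to-warehouse").getOrCreate()

    // 1. Source: land a raw extract from a source system in the lake.
    val raw = spark.read.option("header", "true").csv("/lake/raw/orders")

    // 2. Cleanse: drop duplicates and rows missing the primary key.
    val cleansed = raw.dropDuplicates("order_id").na.drop(Seq("order_id"))

    // 3. Transform: apply business rules, e.g. typed amounts and dates.
    val transformed = cleansed
      .withColumn("amount", col("amount").cast("double"))
      .withColumn("order_date", to_date(col("order_date"), "yyyy-MM-dd"))

    // 4. Load: append to the final target layer in the warehouse.
    transformed.write.mode("append").saveAsTable("warehouse.orders")
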
Score 9 out of 10
Vetted Review
Verified User
Review Source
We use Apache Spark for cluster computing in large-scale data processing, ETL functions, and machine learning, as well as for analytics. It's primarily used by the Data Engineering department to support the data lake infrastructure. It helps us effectively manage the large amounts of data that come from our clusters, ensuring the capacity, scalability, and performance needed.
Score 8 out of 10
Vetted Review
Verified User
Review Source
We were working on one of our products, which had a requirement to develop an enterprise-level product capable of managing the vast amount of big data involved. We wanted a technology that is faster than Hadoop and can process large-scale data while providing a streamlined process for the data scientists. Apache Spark proved to be the powerful unified solution we expected it to be.
The main problem we identified in our existing approach was that it took a large amount of time to process the data, and the statistical analysis of the data was not up to the mark. We wanted a sophisticated analytical solution that was easy and fast to use. Using Apache Spark, processing became 5 times faster than before, giving rise to pretty good analytics. With Spark, data abstraction across a cluster of machines was achieved using RDDs.
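
To make the RDD remark concrete, a classic word-count sketch (not the reviewer's workload): an RDD is a partitioned, immutable collection whose transformations run in parallel across the cluster. The input path is hypothetical.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("rdd-sketch").getOrCreate()
    val sc = spark.sparkContext

    // textFile() splits the input into partitions spread over executors.
    val lines = sc.textFile("hdfs:///data/events.log")

    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)  // pre-aggregates per partition, then shuffles

    counts.take(10).foreach(println)
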
Score 9 out of 10
Vetted Review
Verified User
Review Source
We do use Apache Spark for cluster computing in our ETL environment, for data and analytics, as well as for machine learning. It is mainly used by our data engineering team to support the entire data lake foundation. As we have huge amounts of information coming from multiple sources, we needed an effective cluster management system to handle capacity and deliver the required performance and throughput.
Score 9 out of 10
Vetted Review
Verified User
Review Source
We sold a data science product to one of the leading US-based e-commerce firms. Suddenly, their data started growing at a very fast rate. The product, at that stage, was based on R programming. With such huge data, the product started taking a lot of time. We then started thinking of an alternative to R to process rapidly multiplying big data such as this client's. We eventually came across Apache Spark. With the permission of the client, we started porting the code from R to Apache Spark. It took a long time to learn and code in Spark, but it was worth the effort: the R jobs, which had been taking days to run, came down to a few hours.
Score 7 out of 10
Vetted Review
Verified User
Review Source
We used Apache Spark within our department as a Solution Architecture team. It helped make big data processing more efficient since the same framework can be used for batch and stream processing.
Score 9 out of 10
Vetted Review
Verified User
Review Source
Used as the in-memory data engine for big data analytics, streaming data, and SQL workloads. We are also in the process of trying it out for certain machine learning algorithms. It basically processes data for the analytical needs of the business and is a great tool to coexist with the Hadoop file system.
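
A minimal sketch of the in-memory SQL workload pattern this review describes, with hypothetical table and column names: register a DataFrame as a view, cache it, and query it with plain SQL.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("sql-workloads").getOrCreate()

    // Read from HDFS and expose the data to SQL as a temporary view.
    val events = spark.read.parquet("hdfs:///warehouse/events")
    events.createOrReplaceTempView("events")

    // Pin the view in memory so repeated queries skip the disk read.
    spark.sql("CACHE TABLE events")

    spark.sql("""
      SELECT event_type, COUNT(*) AS n
      FROM events
      GROUP BY event_type
      ORDER BY n DESC
    """).show()
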
Carla Borges | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Review Source
Apache Spark is being used by the whole organization. It helps us a lot in the transmission of data, as it is 100 times faster than Hadoop MapReduce in memory and 10 times faster on disk. We work with this application in Java; it provides native bindings for the Java programming language, and since it is compatible with SQL, it is completely adapted to the needs of our organization, given the large amount of information that we use. We highly prefer Apache Spark since it supports in-memory processing, which increases the performance of big data analysis applications.
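
The in-memory processing this reviewer praises is explicit in the API. A sketch (in Scala; the equivalent calls exist in the Java API), with a hypothetical input path:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    val spark = SparkSession.builder().appName("cache-sketch").getOrCreate()

    val df = spark.read.json("/data/transactions")

    // Keep the dataset in memory, spilling to disk only if it doesn't fit;
    // repeated analyses then avoid re-reading and re-parsing the source.
    df.persist(StorageLevel.MEMORY_AND_DISK)

    df.count()                            // first action materializes the cache
    df.groupBy("country").count().show()  // served from memory

    df.unpersist()
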
Nitin Pasumarthy | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Review Source
We use Apache Spark across all analytics departments in the company. We primarily use it for distributed data processing and data preparation for machine learning models. We also use it to run distributed CRON jobs for various analytical workloads. I am also familiar with an algorithm on random walks on large-scale graphs that we contributed to Spark open source: https://databricks.com/session/random-walks-on-large-scale-graphs-with-apache-spark
Kartik Chavan | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Review Source
My company uses Apache Spark in various ways, including machine learning, analytics, and batch processing. We grab the data from other sources and put it into a Hadoop environment, and we build data lakes. SparkSQL is also used for analysis of data and to develop reports. We have deployed the clusters in Cloudera. Because of Apache Spark, it has become very easy to apply data science in the big data field.
Anson Abraham | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Review Source
Spark was/is being used in a myriad of ways. With Kafka, we use Spark Streaming to grab data from the Kafka queue into our HDFS environment. SparkSQL is used for analysis of data by those not familiar with Spark. We use Spark for data analysis as well as for the main workflow process, preferring Spark over MapReduce, and we use Spark for some machine learning algorithms on the data.
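
For the Kafka-to-HDFS path, a hedged sketch using today's Structured Streaming API (the reviewer may have used the older DStream-based Spark Streaming). The broker address, topic, and paths are hypothetical, and the spark-sql-kafka connector package must be on the classpath.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("kafka-to-hdfs").getOrCreate()

    // Subscribe to a Kafka topic as an unbounded streaming DataFrame.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()

    // Continuously append the decoded records to HDFS as Parquet.
    val query = stream
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("parquet")
      .option("path", "hdfs:///lake/events")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .start()

    query.awaitTermination()
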
Score 7 out of 10
Vetted Review
Verified User
Review Source
In our company, we used Spark for a healthcare analytics project where we needed to do large-scale data processing in a Hadoop environment. The project is about building an enterprise data lake where we bring in data from multiple products and consolidate it. Further downstream, we develop business reports.
Score 9 out of 10
Vetted Review
Verified User
Review Source
At my current company, we are using Spark in a variety of ways ranging from batch processing to data analysis to machine learning techniques. It has become our main driver for any distributed processing applications. It has gained quick adoption across the organization for its ease of use, integration into the Hadoop stack, and for its support in a variety of languages.
Kamesh Emani | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Review Source
We previously used a database and the Pentaho ETL tool to perform data transformations as per project requirements, but as time passed our data grew day by day and we ran into a lot of optimization problems working this way. We therefore decided to deploy an 8-node Hadoop cluster in our company, using the Cloudera distribution. We then started using Apache Spark to create applications for student course enrollment data and run them in parallel across the cluster.

It is used by one department, but the data consists of information about students and professors from the whole organization.

It addresses the problem of assigning classrooms for specific times in the week based on student course enrollments and the schedules of the professors teaching the courses.
This is just one aspect of the application; there are various other data transformation scenarios for different departments across the organization.
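
A hedged sketch of the headcount calculation behind such a classroom-assignment problem, with hypothetical tables (enrollments, schedules) and columns; the reviewer's actual transformations are not public.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().appName("enrollment").getOrCreate()

    // Hypothetical inputs: enrollments (student_id, course_id) and
    // schedules (course_id, professor, time_slot).
    val enrollments = spark.read.parquet("/lake/enrollments")
    val schedules = spark.read.parquet("/lake/schedules")

    // Headcount per course and time slot drives classroom assignment.
    val demand = enrollments
      .join(schedules, "course_id")
      .groupBy("course_id", "time_slot")
      .agg(countDistinct("student_id").as("headcount"))

    demand.orderBy(desc("headcount")).show()
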
Jordan Moore | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Review Source
We are learning core Apache Spark plus SparkSQL and MLlib while creating proofs of concept, as well as providing solutions for clients. It addresses the need to quickly process large amounts of data, typically located in Hadoop.
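
To round off the MLlib mention, a minimal pipeline sketch with hypothetical feature columns (f1, f2, f3) and a binary label; a real proof of concept would add evaluation and tuning.

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("mllib-poc").getOrCreate()

    // Hypothetical training data with numeric features and a 0/1 label.
    val data = spark.read.parquet("hdfs:///data/training")

    // Assemble raw columns into the single vector column MLlib expects.
    val assembler = new VectorAssembler()
      .setInputCols(Array("f1", "f2", "f3"))
      .setOutputCol("features")

    val lr = new LogisticRegression().setLabelCol("label")

    // A Pipeline chains feature prep and the estimator into one unit.
    val model = new Pipeline().setStages(Array(assembler, lr)).fit(data)
    model.transform(data).select("label", "prediction").show(5)
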