Apache Spark Reviews

117 Ratings
Score 8.2 out of 10

Reviews (1-17 of 17)

Yogesh Mhasde | TrustRadius Reviewer
January 11, 2020

Apache Spark -- The best big data solution

Score 8 out of 10
Vetted Review
Verified User

Pros and Cons

  • DataFrames, DataSets, and RDDs.
  • Spark has a built-in machine learning library that scales and integrates with existing tools.
  • Spark's data processing comes at the cost of memory pressure, since in-memory processing can consume large amounts of memory.
  • Caching is not automatic in Spark. We need to set up the caching mechanism manually.
Anonymous | TrustRadius Reviewer
March 27, 2019

Want to save dollars, resources and time processing big data, switch to Apache Spark

Score 9 out of 10
Vetted Review
Verified User

Pros and Cons

  • Very good tool to process big datasets.
  • Inbuilt fault tolerance.
  • Supports multiple languages.
  • Supports advanced analytics.
  • A large number of libraries available -- GraphX, Spark SQL, Spark Streaming, etc.
  • Very slow with smaller amounts of data.
  • Expensive, as it stores data in memory.
Anonymous | TrustRadius Reviewer
March 16, 2019

Apache Spark Review

Score 7 out of 10
Vetted Review
Verified User

Pros and Cons

  • Customizable; it integrates with Jupyter notebooks, which was really helpful for our team.
  • Easy to use and implement.
  • It allows us to quickly build microservices.
  • Release cycles could be faster.
  • Sometimes it kicked some of the users out due to inactivity.
Anonymous | TrustRadius Reviewer
March 06, 2019

Sparking the future

Score 8 out of 10
Vetted Review
Verified User

Pros and Cons

  • It is very fast.
  • It is gaining usability now that the PySpark community is growing and more functions are being developed.
  • Programmers can run different languages on the servers.
  • PySpark does not yet have the same ease of use and functionality as Pandas.
Thomas Young | TrustRadius Reviewer
January 25, 2019

Spark is useful, but it takes very valuable questions to justify the effort, and you should be prepared to fail at answering them

Score 7 out of 10
Vetted Review
Verified User

Pros and Cons

  • Apache Spark makes processing very large data sets possible. It handles these data sets in a fairly quick manner.
  • Apache Spark does a fairly good job implementing machine learning models for larger data sets.
  • Apache Spark seems to be rapidly advancing software, with the new features making it ever more straightforward to use.
  • Apache Spark requires some advanced ability to understand and structure the modeling of big data. The software is not user-friendly.
  • The graphics produced by Apache Spark are by no means world-class. They sometimes appear high-schoolish.
  • Apache Spark takes an enormous amount of time to crunch through multiple nodes across very large data sets. Apache Spark could improve this by offering the software in a more interactive programming environment.
Shiv Shivakumar | TrustRadius Reviewer
December 14, 2018

Apache Spark - the de facto standard for big data processing/analytics

Score 9 out of 10
Vetted Review
Verified User

Pros and Cons

  • In-memory data engine, hence faster processing.
  • Works well layered on top of the Hadoop file system for big data analytics.
  • Very good tool for streaming data.
  • Could do a better job with analytics dashboards that provide insights on a data stream, so that users do not have to rely on separate data visualization tools alongside Spark.
  • There is also room for improvement in the area of data discovery.
Carla Borges | TrustRadius Reviewer
August 28, 2018

Very useful application for Big Data processing and excellent for large volume production workflows

Score 10 out of 10
Vetted Review
Verified User

Pros and Cons

  • It performs a conventional disk-based process when the data sets are too large to fit into memory, which is very useful because, regardless of the size of the data, it is always possible to store them.
  • It has great speed and the ability to join multiple types of databases and run different types of analysis applications. This functionality is super useful as it reduces work times.
  • Apache Spark uses the data storage model of Hadoop and can be integrated with other big data frameworks such as HBase, MongoDB, and Cassandra. This is very useful because it is compatible with multiple frameworks that the company has, and thus allows us to unify all the processes.
  • There should be more information and training material with the application, especially for debugging, since the process is difficult to understand.
  • It should be more attentive to users and provide tutorials to reduce the learning curve.
  • There should be more grouping algorithms.
Nitin Pasumarthy | TrustRadius Reviewer
July 21, 2018

Apache Spark: One stop shop for distributed data processing, machine learning and graph processing

Score 10 out of 10
Vetted Review
Verified User

Pros and Cons

  • Rich APIs for data transformation, making it very easy to transform and prepare data in a distributed environment without worrying about memory issues
  • Faster execution times compared to Hadoop and Pig Latin
  • Easy SQL interface to the same data set for people who are comfortable exploring data in a declarative manner
  • Interoperability between SQL and Scala / Python styles of munging data
  • Documentation could be better, as I usually end up going to other sites / blogs to understand the concepts better
  • More algorithms need to be ported to MLlib, as only very few are available, at least in the clustering segment
Anson Abraham | TrustRadius Reviewer
March 27, 2018

Apache Spark, the Be-All and End-All.

Score 9 out of 10
Vetted Review
Verified User

Pros and Cons

  • Machine Learning.
  • Data analysis.
  • Workflow processing (faster than MapReduce).
  • SQL connector to multiple data sources.
  • Memory management. Very weak on that.
  • PySpark is not as robust as Scala with Spark.
  • Spark master HA is needed. It is not as highly available as it should be.
  • Data locality should not be a necessity, though it does help performance. I would prefer not to depend on locality.
Kartik Chavan | TrustRadius Reviewer
June 07, 2018

My Apache Spark Review

Score 9 out of 10
Vetted Review
Verified User

Pros and Cons

  • Easy ELT Process
  • Easy clustering on cloud
  • Amazing speed
  • Batch & real time processing
  • Debugging is difficult as it is new for most people
  • There are fewer learning resources
Kamesh Emani | TrustRadius Reviewer
October 26, 2017

Apache Spark - Simple Syntax, Huge Data Handling, Best Optimization, Parallel processing

Score 10 out of 10
Vetted Review
Verified User

Pros and Cons

  • Spark uses Scala, which is a functional and easy-to-use programming language. The syntax is simple and human-readable.
  • It can run transformations on huge data sets in parallel across a cluster. It automatically optimizes the process to produce output efficiently in less time.
  • It also provides a machine learning API for data science applications, and Spark SQL for fast queries in data analysis.
  • I also use the Zeppelin online tool, which is used for fast queries and is very helpful for BI guys to visualize query outputs.
  • Data visualization.
  • Still waiting for web development support, so that small apps can be built with Spark as the backbone middleware and HDFS as the data retrieval file system.
  • The available transformations and actions are limited, so the API must be modified to support more features.
Sunil Dhage | TrustRadius Reviewer
June 26, 2017

Sparkling Spark

Score 10 out of 10
Vetted Review
Verified User

Pros and Cons

  • It makes the ETL process very simple when compared to SQL Server and MySQL ETL tools.
  • It's very fast and has many machine learning algorithms which can be used for data science problems.
  • It is easily implemented on a cloud cluster.
  • The initialization and SparkContext procedures.
  • Running applications on a cluster is not well documented anywhere, and some applications are hard to debug.
  • Debugging and testing are sometimes time-consuming.
Jordan Moore | TrustRadius Reviewer
September 12, 2016

A useful replacement for MapReduce for Big Data processing

Score 8 out of 10
Vetted Review
Verified User

Pros and Cons

  • Scales from a local machine to a full cluster. You can run a standalone, single-node cluster simply by starting up a Spark shell or submitting an application to test an algorithm; it can then quickly be transferred and configured to run in a distributed environment.
  • Provides multiple APIs. Most people I know use Python and/or Java as their main programming language. Data scientists who are familiar with NumPy and SciPy can quickly become comfortable with Spark, while Java developers would be best served using Java 8 and the new features that it provides. Scala, on the other hand, is a mix between the Java and Python styles of writing Spark code, in my opinion.
  • Plentiful learning resources. The Learning Spark book is a good introduction to the mechanics of Spark although written for Spark 1.3, and the current version is 2.0. The GitHub repository for the book contains all the code examples that are discussed, plus the Spark website is also filled with useful information that is simple to navigate.
  • For data that isn't truly that large, Spark may be overkill when the problem could likely be solved on a computer with reasonable hardware resources. There don't seem to be many examples of how a Spark task would otherwise be implemented in a different library; for instance, scikit-learn and NumPy rather than Spark MLlib.
Anonymous | TrustRadius Reviewer
December 13, 2017

Apache Spark Should Spark Your Interest

Score 9 out of 10
Vetted Review
Verified User

Pros and Cons

  • Ease of use: the Spark API allows for minimal boilerplate and can be written in a variety of languages including Python, Scala, and Java.
  • Performance: for most applications we have found that jobs are more performant running via Spark than other distributed processing technologies like MapReduce, Hive, and Pig.
  • Flexibility: the framework comes with support for streaming, batch processing, SQL queries, machine learning, etc. It can be used in a variety of applications without needing to integrate a lot of other distributed processing technologies.
  • Resource heavy: jobs in general can be very memory intensive, and you will want the nodes in your cluster to reflect that.
  • Debugging: it has gotten better with every release, but sometimes it can be difficult to debug an error due to ambiguous or misleading exceptions and stack traces.
Anonymous | TrustRadius Reviewer
January 23, 2018

Use Apache Spark to Speed Up Cluster Computing

Score 7 out of 10
Vetted Review
Verified User

Pros and Cons

  • We used it to make our batch processing faster. Spark is faster at batch processing than MapReduce thanks to its in-memory computing
  • Spark will run along with other tools in the Hadoop ecosystem including Hive and Pig
  • Spark supports both batch and real-time processing
  • Apache Spark supports machine learning algorithms
  • Consumes more memory
  • Difficult to address issues around memory utilization
  • Expensive - in-memory processing is costly when we are looking for cost-efficient processing of big data

About Apache Spark

Categories: Hadoop-Related

Apache Spark Technical Details

Operating Systems: Unspecified
Mobile Application: No