A powerhouse processing engine.
September 19, 2020


Anonymous | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User

Overall Satisfaction with Apache Spark

We use Apache Spark for cluster computing in large-scale data processing, ETL, machine learning, and analytics. It is primarily used by the Data Engineering Department to support our data lake infrastructure. It helps us manage the large volumes of data coming from our clusters effectively, providing the capacity, scalability, and performance we need.
  • Speed: Apache Spark has great performance for both streaming and batch data
  • Easy to use: the object-oriented operators make it intuitive.
  • Multiple language support
  • Fault tolerance
  • Cluster management
  • Supports DataFrames (DF), Datasets (DS), and RDDs
  • Hard to learn; the documentation could be more in-depth.
  • Because of its in-memory processing, it can consume a lot of memory.
  • Poor data visualization; it is too basic.
  • Saved the company time and resources thanks to its agility
  • High performance data processing.
I have never had to contact them; however, they offer 24/7 support, and there are many forums about Spark. It is well integrated with Python and supports SQL syntax.
The only thing I dislike about Spark's usability is the learning curve: there are many actions and transformations to learn. However, its wide range of uses for ETL processing, ease of integration, and multi-language support make this library a powerhouse for data science solutions. Its lightning-fast processing times have especially helped us.
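Part of the learning curve comes from Spark's split between transformations (which are lazy and only record a step) and actions (which actually run the computation). As a rough illustration, here is a toy, single-machine Python model of that split. `LazyDataset` is a hypothetical class written for this example; it is not Spark's API and ignores partitioning, distribution, and fault tolerance.

```python
# Toy model of Spark's transformation/action split (illustration only).
# LazyDataset is hypothetical: no partitions, no cluster, no fault tolerance.
from typing import Any, Callable, Iterable, List


class LazyDataset:
    def __init__(self, source: Iterable[Any], ops: tuple = ()):
        self._source = list(source)
        self._ops = ops  # recorded transformations, not yet executed

    # Transformations: return a new dataset with one more recorded step.
    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    # Actions: run every recorded step over the source and return a result.
    def collect(self) -> List[Any]:
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out

    def count(self) -> int:
        return len(self.collect())


if __name__ == "__main__":
    calls = []

    def square(x):
        calls.append(x)  # track how often the map step actually runs
        return x * x

    ds = LazyDataset(range(10)).map(square).filter(lambda x: x % 2 == 0)
    print(len(calls))    # 0: transformations alone run nothing
    print(ds.collect())  # [0, 4, 16, 36, 64]
    print(ds.count())    # 5
    print(len(calls))    # 20: each action re-ran the map over all 10 items
```

The last line mirrors a real Spark behavior: without caching, each action recomputes the whole chain of transformations, which is why Spark offers `cache()` and `persist()`.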

Do you think Apache Spark delivers good value for the price?

Yes

Are you happy with Apache Spark's feature set?

Yes

Did Apache Spark live up to sales and marketing promises?

Yes

Did implementation of Apache Spark go as expected?

Yes

Would you buy Apache Spark again?

Yes

Well suited for: large datasets, fault tolerance, parallel processing, ETL, batch processing, streaming, analytics, graph processing, and machine learning. Almost any kind of large-scale processing, since it can save you a lot of time (days of processing). Less appropriate for: smaller datasets, where you are better off using pandas or other libraries.