Lightning Fast In-Memory Cluster Computing Framework
August 30, 2022

Riyaz Khan | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User

Overall Satisfaction with Apache Spark

Earlier, we used an RDBMS (Oracle) for retail and eCommerce data. We faced challenges with cost, performance, and the huge number of incoming transactions. After a lot of critical issues, we migrated to Delta Lake. Now we use Apache Spark Streaming to handle all real-time transactions, and for batch data we process terabytes of data with Apache Spark.

Pros

  • Real-time data processing
  • Interactive Analysis of data
  • Trigger Event Detection

Cons

  • Machine Learning
  • GraphX Lib
  • True Real-time Streaming
  • Fast Processing
  • In-Memory Computing
  • Provides better insights
  • No investment as it is open source
  • Cheap commodity hardware can save a lot of money

Apache Spark is a fast in-memory cluster computing framework. It can be 10 times faster than Apache Hadoop MapReduce. We previously used Apache Hadoop, which processes data on disk, but we have now shifted to Apache Spark because of its in-memory computation capability. With SAP HANA, we also faced many issues when users queried the database very frequently; we added a lot of nodes to the cluster, but it did not help much.

Do you think Apache Spark delivers good value for the price?

Yes

Are you happy with Apache Spark's feature set?

Yes

Did Apache Spark live up to sales and marketing promises?

Yes

Did implementation of Apache Spark go as expected?

Yes

Would you buy Apache Spark again?

Yes

It is well suited for batch processing and provides performance improvements through optimization techniques. Data streaming is getting better with Apache Spark Structured Streaming. However, out-of-memory issues and data skew problems can occur when data is not properly organized, and integration with BI tools such as Tableau could be better.
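Data skew, mentioned above, is commonly mitigated by key salting: a "hot" key is split into several sub-keys so its records can be processed in parallel, then the partial results are merged. The sketch below is a plain-Python simulation of that idea, not Spark code and not the reviewer's setup; the dataset, the `store_hot` key, and the helper names `salt_keys` and `two_stage_sum` are hypothetical.

```python
from collections import Counter, defaultdict


def salt_keys(records, num_salts):
    """Append a rotating salt (0..num_salts-1) to each key so that a hot key
    is split into num_salts sub-keys that can be spread over partitions."""
    seen = defaultdict(int)  # how many records of each key have been salted so far
    salted = []
    for key, value in records:
        salt = seen[key] % num_salts
        seen[key] += 1
        salted.append((f"{key}#{salt}", value))
    return salted


def two_stage_sum(records, num_salts):
    """Sum values per key using the salting pattern.

    Stage 1 sums per salted key, so the hot key's work is divided into
    num_salts smaller groups. Stage 2 strips the salt and merges the
    partial sums back into one total per original key.
    """
    partial = defaultdict(int)
    for salted_key, value in salt_keys(records, num_salts):
        partial[salted_key] += value

    final = defaultdict(int)
    for salted_key, subtotal in partial.items():
        base_key = salted_key.rsplit("#", 1)[0]  # drop the "#<salt>" suffix
        final[base_key] += subtotal
    return dict(final)


# Hypothetical skewed data: one store produces most of the transactions.
records = [("store_hot", 1)] * 8000 + [("store_a", 1)] * 100 + [("store_b", 1)] * 50

hot_groups = Counter(k for k, _ in salt_keys(records, num_salts=8) if k.startswith("store_hot#"))
print(hot_groups)  # eight sub-keys, store_hot#0 .. store_hot#7, with 1000 records each
print(two_stage_sum(records, num_salts=8))
# {'store_hot': 8000, 'store_a': 100, 'store_b': 50}
```

In Spark itself, the same pattern is usually written by adding a salt column (for example with `rand()`) before a `groupBy`, and Spark 3.x adaptive query execution can also split skewed join partitions when `spark.sql.adaptive.skewJoin.enabled` is on.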
