Item: Apache Spark
Rating: 10
Author: Riyaz Khan

Overall Satisfaction with Apache Spark

Use Cases and Deployment Scope

Earlier we used an RDBMS such as Oracle for retail and eCommerce data. We faced challenges with cost, performance, and the very high volume of incoming transactions. After a number of critical issues we migrated to Delta Lake. We now use Apache Spark Streaming to handle all real-time transactions, and we also process terabytes of batch data with Apache Spark.

Pros and Cons

Pros

Realtime data processing
Interactive Analysis of data
Trigger Event Detection

Cons

Machine Learning
GraphX Lib
True Realtime Streaming

Most Important Features

Fast Processing
In-Memory Computing
Provides better insights

Return on Investment

No license cost, since it is open source
Cheap commodity hardware can save a lot of money

Alternatives Considered

Apache Hadoop, SAP HANA Cloud and Apache Ignite

Apache Spark is a fast in-memory computing framework. It can be around 10 times faster than Apache Hadoop, depending on the workload. Earlier we used Apache Hadoop to process data on disk, but we have now shifted to Apache Spark because of its in-memory computation capability. With SAP HANA, we faced many issues when users queried the database very frequently. We added many nodes to the cluster, but performance did not improve noticeably.

Key Insights

Do you think Apache Spark delivers good value for the price?

Yes

Are you happy with Apache Spark's feature set?

Yes

Did Apache Spark live up to sales and marketing promises?

Yes

Did implementation of Apache Spark go as expected?

Yes

Would you buy Apache Spark again?

Yes

Other Software Used

SAP HANA Cloud, Apache Hive, Apache Airflow, Apache Kafka, Tableau Server, Tableau Desktop

Likelihood to Recommend

Well suited for batch processing, where it improves performance through optimization techniques. Data streaming is getting better with Apache Spark Structured Streaming. Less well suited when data is not properly organized: out-of-memory issues and data skew problems can occur. Integration with BI tools such as Tableau could be better.
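
To make the data skew point concrete, here is a minimal sketch of key salting, one common way to spread a hot key across partitions. It is plain Python, not Spark code and not our production job: the partition count, the salt count, the `store_...` keys, and the row data are all made up for illustration, and `partition_for` is only a stand-in for a hash partitioner.

```python
import random
import zlib
from collections import Counter, defaultdict

NUM_PARTITIONS = 8  # assumed partition count, for illustration only
NUM_SALTS = 8       # assumed number of salt values per key


def partition_for(key: str) -> int:
    """Stand-in for a hash partitioner: equal keys always share a partition."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_sizes(keys) -> list:
    """Count how many records land in each partition."""
    counts = Counter(partition_for(k) for k in keys)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


# Made-up skewed input: one hot store produces 90% of the (store, amount) rows.
rows = [("store_0001", 1)] * 9000 + [(f"store_{i:04d}", 1) for i in range(2, 1002)]

# Salting: append a random suffix so the hot key becomes several distinct keys.
rng = random.Random(42)
salted_rows = [(f"{store}#{rng.randrange(NUM_SALTS)}", amount) for store, amount in rows]

plain_sizes = partition_sizes(store for store, _ in rows)
salted_sizes = partition_sizes(key for key, _ in salted_rows)

# Stage 1: partial sums per salted key (the work that is now spread out).
partial = defaultdict(int)
for key, amount in salted_rows:
    partial[key] += amount

# Stage 2: strip the salt and combine the much smaller partial results.
totals = defaultdict(int)
for key, subtotal in partial.items():
    totals[key.rsplit("#", 1)[0]] += subtotal

print("largest partition without salting:", max(plain_sizes))
print("largest partition with salting:   ", max(salted_sizes))
```

Without salting, every row for the hot store lands in one partition, which is the kind of layout that tends to cause out-of-memory failures on a single executor. The trade-off is the extra second aggregation step needed to merge the salted partial results back into one total per original key.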

Lightning Fast In-Memory Cluster Computing Framework