Lightning-Fast In-Memory Cluster Computing Framework
Overall Satisfaction with Apache Spark
Earlier, we were using RDBMSs such as Oracle for our retail and e-commerce data. We faced challenges with cost, performance, and the huge volume of incoming transactions. After a number of critical issues, we migrated to Delta Lake. We now use Apache Spark Streaming to handle all real-time transactions, and for batch workloads we are handling terabytes of data with Apache Spark.
Pros
- Real-time data processing
- Interactive analysis of data
- Trigger event detection
Cons
- Machine Learning
- GraphX Lib
- True Realtime Streaming
Most Important Features
- Fast processing
- In-memory computing
Return on Investment
- Provides better insights
- No investment, as it is open source
- Cheap commodity hardware can save a lot of money
Apache Spark is a fast, in-memory computing framework, and it can be up to 10 times faster than Apache Hadoop MapReduce. We previously used Apache Hadoop, which processes data on disk, but we have since shifted to Apache Spark because of its in-memory computation capability. With SAP HANA, we also faced many issues when users queried the database very frequently; we added many nodes to the cluster, but performance did not improve significantly.
Do you think Apache Spark delivers good value for the price?
Yes
Are you happy with Apache Spark's feature set?
Yes
Did Apache Spark live up to sales and marketing promises?
Yes
Did implementation of Apache Spark go as expected?
Yes
Would you buy Apache Spark again?
Yes