Apache Spark: Lightning-Fast Distributed Computing with a Learning Curve
Overall Satisfaction with Apache Spark
If you are working with large-scale data and analytics, don't go any further without Apache Spark! One of the projects I was involved in that used Apache Spark was a recommendation-system project, and Recommendation Systems are also my area of research expertise. Deploying a RecSys with Apache Spark is very easy, thanks to its scalability, its flexibility in using various data sources, and its fault tolerance. The built-in machine learning library, MLlib, is a boon to work with. We didn't require any other libraries.
Pros
- Fault tolerant: in most cases no node fails, and if one does, processing still continues.
- Scalable to any extent.
- Has a built-in machine learning library, MLlib.
- Very flexible: data from various data sources can be used. Usage with HDFS is very easy.
Cons
- It is not fully backward compatible.
- It is memory-intensive for heavy workloads and large datasets.
- Support for advanced analytics is lacking: MLlib provides only minimalistic analytics.
- Deployment is a complex task for beginners.
- Scalability
- We had data across multiple sources. Integration with those data source types was not a problem.
- Generating recommendations was easily achievable.
- We used Apache Spark for one of our research projects. The ROI cannot be measured here, but the research paper was accepted to a good conference. What more could a project ask for?
We used Surprise Kit for one of our other research works. It is more finely tuned to recommendation systems and their algorithms. Apache Spark has MLlib for the majority of ML problems, whereas software like Surprise Kit is suitable only for the specific task of recommendations.
Do you think Apache Spark delivers good value for the price?
Yes
Are you happy with Apache Spark's feature set?
Yes
Did Apache Spark live up to sales and marketing promises?
I wasn't involved with the selection/purchase process
Did implementation of Apache Spark go as expected?
Yes
Would you buy Apache Spark again?
Yes
Using Apache Spark
Pros | Cons |
---|---|
Like to use; Easy to use; Technical support not required; Well integrated; Consistent; Quick to learn; Convenient; Feel confident using | Lots to learn |
- Usage of libraries
- Usage of HDFS in particular
- Basic analysis of data is possible
- Understanding internals of the product
- Changing data sources was somewhat complex
- Integration of other ML libraries is not very user-friendly