Use Apache Spark to Speed Up Cluster Computing
Rating: 7 out of 10
January 23, 2018
Vetted Review
Verified User
1 year of experience
In our company, we used Spark for a healthcare analytics project where we needed to do large-scale data processing in a Hadoop environment. The project is about building an enterprise data lake where we bring in data from multiple products and consolidate it; further downstream, we will develop business reports on top of it.
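To give a sense of what that consolidation step looks like, here is a minimal PySpark sketch. The paths, column names, and output location are illustrative placeholders, not our actual pipeline, and the source feeds are assumed to share a common schema.

```python
# Minimal sketch of consolidating feeds from multiple products into a data lake.
# All paths and names below are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("healthcare-data-lake-consolidation")
         .enableHiveSupport()   # lets Spark work with Hive tables in the Hadoop cluster
         .getOrCreate())

# Read extracts from two product feeds (example HDFS paths)
product_a = spark.read.parquet("hdfs:///raw/product_a/claims/")
product_b = spark.read.parquet("hdfs:///raw/product_b/claims/")

# Consolidate into one dataset, tagging each row with its source system
consolidated = (
    product_a.withColumn("source_system", F.lit("product_a"))
    .unionByName(product_b.withColumn("source_system", F.lit("product_b")))
)

# Land the consolidated data in the lake, partitioned for downstream reporting
(consolidated.write
 .mode("overwrite")
 .partitionBy("source_system")
 .parquet("hdfs:///lake/claims_consolidated/"))
```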
Pros
- We used it to make our batch processing faster; Spark is faster at batch processing than MapReduce thanks to its in-memory computing (see the sketch after this list)
- Spark runs alongside other tools in the Hadoop ecosystem, including Hive and Pig
- Spark supports both batch and real-time processing
- Apache Spark has built-in support for machine learning algorithms (MLlib)
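The short PySpark sketch below illustrates the in-memory and MLlib points: caching a dataset read from a Hive table so repeated batch passes avoid hitting disk, then training a simple MLlib classifier on it. The table and column names are hypothetical.

```python
# Sketch only: table/column names are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = (SparkSession.builder
         .appName("spark-batch-and-mllib-demo")
         .enableHiveSupport()
         .getOrCreate())

# Spark can query existing Hive tables directly (hypothetical table)
claims = spark.sql(
    "SELECT patient_id, charge_amount, readmitted FROM lake.claims_consolidated")

# cache() keeps the DataFrame in memory, so repeated batch passes avoid
# re-reading from disk -- the main reason Spark beats disk-based MapReduce
# on iterative workloads
claims.cache()
print("rows:", claims.count())

# MLlib: assemble features and train a simple classifier on the cached data
assembler = VectorAssembler(inputCols=["charge_amount"], outputCol="features")
train = assembler.transform(claims).select(
    "features", claims["readmitted"].cast("double").alias("label"))
model = LogisticRegression(maxIter=10).fit(train)
print("coefficients:", model.coefficients)
```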
Cons
- Consumes more memory
- Memory utilization issues are difficult to troubleshoot
- Expensive: in-memory processing gets costly when you are looking for cost-efficient big data processing