Use Apache Spark to Speed Up Cluster Computing
January 23, 2018

Score 7 out of 10
Vetted Review
Verified User
Overall Satisfaction with Apache Spark
In our company, we used Spark for a healthcare analytics project that required large-scale data processing in a Hadoop environment. The project is about building an enterprise data lake: we bring in data from multiple products, consolidate it, and then develop business reports downstream.
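A rough sketch of what this kind of consolidation step can look like in Spark. The paths, product names, and columns below are hypothetical placeholders, not the project's actual layout:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

object ConsolidateProducts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ProductDataConsolidation")
      .getOrCreate()

    // Hypothetical HDFS paths; the review does not name the real sources.
    // Each product feed is tagged with its origin before the union.
    val productA = spark.read.parquet("hdfs:///raw/product_a/records")
      .withColumn("source_system", lit("product_a"))
    val productB = spark.read.parquet("hdfs:///raw/product_b/records")
      .withColumn("source_system", lit("product_b"))

    // Assumes the two feeds share a schema; consolidate and land them
    // in the lake zone that downstream reports read from.
    val consolidated = productA.union(productB)
    consolidated.write.mode("overwrite")
      .parquet("hdfs:///lake/records_consolidated")

    spark.stop()
  }
}
```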
Pros
- We used Spark to make our batch processing faster; its in-memory computing makes batch jobs faster than MapReduce (see the sketch after this list)
- Spark runs alongside other tools in the Hadoop ecosystem, including Hive and Pig
- Spark supports both batch and real-time processing
- Apache Spark provides machine learning algorithms through its MLlib library
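To make the in-memory point concrete, here is a minimal batch-job sketch; the object name, columns, and paths are assumptions rather than the project's actual code. Caching the input lets several aggregations reuse the same in-memory data instead of re-reading it from HDFS, which is the main reason such jobs outrun equivalent MapReduce pipelines:

```scala
import org.apache.spark.sql.SparkSession

object BatchAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BatchAggregation")
      .getOrCreate()

    // Hypothetical input; stands in for a consolidated data-lake table.
    val records = spark.read.parquet("hdfs:///lake/records_consolidated")

    // cache() keeps the dataset in executor memory, so the two aggregations
    // below reuse it instead of re-reading from disk. This reuse is where
    // Spark gains over MapReduce, which writes intermediate results to HDFS.
    records.cache()

    val byProvider = records.groupBy("provider_id").count()
    val bySource   = records.groupBy("source_system").count()

    byProvider.write.mode("overwrite").parquet("hdfs:///reports/by_provider")
    bySource.write.mode("overwrite").parquet("hdfs:///reports/by_source")

    spark.stop()
  }
}
```

The flip side of this approach is the first con below: whatever you cache has to fit in executor memory.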
Cons
- Consumes more memory
- Difficult to address issues around memory utilization
- Expensive: in-memory processing is costly when the goal is cost-efficient big data processing
We were able to make batch jobs about 20 times faster than with MapReduce, and with language support for Scala, Java, and Python the jobs are easy to manage. We specifically chose Spark over MapReduce to make the cluster processing faster.
