Use Apache Spark to Speed Up Cluster Computing
January 23, 2018

Anonymous | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User

Overall Satisfaction with Apache Spark

In our company, we used Spark for a healthcare analytics project that required large-scale data processing in a Hadoop environment. The project is about building an enterprise data lake: we bring in data from multiple products and consolidate it, and downstream we develop business reports on top of it.
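
A minimal PySpark sketch of what that consolidation step might look like; the paths, feed names, and columns here are hypothetical illustrations, not the project's actual layout (assumes Spark 2.3+ for unionByName):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("enterprise-data-lake").getOrCreate()

    # Read raw extracts from two hypothetical product feeds landed in HDFS.
    product_a = spark.read.json("hdfs:///raw/product_a/")
    product_b = spark.read.json("hdfs:///raw/product_b/")

    # Tag each record with its source, then consolidate into one DataFrame
    # (assumes both feeds share the same column names).
    consolidated = (
        product_a.withColumn("source", F.lit("product_a"))
        .unionByName(product_b.withColumn("source", F.lit("product_b")))
    )

    # Persist the consolidated layer as Parquet for the downstream reports.
    consolidated.write.mode("overwrite").parquet("hdfs:///lake/consolidated/")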

Pros

  • We used Spark to make our batch processing faster; Spark is faster than MapReduce at batch processing thanks to its in-memory computing (a sketch follows this list)
  • Spark runs alongside other tools in the Hadoop ecosystem, including Hive and Pig
  • Spark supports both batch and real-time processing
  • Apache Spark ships with machine learning algorithm support (MLlib)
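
To illustrate the in-memory speedup and the Hive interoperability mentioned above, here is a minimal PySpark batch-job sketch; the table and column names are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # enableHiveSupport() lets Spark query tables registered in the Hive
    # metastore, which is how it sits alongside Hive in the same cluster.
    spark = (
        SparkSession.builder.appName("claims-batch")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Hypothetical Hive table of healthcare claims.
    claims = spark.table("lake.claims")

    # cache() pins the dataset in executor memory, so the two aggregations
    # below reuse it instead of re-reading from disk -- the key difference
    # from MapReduce, which writes intermediate results back to HDFS.
    claims.cache()

    totals_by_provider = (
        claims.groupBy("provider_id")
        .agg(F.sum("amount").alias("total_billed"))
    )
    counts_by_month = (
        claims.groupBy(F.month("service_date").alias("month"))
        .agg(F.count("*").alias("claim_count"))
    )

    totals_by_provider.write.mode("overwrite").parquet("hdfs:///lake/reports/by_provider/")
    counts_by_month.write.mode("overwrite").parquet("hdfs:///lake/reports/by_month/")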

Cons

  • Consumes more memory
  • Difficult to address issues around memory utilization
  • Expensive - in-memory processing is costly when what you need is cost-efficient processing of big data

We were able to make batch jobs 20 times faster than with MapReduce, and with language support for Scala, Java, and Python, Spark is easy to manage. We specifically chose Spark over MapReduce to make cluster processing faster.

Well suited:
1. Integrating data from several sources, including clickstream, logs, and transactional systems
2. Real-time ingestion through Kafka, Kinesis, and other streaming platforms (see the streaming sketch after this list)
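
As a rough illustration of that second use case, here is a Spark Structured Streaming sketch that reads from Kafka and lands events in the lake; the topic, broker address, and event schema are assumptions, and the spark-sql-kafka connector package must be on the classpath:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType

    spark = SparkSession.builder.appName("clickstream-ingest").getOrCreate()

    # Hypothetical clickstream event schema.
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("page", StringType()),
        StructField("user_id", StringType()),
    ])

    # Subscribe to a Kafka topic; Kafka hands Spark the payload as bytes.
    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "clickstream")
        .load()
    )

    # Decode and parse the JSON payload into columns.
    events = (
        raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )

    # Continuously land the parsed events in the data lake; the checkpoint
    # directory lets the query resume where it left off after a restart.
    query = (
        events.writeStream.format("parquet")
        .option("path", "hdfs:///lake/clickstream/")
        .option("checkpointLocation", "hdfs:///checkpoints/clickstream/")
        .start()
    )
    query.awaitTermination()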
