Overall Satisfaction with Apache Spark
My company uses Apache Spark in various ways, including machine learning, analytics, and batch processing. We pull data from other sources into a Hadoop environment and build data lakes. We also use SparkSQL to analyze data and develop reports. Our clusters are deployed on Cloudera. Apache Spark has made it much easier to apply data science to big data.
- Easy ELT Process
- Easy clustering on cloud
- Amazing speed
- Batch & real time processing
- Debugging is difficult because Spark is new to most people
- Learning resources are limited
- Apache Spark performs faster than MapReduce.
- The combination of Python and Spark works best: shorter code and faster, more efficient performance.
- Can replace RDBMS
Even with Python, MapReduce requires lengthy code. Combining Python with Apache Spark not only shortens the code but also speeds up the algorithms. I still use MapReduce occasionally, but I expect Apache Spark to replace it very soon, since it has many built-in features and runs faster.
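To illustrate the code-length point, here is a rough word-count sketch. The function names and sample input are made up for illustration. The first function only imitates the map, shuffle, and reduce phases in plain Python (it is not Hadoop code), and the second assumes the pyspark package is installed and can start a local session.

```python
from collections import defaultdict


def wordcount_mapreduce_style(lines):
    """Word count with the map, shuffle, and reduce phases written out by hand.

    Illustrative only: this mimics the structure of a MapReduce job in
    plain Python and does not run on Hadoop.
    """
    # Map phase: emit a (word, 1) pair for every word.
    mapped = []
    for line in lines:
        for word in line.split():
            mapped.append((word, 1))

    # Shuffle phase: group the emitted values by key.
    grouped = defaultdict(list)
    for word, one in mapped:
        grouped[word].append(one)

    # Reduce phase: sum the values for each key.
    return {word: sum(ones) for word, ones in grouped.items()}


def wordcount_pyspark(lines):
    """The same job as one chained PySpark expression (assumes pyspark is installed)."""
    # Imported here so the plain-Python version above runs without Spark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()
    counts = (
        spark.sparkContext.parallelize(lines)
        .flatMap(lambda line: line.split())   # split each line into words
        .map(lambda word: (word, 1))          # emit (word, 1) pairs
        .reduceByKey(lambda a, b: a + b)      # sum the counts per word
        .collect()
    )
    spark.stop()
    return dict(counts)


if __name__ == "__main__":
    sample = ["spark is fast", "spark is concise"]
    print(wordcount_mapreduce_style(sample))
```

In the Spark version the grouping step is handled by `reduceByKey`, so the whole job fits in a single chained expression. This sketch only compares code length, not run time.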