Item: Apache Spark
Rating: 9
Author: Verified User

Use Cases and Deployment Scope

We sold a data science product to one of the leading US-based e-commerce firms. Soon afterward, their data started growing very quickly. At that stage the product was written in R, and with that much data its run times became very long. We started looking for an alternative to R that could process rapidly growing data like this client's, and eventually came across Apache Spark. With the client's permission, we began porting the code from R to Apache Spark. Learning Spark and rewriting the code took a long time, but it was worth the effort: the R jobs that had been taking days to run came down to a few hours.

Pros and Cons

Very good tool to process big datasets.
Inbuilt fault tolerance.
Supports multiple languages.
Supports advanced analytics.
A large number of libraries available -- GraphX, Spark SQL, Spark Streaming, etc.

Very slow with smaller amounts of data.
Expensive, as it stores data in memory.

Return on Investment

We saved a lot of time and resources, which in turn saved a lot of money for both our company and the client.

Other Software Used

Microsoft BI, Google BigQuery, Skype

Likelihood to Recommend

If your data is very large, I recommend moving the underlying technology to Apache Spark. This will save you a lot of time and effort as your data keeps growing. Spark's scalability also means it can keep handling your data processing as volumes increase.

Want to save money, resources, and time processing big data? Switch to Apache Spark.

