Want to save dollars, resources, and time processing big data? Switch to Apache Spark
March 27, 2019

Anonymous | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User

Overall Satisfaction with Apache Spark

We sold a data science product to one of the leading US-based e-commerce firms. Their data then began growing very quickly. At that stage the product was written in R, and as the data volume grew, its run times became very long. We started looking for an alternative to R that could handle a client whose data keeps multiplying, and eventually came across Apache Spark. With the client's permission, we began porting the code from R to Apache Spark. Learning Spark and rewriting the code took a very long time, but it was worth the effort: R jobs that had taken days to run now finish in a few hours.
Pros
  • Very good tool for processing big datasets.
  • Built-in fault tolerance.
  • Supports multiple languages.
  • Supports advanced analytics.
  • A large number of libraries available: GraphX, Spark SQL, Spark Streaming, etc.
Cons
  • Slow on smaller amounts of data, where the overhead of distributing the work outweighs the gains.
  • Can be expensive to run, since it keeps data in memory.
Return on investment
  • We saved a lot of time and resources, which in turn saved a lot of money for both our company and the client.
If your data is very large and still growing, I recommend moving the underlying technology to Apache Spark. It will save you a lot of time and effort as the data keeps growing. Because Spark scales with the data, it should also be able to handle your future processing needs.