good solution for long and narrow data
May 20, 2021

good solution for long and narrow data

Anonymous | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User

Overall Satisfaction with Apache Spark

We are building a model and due to the size of the data, we chose to use Apache Spark for the feature generation. The usage of the tool is limited within my department and one another department. The two departments need to deal with long dataset and the other departments does not need that.
  • quick
  • utilized CPU cores
  • trendy
  • lack of support
  • memory hungry
  • slow on wide data
  • parallelization
  • compatibility
  • speed
  • reduce time
  • need tuning
  • hard to debug
There are a few alternatives that can do the same transformation and aggregation like Apache Spark can do but most of them are not able to perform parallel computation. For example, pandas is a really good tool to do that but not parallelized; However, there are some tools that leverage pandas interface and syntax with dask and ray on the backend.

Do you think Apache Spark delivers good value for the price?

Yes

Are you happy with Apache Spark's feature set?

Yes

Did Apache Spark live up to sales and marketing promises?

Yes

Did implementation of Apache Spark go as expected?

Yes

Would you buy Apache Spark again?

Yes

I would recommend Apache Spark to the colleague if that person is working with long but narrow dataset. This would be a great tool to help the person fully utilize the CPU cores and speed up the work process. However, I would not recommend this tool if the dataset is wide not not very large.