good solution for long and narrow data
May 20, 2021
Score 9 out of 10
Vetted Review
Verified User
Overall Satisfaction with Apache Spark
We are building a model and, because of the size of the data, we chose Apache Spark for feature generation. Use of the tool is limited to my department and one other department. These two departments need to work with long datasets; the other departments do not.
Pros
- quick
- utilizes CPU cores
- trendy
Cons
- lack of support
- memory hungry
- slow on wide data
- parallelization
- compatibility
- speed
- reduce time
- need tuning
- hard to debug
A few alternatives can perform the same transformations and aggregations as Apache Spark, but most of them cannot run the computation in parallel. For example, pandas is a really good tool for this kind of work, but it is not parallelized. However, some tools offer the pandas interface and syntax with Dask or Ray as the backend.
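To make the parallelization point concrete, here is a minimal sketch of the split, aggregate, and merge pattern that parallel engines apply to long data. It uses only the Python standard library and is not Spark, Dask, or Ray code; the function names (`partial_sums`, `merge`, `parallel_sum_by_key`) and the sample rows are made up for illustration, and a thread pool stands in for real worker processes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partial_sums(rows):
    """Aggregate one partition: sum of value per key."""
    totals = defaultdict(float)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def merge(partials):
    """Combine the per-partition results into one final result."""
    merged = defaultdict(float)
    for part in partials:
        for key, total in part.items():
            merged[key] += total
    return dict(merged)


def parallel_sum_by_key(rows, n_partitions=4):
    # Split the long list of rows into contiguous partitions.
    size = max(1, -(-len(rows) // n_partitions))  # ceiling division
    partitions = [rows[i:i + size] for i in range(0, len(rows), size)]
    # Threads are an illustrative stand-in for cluster workers; in CPython
    # they will not speed up CPU-bound work the way separate processes do.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(partial_sums, partitions))
    return merge(partials)


# Hypothetical long, narrow data: many rows, two columns (key, value).
rows = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("c", 4.0), ("b", 5.0)]
result = parallel_sum_by_key(rows, n_partitions=2)
print(result)
```

Each partition is aggregated independently and only the small per-partition summaries are merged, which is why this pattern suits long and narrow data: adding rows adds partitions, while the merge step stays cheap.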
Do you think Apache Spark delivers good value for the price?
Yes
Are you happy with Apache Spark's feature set?
Yes
Did Apache Spark live up to sales and marketing promises?
Yes
Did implementation of Apache Spark go as expected?
Yes
Would you buy Apache Spark again?
Yes