Likelihood to Recommend Well suited: To most of the local run of datasets and non-prod systems - scalability is not a problem at all. Including data from multiple types of data sources is an added advantage. MLlib is a decently nice built-in library that can be used for most of the ML tasks. Less appropriate: We had to work on a RecSys where the music dataset that we used was around 300+Gb in size. We faced memory-based issues. Few times we also got memory errors. Also the MLlib library does not have support for advanced analytics and deep-learning frameworks support. Understanding the internals of the working of Apache Spark for beginners is highly not possible.
Read full review Heroku is very well suited for startups looking to get a server stack up and running quickly. There is little to no overhead when managing your instances. However, you'll need a background in basic DevOps or system management to make sure everything is set up correctly. In addition, it's easy to accidentally go crazy on pricing. Make sure you're only creating the server instances you need to run the base application and set up an auto-scaler plugin to handle peaks.
Read full review Pros Rich APIs for data transformation making for very each to transform and prepare data in a distributed environment without worrying about memory issues Faster in execution times compare to Hadoop and PIG Latin Easy SQL interface to the same data set for people who are comfortable to explore data in a declarative manner Interoperability between SQL and Scala / Python style of munging data Read full review Heroku has a very simple deployment model, making it easy to get your application up-and-running with minimal effort. We can focus on our efforts the unique aspects of our application. The robust add-on marketplace makes it easy to try out new approaches with minimal effort and investment -- and when we settle on a solution, we can easily scale it. Heroku's support is quite good -- their staff is quite technical and willing to get into the weeds to diagnose even complicated problems. Read full review Cons Memory management. Very weak on that. PySpark not as robust as scala with spark. spark master HA is needed. Not as HA as it should be. Locality should not be a necessity, but does help improvement. But would prefer no locality Read full review Large price jumps between certain resource tiers (2x Dyno for $50 per month versus Performance Dyno for $250). Free Postgres next jumps to $50 per month. Marketing/Branding to non-technical stakeholders. As the years pass, I've had to fight more to convince stakeholders on the value of Heroku over AWS. Improve Buildpack documentation. This is one area where Heroku's documentation is fairly confusing. Read full review Likelihood to Renew Capacity of computing data in cluster and fast speed.
Steven Li Senior Software Developer (Consultant)
Read full review Heroku is easy to use, services a ton of functions for you out of the box, and provides a means to get a software product off the ground and managed quickly and easily. The tools provide allows a small to medium size org to move very quickly. The CLI tools provided make managing an entire technical infrastructure simple.
Read full review Usability If the team looking to use Apache Spark is not used to debug and tweak settings for jobs to ensure maximum optimizations, it can be frustrating. However, the documentation and the support of the community on the internet can help resolve most issues. Moreover, it is highly configurable and it integrates with different tools (eg: it can be used by
dbt core), which increase the scenarios where it can be used
Read full review Easy to use web based console and easy to use command line tools; deployment is done directly from a GIT repository. What more could you ask for? The one thing that keeps me from giving it a 10 is that custom build packs are almost incomprehensible. We used one for a while because we needed cairo graphics processing. Fortunately, I was able to figure out a different way to do what we needed so that we could get off the custom build pack.
Read full review Reliability and Availability Heroku availability correlates pretty strongly to AWS US EAST availability. We had a couple of times where there was a Heroku-specific issue but not for the last 7-8 months.
Read full review Performance The only issue that I ever have is that about 1 out of 20 deployments (git push) will hang and need to be cancelled and done again.
Read full review Support Rating 1. It integrates very well with scala or python. 2. It's very easy to understand SQL interoperability. 3. Apache is way faster than the other competitive technologies. 4. The support from the Apache community is very huge for Spark. 5. Execution times are faster as compared to others. 6. There are a large number of forums available for Apache Spark. 7. The code availability for Apache Spark is simpler and easy to gain access to. 8. Many organizations use Apache Spark, so many solutions are available for existing applications.
Read full review I've used it for many years without facing any major problem. It's not hard at all to get used to it, it's documentation is outstanding and simple. We are close to 2020 and I don't think most of the existing companies or startups should still face old problems such as wasting time deploying code and calculate computing resources.
Read full review Implementation Rating Be ready to pay a bit more than expected in the beginning if you're migrating from a big server. The application is probably not ready for the change and you have to keep improving it with time.
It's also important to consider that you can't save anything to the disc as it will be lost when your application restarts, so you have to think about using something like S3.
Read full review Alternatives Considered Spark in comparison to similar technologies ends up being a one stop shop. You can achieve so much with this one framework instead of having to stitch and weave multiple technologies from the
Hadoop stack, all while getting incredibility performance, minimal boilerplate, and getting the ability to write your application in the language of your choosing.
Read full review Heroku is the more expensive option for hosting compared to some of the cloud platforms we investigated, but it's worth it for us because of the plug-and-play nature of Heroku deployment. We can be up and running in a few minutes and know with precision how much it will cost us each month to run the application, unlike
Amazon Web Services where you have to go to great pains to configure it correctly or else you might end up with a shocking monthly bill. Overall, spending the time to configure
Amazon Web Services or one of its competitors is likely the more affordable and powerful choice, because you have control over so many specifics of the configuration. But it also requires the burden of continuing to maintain and update your AWS instance, whereas with Heroku they take care of security fixes and platform upgrades. It's a great service and we are happy to pay the extra cost for the value-adds Heroku provides.
Read full review Return on Investment Business leaders are able to take data driven decisions Business users are able access to data in near real time now . Before using spark, they had to wait for at least 24 hours for data to be available Business is able come up with new product ideas Read full review It has been critical in seamlessly operating our platform with runs all of our programs. It has been impressive with its ability to scale quickly which results in the growth of our work. It allows for tracking of different features which allows for quick problem solving which saves us time. Emily Cooper Director, Illinois Science & Technology Coalition
Read full review ScreenShots