The Heroku Platform, now part of Salesforce, is a platform-as-a-service built on a managed container system, with integrated data services and an ecosystem for deploying modern apps. It takes an app-centric approach to software delivery, integrated with developer tools and workflows. Its three main tools are: Heroku Developer Experience (DX), Heroku Operational Experience (OpEx), and Heroku Runtime.
Heroku Developer Experience (DX)
Developers deploy directly from tools like…
Well suited: for most local runs on datasets and for non-production systems, where scalability is not a concern at all. Being able to pull in data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that covers most common ML tasks. Less appropriate: we had to work on a RecSys (recommender system) where the music dataset we used was around 300+ GB in size, and we ran into memory-related issues, including outright memory errors a few times. MLlib also lacks support for advanced analytics and deep-learning frameworks. Understanding Apache Spark's internals is very difficult for beginners.
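As a rough illustration only, the kind of MLlib collaborative-filtering job described above might look like the following PySpark sketch. The file path, bucket, and column names (userId, trackId, playCount) are hypothetical and not taken from the review.

```python
# Minimal PySpark MLlib sketch: collaborative filtering with ALS.
# Path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("music-recsys").getOrCreate()

# Implicit-feedback ratings, e.g. play counts per user/track pair.
ratings = spark.read.parquet("s3a://my-bucket/ratings.parquet")  # hypothetical source

als = ALS(
    userCol="userId",
    itemCol="trackId",
    ratingCol="playCount",
    implicitPrefs=True,       # treat play counts as implicit feedback
    coldStartStrategy="drop"  # drop NaN predictions for unseen users/items
)
model = als.fit(ratings)

# Top-10 track recommendations for every user.
recs = model.recommendForAllUsers(10)
recs.show(truncate=False)
```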
Heroku is very well suited for startups looking to get a server stack up and running quickly. There is little to no overhead when managing your instances. However, you'll need a background in basic DevOps or system management to make sure everything is set up correctly. In addition, it's easy to accidentally go crazy on pricing. Make sure you're only creating the server instances you need to run the base application and set up an auto-scaler plugin to handle peaks.
Heroku has a very simple deployment model, making it easy to get your application up and running with minimal effort. We can focus our efforts on the unique aspects of our application.
The robust add-on marketplace makes it easy to try out new approaches with minimal effort and investment -- and when we settle on a solution, we can easily scale it.
Heroku's support is quite good -- their staff is quite technical and willing to get into the weeds to diagnose even complicated problems.
Large price jumps between certain resource tiers (a 2x Dyno for $50 per month versus a Performance Dyno for $250). The next step up from free Postgres jumps to $50 per month.
Marketing/branding to non-technical stakeholders. As the years pass, I've had to fight harder to convince stakeholders of the value of Heroku over AWS.
Improve Buildpack documentation. This is one area where Heroku's documentation is fairly confusing.
Heroku is easy to use, handles a ton of functions for you out of the box, and provides a means to get a software product off the ground and managed quickly and easily. The tools provided allow a small to medium size org to move very quickly. The CLI tools provided make managing an entire technical infrastructure simple.
The only thing I dislike about Spark's usability is the learning curve; there are many actions and transformations to learn. However, its wide range of uses for ETL processing, its ease of integration, and its multi-language support make this library a powerhouse for data science solutions. It has especially aided us with its lightning-fast processing times.
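To make the "actions and transformations" point concrete, here is a minimal PySpark ETL sketch; the source path and column names are hypothetical, and this is only an illustration, not the reviewer's actual pipeline.

```python
# Illustrative PySpark sketch of the transformation/action split.
# Source path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-example").getOrCreate()

events = spark.read.json("s3a://my-bucket/events/*.json")  # hypothetical source

# Transformations are lazy: nothing executes yet.
cleaned = (
    events
    .filter(F.col("status") == "ok")
    .withColumn("day", F.to_date("timestamp"))
    .groupBy("day")
    .count()
)

# The write is an action and triggers execution across the cluster.
cleaned.write.mode("overwrite").parquet("s3a://my-bucket/daily_counts/")
```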
Easy-to-use web-based console and easy-to-use command-line tools; deployment is done directly from a Git repository. What more could you ask for? The one thing that keeps me from giving it a 10 is that custom buildpacks are almost incomprehensible. We used one for a while because we needed Cairo graphics processing. Fortunately, I was able to figure out a different way to do what we needed so that we could get off the custom buildpack.
Heroku availability correlates pretty strongly with AWS US East availability. There were a couple of times when there was a Heroku-specific issue, but not in the last 7-8 months.
1. It integrates very well with Scala and Python.
2. Its SQL interoperability is very easy to understand (see the sketch after this list).
3. Apache Spark is way faster than the competing technologies.
4. The Apache community's support for Spark is huge.
5. Execution times are faster compared to the alternatives.
6. There are a large number of forums available for Apache Spark.
7. Apache Spark's code is readily available and easy to gain access to.
8. Many organizations use Apache Spark, so many solutions are available for existing applications.
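To make point 2 concrete, here is a hedged sketch of Spark's SQL interoperability: the same data can be queried with SQL or with the DataFrame API. The path, table name, and columns are hypothetical.

```python
# Minimal sketch of Spark SQL interoperability; names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sql-interop").getOrCreate()

orders = spark.read.parquet("s3a://my-bucket/orders/")  # hypothetical source
orders.createOrReplaceTempView("orders")

# Declarative SQL...
by_country_sql = spark.sql(
    "SELECT country, SUM(total) AS revenue FROM orders GROUP BY country"
)

# ...or the equivalent DataFrame expression; both compile to the same plan.
by_country_df = orders.groupBy("country").agg(F.sum("total").alias("revenue"))

by_country_sql.show()
```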
I've used it for many years without facing any major problem. It's not hard at all to get used to, and its documentation is outstanding and simple. We are close to 2020, and I don't think most existing companies or startups should still be facing old problems such as wasting time on deploying code and calculating computing resources.
Be ready to pay a bit more than expected in the beginning if you're migrating from a big server. The application is probably not ready for the change and you have to keep improving it with time.
It's also important to consider that you can't save anything to disk, as it will be lost when your application restarts, so you have to think about using something like S3.
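For example, instead of writing to the dyno's ephemeral filesystem, uploads can be pushed straight to S3. This is a minimal sketch assuming the boto3 client and a hypothetical bucket name; credentials would be supplied via config vars rather than hard-coded.

```python
# Minimal sketch: persist files to S3 instead of Heroku's ephemeral disk.
# Bucket name and paths are hypothetical; AWS credentials are assumed to be
# provided as config vars (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY).
import boto3

s3 = boto3.client("s3")

def store_upload(local_path: str, key: str) -> str:
    """Push a temporary file to S3 and return its object key."""
    s3.upload_file(local_path, "my-app-uploads", key)  # hypothetical bucket
    return key

# Example: persist a report generated into the dyno's temp directory.
store_upload("/tmp/report.pdf", "reports/2024/report.pdf")
```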
All the above systems work quite well on big data transformations, whereas Spark really shines with its broader API support and its ability to read from and write to multiple data sources. Using Spark, one can switch between declarative, imperative, and functional styles of programming based on the situation. It also doesn't need special data ingestion or indexing pre-processing the way Presto does. Combining it with Jupyter Notebooks (https://github.com/jupyter-incubator/sparkmagic), one can develop Spark code interactively in Scala or Python.
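As an illustration of the multi-source reads and writes mentioned above, here is a hedged PySpark sketch; the connection details, paths, and table names are all hypothetical.

```python
# Sketch of reading from several kinds of sources and writing back out.
# All paths, hosts, credentials, and table/column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("multi-source").getOrCreate()

# Inputs from very different sources in the same pipeline.
users = spark.read.csv("s3a://my-bucket/users.csv", header=True, inferSchema=True)
plays = spark.read.parquet("hdfs:///data/plays/")
tracks = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/music")
    .option("dbtable", "tracks")
    .option("user", "reader")
    .option("password", "secret")
    .load()
)

# Join across sources and write the result back out as Parquet.
enriched = plays.join(users, "user_id").join(tracks, "track_id")
enriched.write.mode("overwrite").parquet("s3a://my-bucket/enriched_plays/")
```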
Heroku is the more expensive option for hosting compared to some of the cloud platforms we investigated, but it's worth it for us because of the plug-and-play nature of Heroku deployment. We can be up and running in a few minutes and know with precision how much it will cost us each month to run the application, unlike Amazon Web Services where you have to go to great pains to configure it correctly or else you might end up with a shocking monthly bill. Overall, spending the time to configure Amazon Web Services or one of its competitors is likely the more affordable and powerful choice, because you have control over so many specifics of the configuration. But it also carries the burden of continuing to maintain and update your AWS instance, whereas with Heroku they take care of security fixes and platform upgrades. It's a great service and we are happy to pay the extra cost for the value-adds Heroku provides.
Faster turnaround on feature development: we have seen a noticeable improvement in our agile development since using Spark.
Easy adoption: having multiple departments use the same underlying technology, even if the use cases are very different, allows for more commonality amongst applications, which definitely makes the operations team happy.
Performance: we have been able to make some applications run over 20x faster since switching to Spark. This has saved us time, headaches, and operating costs.