Overall Satisfaction with Apache Hive
Hive is currently used across the entire analytics organization at SurveyMonkey. The business problem it solves for us is storing and accessing large data sets (typically logs) in a scalable, accessible place.
- I think productivity has increased for us, as we're now able to store data going as far back as we want.
- It allows us to perform analytics that we wouldn't be able to do otherwise. For example, customer life cycle mapping is possible through Hive.
- ROI in terms of ramp-up time for new employees who don't have a big data background. Since HQL is available, which is like SQL, analysts who have little to no big data exposure can quickly get up to speed and start working.
I wasn't part of the evaluation process for Apache Hive. It was already implemented when I joined the company. I have worked with other big data platforms, and I personally think most of them are quite comparable to one another. It really depends on what the company is going for. For example, Google Cloud makes a ton of sense for a user who developed their application on Google App Engine.
I think Apache Hive is great for a company just stepping into the big data realm. The fact that it's open source allows a variety of tools to be integrated with it. The fact that it has HiveQL makes for a great transition from a standard RDBMS to a big data tool. This can be very nice in terms of cost savings, as the ramp-up time for an analyst will be quite low.