Apache Hive Review
Overall Satisfaction with Apache Hive
Hive is currently used across the entire analytics organization at SurveyMonkey. The business problem we solve with it is storing and accessing large data sets (typically logs) in a scalable, accessible place.
Pros
- SQL-like query engine, which allows an easy ramp-up for anyone coming from a standard RDBMS
- Scalability is great
- If properly configured, data retrieval is fantastic
Cons
- Our current implementation is quite slow, but I believe that's due to how we set it up rather than to Hive itself
- Joins tend to be slow
Return on Investment
- I think productivity has increased for us, as we're now able to store data going as far back as we want
- It allows us to perform analytics that we wouldn't be able to do otherwise. For example, customer life-cycle mapping is possible through it
- ROI in terms of ramp-up time for new employees who don't have a big data background: because HQL is similar to SQL, analysts with little to no big data exposure can quickly get up to speed and start working
I wasn't part of the evaluation process for Apache Hive; it was already implemented when I joined the company. I have worked with other big data platforms, and I personally think most of them are quite comparable to one another. It really depends on what the company is going for. For example, Google Cloud makes a lot of sense for a user who developed their application on Google App Engine.