Item: Apache Hadoop
Rating: 7
Author: Verified User

Overall Satisfaction with Apache Hadoop

Use Cases and Deployment Scope

We use Apache Hadoop to store and process large amounts of data (petabytes per day) across thousands of data pipelines. Hadoop works reliably for this purpose. Data scientists at the company also use it for interactive querying for analytics and modeling purposes.

Pros and Cons

Pros

Storing large amounts of data
Processing large amounts of data via a familiar SQL interface

Cons

Slower than other interactive querying engines. Queries take minutes at least and up to hours sometimes
Tuning the settings to be able to run certain queries can require a lot of domain knowledge

Most Important Features

Large scale data storage
Large scale data processing
Large scale interactive data querying

Return on Investment

Makes all of the company's data easily accessible via a SQL interface
Allows affordable data storage

Alternatives Considered

Apache Spark and Google BigQuery

Spark is a good alternative to Hadoop that can have faster querying and processing performance and can offer more flexibility in terms of applications that it can support.

Google BigQuery has also been a great alternative and is especially great in terms of ease of use. The capacity to process data and the speed are great without having to do any settings tuning or optimization. It also doesn't require any on-site hosting, making it a great hands off solution.

Key Insights

Do you think Apache Hadoop delivers good value for the price?

Yes

Are you happy with Apache Hadoop's feature set?

Yes

Did Apache Hadoop live up to sales and marketing promises?

I wasn't involved with the selection/purchase process

Did implementation of Apache Hadoop go as expected?

I wasn't involved with the implementation phase

Would you buy Apache Hadoop again?

Yes

Other Software Used

Presto

Likelihood to Recommend

If you have petabytes of data that you need to store and process on a regular basis and don't mind having to wait minutes for your queries to run, Apache Hadoop is great for that use case. I would supplement it with another faster interactive database for interactive querying.

Comments

Please log in to join the conversation

Good solution for storing and processing large data

Modules Used