Good solution for storing and processing large data
May 20, 2021

Good solution for storing and processing large data

Anonymous | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User

Modules Used

  • Hadoop Common
  • Hadoop Distributed File System
  • Hadoop MapReduce

Overall Satisfaction with Apache Hadoop

We use Apache Hadoop to store and process large amounts of data (petabytes per day) across thousands of data pipelines. Hadoop works reliably for this purpose. Data scientists at the company also use it for interactive querying for analytics and modeling purposes.
  • Storing large amounts of data
  • Processing large amounts of data via a familiar SQL interface
  • Slower than other interactive querying engines. Queries take minutes at least and up to hours sometimes
  • Tuning the settings to be able to run certain queries can require a lot of domain knowledge
  • Large scale data storage
  • Large scale data processing
  • Large scale interactive data querying
  • Makes all of the company's data easily accessible via a SQL interface
  • Allows affordable data storage
Spark is a good alternative to Hadoop that can have faster querying and processing performance and can offer more flexibility in terms of applications that it can support.

Google BigQuery has also been a great alternative and is especially great in terms of ease of use. The capacity to process data and the speed are great without having to do any settings tuning or optimization. It also doesn't require any on-site hosting, making it a great hands off solution.

Do you think Apache Hadoop delivers good value for the price?

Yes

Are you happy with Apache Hadoop's feature set?

Yes

Did Apache Hadoop live up to sales and marketing promises?

I wasn't involved with the selection/purchase process

Did implementation of Apache Hadoop go as expected?

I wasn't involved with the implementation phase

Would you buy Apache Hadoop again?

Yes

If you have petabytes of data that you need to store and process on a regular basis and don't mind having to wait minutes for your queries to run, Apache Hadoop is great for that use case. I would supplement it with another faster interactive database for interactive querying.