Good tool for unstructured data
July 21, 2021


Peter Suter | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User

Modules Used

  • Hadoop Distributed File System
  • Hadoop MapReduce

Overall Satisfaction with Apache Hadoop

Apache Hadoop is an open-source software framework designed for the collection, storage, and analysis of large data sets. Apache Hadoop's architecture comprises several components, including a distributed file system. We use it mostly for massive data collection, analytics, and storage. Also, having consistent data means it can be integrated across other platforms and serve as a single source of truth.
  • Apache Hadoop has made managing large amounts of data quite easy.
  • The system includes a file system known as HDFS (Hadoop Distributed File System), which stores large data files reliably across the cluster.
  • The parallel processing engine of this software, MapReduce, is also a good aspect of Apache Hadoop.
  • It offers a rich and reliable set of features and functions.
  • Apache Hadoop also stores very big data files across machines with high levels of availability.
  • I personally feel that Apache Hadoop is slower compared to other interactive querying platforms. Queries can sometimes take hours, which can be frustrating and discouraging.
  • Also, Apache Hadoop has so many modules that it takes a long time to learn them all. Beyond that, optimization is somewhat of a challenge in Apache Hadoop.
  • Data sourcing is excellent.
  • Efficient customer support.
  • Reliable customization of functionalities.
  • Spark integration.
  • Workload processing.
  • Apache Hadoop can handle very large amounts of data for business-level purposes.
  • HDFS also keeps data files across the machines by splitting them into large blocks and then distributing those blocks across nodes.
  • It plays a great role in the growth of our organization.
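To illustrate the block-splitting behavior mentioned above, here is a minimal, Hadoop-free Python sketch of HDFS-style block placement. The block size, node names, and round-robin placement are illustrative assumptions only (real HDFS defaults to 128 MB blocks, three replicas, and rack-aware placement):

```python
# Illustrative sketch of HDFS-style block placement (NOT actual Hadoop code).
# BLOCK_SIZE, NODES, and REPLICATION are toy values chosen for readability.

BLOCK_SIZE = 64                      # HDFS defaults to 128 MB; tiny for demo
NODES = ["node1", "node2", "node3"]  # hypothetical data nodes
REPLICATION = 2                      # HDFS default is 3

def place_blocks(data: bytes):
    """Split data into fixed-size blocks and assign each block to
    REPLICATION nodes, round-robin (real HDFS is rack-aware)."""
    placement = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        block_index = offset // BLOCK_SIZE
        replicas = [NODES[(block_index + r) % len(NODES)]
                    for r in range(REPLICATION)]
        placement.append((block, replicas))
    return placement

if __name__ == "__main__":
    data = b"x" * 200                # a 200-byte "file"
    layout = place_blocks(data)
    print(len(layout))               # 4 blocks of <= 64 bytes each
    print(layout[0][1])              # nodes holding replicas of block 0
```

The point of the split-and-replicate scheme is that losing any single node loses no data, which is why the review can describe node-failure handling as a strength.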
I feel that this is a highly reliable and scalable distributed computing technology, capable of processing large data sets across multiple servers and thousands of machines in a well-defined, distributed manner. Apache Hadoop can scale up the number of servers and machines needed to process, store, and analyze data sets. It also handles explosive growth in data with its big data technology. Apache Hadoop is good at handling node failures as well.
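The processing model behind the parallelism described above is MapReduce (one of the modules used in this review). As a hedged sketch, here is the canonical word-count example in plain, single-process Python, showing the map, shuffle, and reduce phases without any Hadoop cluster; real Hadoop runs the map and reduce functions in parallel across nodes:

```python
from collections import defaultdict

def map_phase(line: str):
    """Map: emit a (word, 1) pair for every word in a line."""
    for word in line.split():
        yield (word.lower(), 1)

def reduce_phase(word: str, counts):
    """Reduce: sum all the counts emitted for one word."""
    return word, sum(counts)

def word_count(lines):
    # Shuffle: group intermediate (word, 1) pairs by key
    groups = defaultdict(list)
    for line in lines:
        for word, one in map_phase(line):
            groups[word].append(one)
    # Reduce each group independently (parallelizable in real Hadoop)
    return dict(reduce_phase(w, c) for w, c in groups.items())

if __name__ == "__main__":
    lines = ["big data big cluster", "big data"]
    print(word_count(lines))   # {'big': 3, 'data': 2, 'cluster': 1}
```

Because each reduce group is independent, Hadoop can spread the work across thousands of machines, which is what makes the hours-long batch queries mentioned earlier tolerable at petabyte scale.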

Altogether, I want to say that Apache Hadoop is well-suited to large, unstructured data flows such as aggregated web traffic or even advertising data. I think Apache Hadoop is great when you literally have petabytes of data that need to be stored and processed on an ongoing basis. I would also recommend supplementing the software with a faster, interactive database for better querying. Lastly, it's very cost-effective, so it is worth giving it a shot before coming to any conclusion.