Hadoop-Related Software

The best Hadoop-related software includes Cloudera Manager, Amazon EMR, and Apache Spark.

Hadoop-Related Software TrustMap

TrustMaps are two-dimensional charts that compare products based on trScore and research frequency by prospective buyers. Products must have 10 or more ratings to appear on this TrustMap.

Hadoop-Related Software Overview

What is Hadoop Software?

Hadoop is an open-source data store from the Apache Software Foundation that works quite differently from a traditional database. An entire ecosystem of products has evolved around the Hadoop data store, to the point where it has become its own technology category.


The central idea of Hadoop is that data is spread across many inexpensive commodity servers. Hadoop itself is open source, but vendors such as Cloudera and Hortonworks offer commercial distributions that wrap services around the technology.


Unlike a traditional database, Hadoop can handle huge volumes of both structured and unstructured data including log files, streaming data, images, audio and video files. All of this data can be put into the Hadoop cluster and accessed, modified and processed in place, eliminating the need to duplicate and structure data in a traditional warehouse.


Once this huge volume of structured and unstructured data has been stored, how do you extract any value from it? Since Hadoop is not a relational database, SQL does not work against it directly. Instead, Hadoop has its own data processing and query framework called MapReduce. Developers can write MapReduce programs to retrieve and process whatever data is needed. However, MapReduce has several constraints that affect performance, and newer products such as Apache Spark provide an alternative distributed computing framework that is significantly more efficient for many workloads. Similarly, products like Hive and Cloudera Impala provide a SQL-like query language, which is much easier for data analysts to learn and use.
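
To give a feel for the MapReduce programming model, here is a minimal single-machine sketch in plain Python. It does not use Hadoop or any Hadoop API. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. A real Hadoop job would run the map and reduce steps in parallel across the cluster, with the framework handling the grouping step in between.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key (done by the framework in a real cluster)."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical log lines standing in for data stored in a cluster.
    print(word_count(["error disk full", "error network"]))
```

The point of the model is that the map and reduce functions are independent per line and per key, which is what lets a framework distribute them across many servers.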

Hadoop-Related Products

(1-25 of 35) Sorted by Most Reviews

Apache Hadoop
210 ratings
35 reviews
Hadoop is open source software from Apache that supports distributed processing and data storage. Hadoop is popular for its scalability, reliability, and ability to run on commodity hardware.
Apache Hive
56 ratings
25 reviews
Apache Hive is database/data warehouse software that supports data querying and analysis of large datasets stored in the Hadoop distributed file system (HDFS) and other compatible systems, and is distributed under an open source license.
Amazon EMR (Elastic MapReduce)
34 ratings
9 reviews
Amazon EMR is a cloud-native big data platform for processing vast amounts of data quickly, at scale. Using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi (Incubating), and Presto, coupled with the scalability of Amazon EC2 and scalable storage of Amazon…
Hortonworks Data Platform
27 ratings
9 reviews
Hortonworks Data Platform (HDP) is an open source framework for distributed storage and processing of large, multi-source data sets. HDP modernizes IT infrastructure and keeps data secure—in the cloud or on-premises—while helping to drive new revenue streams, improve customer experience, and control…
IBM Analytics Engine
22 ratings
8 reviews
IBM BigInsights is an analytics and data visualization tool leveraging Hadoop.
Datameer
6 ratings
7 reviews
Datameer offers analytics that make it easy for businesses to aggregate big data, leveraging the power and scale of Hadoop.
Azure HDInsight
31 ratings
6 reviews
HDInsight is an implementation of the Apache Hadoop technology stack on the Microsoft Azure cloud platform. It is based on the Hortonworks Hadoop distribution. Microsoft Azure HDInsight includes implementations of Apache Spark, HBase, Storm, Pig, Hive, Sqoop, Oozie, Ambari, etc. It also integrates…
Apache Pig
14 ratings
5 reviews
Apache Pig is a programming tool for creating MapReduce programs used in Hadoop.
Cloudera Data Science Workbench
10 ratings
3 reviews
Cloudera Data Science Workbench enables secure self-service data science for the enterprise. It is a collaborative environment where developers can work with a variety of libraries and frameworks.
Presto
9 ratings
2 reviews
Presto is an open source SQL query engine designed to run queries on data stored in Hadoop or in traditional databases. Teradata's support for Presto development followed its acquisition of Hadapt and Revelytix.
Cloudera Manager
9 ratings
2 reviews
Cloudera Manager is a management application for Apache Hadoop and the enterprise data hub, from Cloudera.
Apache Flume
6 ratings
2 reviews
Apache Flume is a product enabling the flow of logs and other data into a Hadoop environment.
IBM Db2 Big SQL
1 rating
2 reviews
IBM offers Db2 Big SQL, an enterprise-grade, hybrid, ANSI-compliant SQL-on-Hadoop engine that delivers massively parallel processing (MPP) and advanced data query. Big SQL offers a single database connection or query for disparate sources such as HDFS, RDBMS, NoSQL databases, object stores and WebHDFS.
SAP Vora
0 ratings
2 reviews
SAP Vora is a computing engine designed to provide better accessibility to Hadoop data from SAP HANA. SAP Vora manages unstructured Hadoop data by building structured data hierarchies and making the data queryable through an SQL interface.
Apache Sqoop
3 ratings
1 review
Apache Sqoop is a tool for use with Hadoop, used to transfer data between Apache Hadoop and other, structured data stores.
Apache Drill
3 ratings
1 review
Apache Drill is a schema-free query engine for use with NoSQL or Hadoop data or file storage systems and databases.
Oracle Big Data Cloud Service
1 rating
1 review
Oracle Big Data Cloud Service is a managed, secure cloud platform service for Apache Hadoop and Apache Spark, delivered as an elastic, integrated platform. It provides support for streaming, batch, and interactive analysis.
Altiscale
Altiscale is a hosting platform for Hadoop deployments.
Jethro
Jethro, from the company of the same name headquartered in New York, delivers interactive enterprise business intelligence and enterprise data warehouse services on Hadoop.
Syncsort Trillium DQ for Big Data
Syncsort Trillium DQ for Big Data (formerly Trillium Quality for Big Data) supports enterprises using a Big Data framework like Hadoop with data quality functions like data integration, data cleansing, standardization and parsing, with prebuilt process flows that can be configured to meet business n…
Hydrograph
Bitwise offers Hydrograph, a data integration tool which provides ETL functionality on Hadoop and Spark.
Bitwise Hadoop Adaptor for Mainframe Data
Bitwise Hadoop Adaptor for Mainframe Data acquires mainframe data and converts it to Hadoop format for processing.
Cohesity Imanis Data
Imanis Data, now from Cohesity (acquired May 2019), is designed to present radically simple backup, recovery, and data management for Hadoop Distributed File System and NoSQL distributed databases including MongoDB, Cassandra, CouchbaseDB, HBase, and others.