Hadoop-Related Software

TrustRadius Top Rated for 2023

Top Rated Products

(1-2 of 2)

1
Azure Data Lake Storage

Azure Data Lake Storage Gen2 is a highly scalable and cost-effective data lake solution for big data analytics. It combines the power of a high-performance file system with massive scale and economy to help you speed your time to insight. Data Lake Storage Gen2 extends Azure Blob…

2
Apache Hive

Apache Hive is database/data warehouse software that supports data querying and analysis of large datasets stored in the Hadoop distributed file system (HDFS) and other compatible systems, and is distributed under an open source license.

All Products

(26-41 of 41)

26
Kyligence
0 reviews

Kyligence Zen is a low-code metrics platform to define, collect, and analyze business metrics. It connects to data sources and enables users to define business metrics, uncover hidden insights, and share them across their organization.

27
Bitwise Hadoop Adaptor for Mainframe Data

Bitwise Hadoop Adaptor for Mainframe Data acquires mainframe data and converts it to Hadoop format for processing.

28
Tencent Cloud Elastic MapReduce

Combining cloud computing and community open-source technologies such as Hadoop, Hive, Spark, HBase, Presto, and Storm, Tencent Cloud Elastic MapReduce (EMR) provides cloud-based Hadoop services featuring high reliability and elastic scalability. Using EMR, users can create a secure…

29
Huawei Cloud MapReduce Service

MapReduce Service (MRS) on Huawei Cloud provides enterprise-level big data clusters on the cloud. Tenants can control clusters and run big data components such as Hadoop, Spark, HBase, Kafka, and Storm.

30
Altiscale
0 reviews

Altiscale is a hosting platform for Hadoop deployments.

31
TCS Connected Intelligence Platform

TCS Digital Software & Solutions developed the TCS Connected Intelligence Platform (CIP) as a unified data analytics platform that enables business and technical stakeholders to harness multi-domain data from across the organization to gain a competitive advantage faster, and at…

32
bodo.ai
0 reviews

Bodo is an SQL and Python data processing platform powered by advanced compilers and MPI parallelization technologies. Bodo enhances data engineering by aiming to provide improvements in speed, scale, and cost efficiency, bringing HPC levels of performance and efficiency to data…

33
Zementis
0 reviews

Zementis gives organizations a single tool for predictive analytics. It is presented by the vendor, Software AG, as intuitive and easy-to-use, taking predictive analytics beyond the data science team so anyone in an enterprise can understand customer behavior, market dynamics, and…

34
Apache ORC
0 reviews

A solution presented as the smallest, fastest columnar storage for Hadoop workloads.

35
Apache Beam
0 reviews

Apache Beam is an open-source, unified programming model for batch and streaming data processing pipelines that simplifies large-scale data processing dynamics. Apache Beam unifies multiple data processing engines and SDKs around its distinctive Beam model. This offers a way to create…

36
IBM Analytics for Apache Spark

IBM Analytics for Apache Spark for Cloud is a service designed to provide the fast in-memory performance of Apache Spark without the hassle of self-managing Spark clusters, relying instead on the convenience of IBM Cloud.

37
HCL Vector Analytics Database

Vector, an analytics database, handles continuous updates without a performance penalty. Vector is designed to achieve extreme performance with full ACID compliance on commodity hardware with the flexibility to deploy on-premises, and on AWS, Azure, and Google Cloud with little or…

38
Jethro
0 reviews

Jethro, from the company of the same name headquartered in New York, delivers interactive enterprise business intelligence and enterprise data warehouse services on Hadoop.

39
Apache Kylin
0 reviews

Apache Kylin is an open source, distributed Analytical Data Warehouse for Big Data; it was designed to provide OLAP (Online Analytical Processing) capability in the big data era. By renovating the multi-dimensional cube and precalculation technology on Hadoop and Spark, Kylin is…

40
Trino
0 reviews

Trino (formerly known as Presto SQL) is an open-source distributed SQL query engine for big data analytics that helps to explore a data universe. Trino is presented as a highly parallel and distributed query engine that is built from the ground up for efficient, low-latency analytics.…

41
AtScale
0 reviews

AtScale, headquartered in San Mateo, offers intelligent data virtualization which they state provides live, secured and governed access to Big Data across disparate systems wherever it resides—and without disruption.

Learn More About Hadoop-Related Software

What is Hadoop Software?

Hadoop is an open-source framework from the Apache Software Foundation for storing and processing very large datasets across clusters of machines. An entire ecosystem of products has evolved around the Hadoop data store, to the point where it has become its own technology category.

The central idea of Hadoop is that data is spread across many inexpensive commodity servers. The core technology is open source, but vendors such as Cloudera and Hortonworks offer commercial distributions of Hadoop that wrap services around it.

Unlike a traditional database, Hadoop can handle huge volumes of both structured and unstructured data including log files, streaming data, images, audio and video files. All of this data can be put into the Hadoop cluster and accessed, modified and processed in place, eliminating the need to duplicate and structure data in a traditional warehouse.

Once this huge volume of structured and unstructured data has been stored, how do you extract any value from it? Since Hadoop is not a relational database, SQL does not work against it out of the box. Instead, Hadoop has its own data processing framework called MapReduce. Developers can write MapReduce programs to retrieve whatever data is needed. However, MapReduce has several constraints that limit performance, and newer products such as Apache Spark provide an alternative distributed computing framework that is often significantly more efficient. Similarly, products like Hive and Cloudera Impala provide a SQL-like query language, which is much easier for data analysts to learn and use.
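To make the MapReduce idea more concrete, here is a minimal sketch of the programming model as a word count, written in plain Python. It runs in memory on a single machine and does not use Hadoop's actual API; the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions. A real Hadoop job would run the same three steps in parallel across the servers in the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (hypothetical driver)."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input standing in for log lines stored in a cluster.
    sample = ["error disk full", "warning disk slow", "error network down"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

The constraint mentioned above shows up even in this toy version: every job has to be expressed as a map step followed by a reduce step, so multi-stage analyses become chains of separate jobs. Frameworks such as Spark and SQL-like layers such as Hive aim to remove that burden from the developer or analyst.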

Related Categories