TrustRadius
Druid


Product Demos

Project Shapeshift demo: SQL ingest for Druid

Apache Druid Meetup featuring SigNoz.io

Apache Druid 24 multi-stage query showcase

Interactive Realtime Dashboards On Data Streams Using Apache Kafka, Druid And Superset

Product Details

What is Druid?

According to the Apache Software Foundation, Apache Druid is a high-performance, real-time analytics database designed to deliver sub-second queries on streaming and batch data at scale and under load. It is positioned as a solution for powering real-time analytics applications that require fast queries and high uptime. Apache Druid is said to be suitable for companies of all sizes, from small businesses to large enterprises. It caters to a wide range of professions and industries, including data analysts, data engineers, business intelligence professionals, e-commerce companies, and digital advertising agencies.

Key Features

According to the vendor, Apache Druid's interactive query engine uses scatter/gather for high-speed queries: each query is processed in parallel, which enables sub-second performance for most queries, even on very large data sets. Configurable tiering and quality-of-service settings are said to guarantee priority for important workloads, avoid resource contention, and allow cluster resources to be fine-tuned for performance.
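The scatter/gather pattern mentioned above can be illustrated with a small sketch. This is not Druid's implementation; it only shows the general idea of aggregating separate chunks of data in parallel and then merging the partial results. The `SEGMENTS` data and both function names are made up for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for one chunk of data
# held by a different node. Rows are (country, clicks) pairs.
SEGMENTS = [
    [("US", 3), ("DE", 1), ("US", 2)],
    [("DE", 4), ("FR", 5)],
    [("US", 1), ("FR", 2), ("DE", 2)],
]


def partial_sum(segment):
    """Scatter step: aggregate one chunk independently of the others."""
    totals = {}
    for country, clicks in segment:
        totals[country] = totals.get(country, 0) + clicks
    return totals


def scatter_gather(segments):
    """Run partial_sum on every chunk in parallel, then merge the results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_sum, segments))
    merged = {}
    for partial in partials:  # Gather step: combine the partial aggregates.
        for country, clicks in partial.items():
            merged[country] = merged.get(country, 0) + clicks
    return merged


print(scatter_gather(SEGMENTS))
```

Because each chunk is summed on its own, adding more workers (or nodes) shortens the scatter step without changing the merge logic.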

Apache Druid automatically optimizes the data format on ingestion: data is stored in columns, indexed by time, dictionary-encoded, bitmap-indexed, and compressed according to type. Compressed bitmap indexes provide fast filtering and searching across multiple columns. Storage is reduced by dictionary-encoding string columns and by storing numeric columns as compressed raw values.
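To make dictionary encoding and bitmap indexing concrete, here is a minimal sketch of the two ideas. It is a toy model, not Druid's storage format: plain Python integers stand in for compressed bitmaps, and the column values and function names are invented for the example.

```python
def dictionary_encode(values):
    """Map each distinct string to a small integer ID, in sorted order."""
    dictionary = {value: idx for idx, value in enumerate(sorted(set(values)))}
    encoded = [dictionary[value] for value in values]
    return dictionary, encoded


def build_bitmaps(encoded, cardinality):
    """Build one bitmap per dictionary ID; bit i is set if row i holds that ID."""
    bitmaps = [0] * cardinality
    for row, value_id in enumerate(encoded):
        bitmaps[value_id] |= 1 << row
    return bitmaps


def matching_rows(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]


# Hypothetical columns for five rows.
country = ["US", "DE", "US", "FR", "DE"]
device = ["ios", "ios", "web", "web", "ios"]

country_dict, country_ids = dictionary_encode(country)
device_dict, device_ids = dictionary_encode(device)
country_bitmaps = build_bitmaps(country_ids, len(country_dict))
device_bitmaps = build_bitmaps(device_ids, len(device_dict))

# The filter country = 'DE' AND device = 'ios' becomes a bitwise AND.
hits = country_bitmaps[country_dict["DE"]] & device_bitmaps[device_dict["ios"]]
print(matching_rows(hits))  # rows 1 and 4 match both conditions
```

The filter never touches the string values themselves: it looks up two IDs and combines two bitmaps, which is why multi-column filters on indexed columns can be cheap.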

The elastic architecture of Apache Druid consists of loosely coupled components for ingestion, queries, and orchestration. This architecture enables easy scale-up and scale-out with a deep storage layer, providing flexibility and quick scalability to handle large aggregations and high-performance applications.

According to the vendor, Apache Druid offers true stream ingestion with connector-free integration with streaming platforms such as Apache Kafka and Amazon Kinesis. This feature enables query-on-arrival, high scalability, low latency, and guaranteed consistency. Apache Druid supports the ingestion of millions of events per second and continuous backup into deep storage.

The vendor claims that Apache Druid ensures non-stop reliability through automatic data services, including continuous backup, automated recovery, and multi-node replication. These services are designed to ensure high availability and durability of data, providing a reliable and fault-tolerant system for critical applications.

Apache Druid features schema auto-discovery, which allows for automatic detection, definition, and updating of column names and data types upon ingestion. This feature provides the ease of schemaless data ingestion with the performance of strongly typed schemas, reducing the need for manual schema management and improving data ingestion efficiency.
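As a rough illustration of what schema discovery involves, the sketch below scans JSON events and infers a type for each column, widening the type when later events require it. It is a simplified stand-in, not Druid's algorithm; the type names, the widening order, and the sample events are assumptions made for the example.

```python
import json

# Assumed widening order for this sketch: long < double < string.
_RANK = {"long": 0, "double": 1, "string": 2}


def infer_type(value):
    """Classify one JSON value as 'long', 'double', or 'string'."""
    if isinstance(value, bool):
        return "string"  # bool is an int subclass in Python; treat it as text here
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def discover_schema(json_lines):
    """Scan newline-delimited JSON events and return {column: type}."""
    schema = {}
    for line in json_lines:
        for column, value in json.loads(line).items():
            if value is None:
                continue  # a null carries no type information
            seen = infer_type(value)
            current = schema.get(column)
            # Widen the column type when a later event needs a broader type.
            if current is None or _RANK[seen] > _RANK[current]:
                schema[column] = seen
    return schema


# Hypothetical events: the second one adds a column and widens "latency".
events = [
    '{"user": "a", "latency": 12}',
    '{"user": "b", "latency": 3.5, "region": "eu"}',
]
print(discover_schema(events))
```

The second event both introduces a new column (`region`) and forces `latency` from `long` to `double`, which mirrors the "detect, define, and update" behavior described above.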

Apache Druid supports joins both during data ingestion and at query time. Query performance is fastest when tables are pre-joined during ingestion, which enables efficient analysis across multiple dimensions and tables.

Developers and analysts can leverage the familiar SQL API for end-to-end data operations in Apache Druid. The vendor states that Apache Druid supports SQL-based queries for ingestion, transformation, and querying of data, simplifying the adoption and integration of Druid into existing data workflows.
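For readers who want to see what the SQL API looks like from client code, here is a minimal sketch that builds a request for Druid's HTTP SQL endpoint (`/druid/v2/sql`) using only the Python standard library. The host and port, the `wikipedia` datasource, and the `channel` column are assumptions for illustration; the request is constructed but not sent.

```python
import json
import urllib.request

# Assumption: a local, quickstart-style address; adjust for a real cluster.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"


def build_sql_request(sql, url=DRUID_SQL_URL):
    """Wrap a SQL string in a JSON POST request for the SQL endpoint."""
    body = json.dumps({"query": sql, "resultFormat": "object"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical datasource and column names.
request = build_sql_request(
    "SELECT channel, COUNT(*) AS edits "
    "FROM wikipedia "
    "WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY "
    "GROUP BY channel ORDER BY edits DESC LIMIT 5"
)
print(request.get_method(), request.full_url)

# To run the query against a live cluster:
#   with urllib.request.urlopen(request) as response:
#       rows = json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect and test without a running cluster.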

Druid Videos

Apache Druid 0.21.0 Quickstart, Lookups, and JOINs
Demonstrating Apache Druid Rollup

Druid Technical Details

Deployment Types: On-premise
Operating Systems: Windows, Linux, Mac
Mobile Application: No

Reviews and Ratings

(3)

Community Insights

TrustRadius Insights are summaries of user sentiment data from TrustRadius reviews and, when necessary, 3rd-party data sources.

Apache Druid solves the key business problem of real-time ingestion and analytical queries on high-volume data. According to users, it lets them ingest data in real time from streaming sources and aggregate it to serve analytical dashboards. It provides fast querying, slice-and-dice analysis, and a stable datastore that needs minimal maintenance. Users report using it for OLAP analytics, business-metric reporting, and powering different UIs, with real-time aggregation and grouping handled for high-volume data.

Druid is a good alternative for visualization and BI tools, with low-latency query results in the UI. It enables users to retire several existing third-party systems and to power front ends and reporting, providing quick real-time insights into business data. One reviewer mentioned that they used Druid to store syslog data from network devices at a huge rate and perform analytics on streaming data, allowing them to communicate to customers how their marketing campaigns are performing in real time. Overall, Apache Druid is a versatile solution that addresses the growing analytical demands of businesses while offering stable performance and scalability.

Reviews

January 08, 2024

Druid gets the job done

Score 9 out of 10
Vetted Review
Verified User
We use Druid for rapid ingest of a variety of data sources, including traditional databases, Kafka topics, and data stored in Hadoop. Our users enjoy the easy creation of ingest specs, and the ability to ingest only the relevant columns/fields required for their programs and queries. Being able to translate and enrich data during ingest is a huge plus.
Pros
  • Rapid ingest
  • Limiting ingest to only the relevant fields/columns
  • Easy ingest spec creation
Cons
  • Security configuration is problematic
  • Cluster management could have more features
  • Troubleshooting incomplete tasks/jobs is a chore
It is extremely well suited to rapid ingest of data from large data sources, because you can restrict what is ingested by column/field, so that you only pull in the data you actually want or need.

As stated earlier, the open source version could use better cluster management tools, and troubleshooting tools for failing jobs/tasks.
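The column-limiting behavior the reviewer describes can be sketched as follows. This builds only the `dataSchema` portion of an ingestion spec, where listing dimensions explicitly restricts which input fields become columns. The datasource name, the field names, and the helper function are hypothetical, and a complete spec would also need `ioConfig` and `tuningConfig` sections that are omitted here.

```python
import json


def build_data_schema(datasource, timestamp_column, wanted_columns):
    """Return a dataSchema dict that ingests only the listed columns."""
    return {
        "dataSource": datasource,
        "timestampSpec": {"column": timestamp_column, "format": "iso"},
        # Listing dimensions explicitly leaves other input fields out.
        "dimensionsSpec": {"dimensions": list(wanted_columns)},
        "granularitySpec": {
            "segmentGranularity": "day",
            "queryGranularity": "none",
            "rollup": False,
        },
    }


# Hypothetical datasource and field names.
schema = build_data_schema("syslog_events", "timestamp", ["host", "severity"])
print(json.dumps(schema, indent=2))
```

Generating the schema from a list of wanted columns keeps the set of ingested fields in one place, which is convenient when different programs need different subsets of the same source.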
  • Integration with S3 storage has saved about 35% on our storage compared with HDFS
  • The rapid ingest has saved users' time in the query aspects of their applications
  • The ability to ingest from a variety of data sources has made overall user application queries much simpler
Apache Kafka, Cloudera Distribution Hadoop (CDH), Apache Spark