TrustRadius: an HG Insights company

Datadog

Score: 8.6 out of 10

348 Reviews and Ratings

What is Datadog?

Datadog is a monitoring service for IT, Dev, and Ops teams who write and run applications at scale and want to turn the massive amounts of data produced by their apps, tools, and services into actionable insight.

Media

Out-of-the-box and customizable monitoring dashboards.
Datadog's collaboration features, where users can discuss issues in context with production data, annotate changes, notify their teams, see who responded to that alert before, and discover what was done to fix it.
Datadog unifies traces, metrics, and logs: the three pillars of observability.
Some of Datadog's 400+ built-in integrations.
Datadog's Service Map, which decomposes an application into all its component services and draws the observed dependencies between these services in real time.
Centralized log data, pulled from any source.
Datadog's Host Map, which lets users see all hosts together on one screen, grouped and filtered as desired, with metrics made instantly comprehensible via color and shape.


Datadog is a fundamentally useful platform for centralized app observability and beyond

Use Cases and Deployment Scope

Datadog is our developers' first stop for reviewing logs and monitoring key services and architecture. Before adopting it, we were drowning in trying to find useful logs in AWS CloudWatch, and Datadog's log discovery capabilities are far and away better. We have set up several key alarms for bad log patterns but have yet to get full value from monitoring other metrics, largely because we do not have a core platform development team.

Pros

  • Log indexing
  • Log Searching
  • Dashboard building (combining logs and metrics)
  • Traceability
  • User Monitoring

Cons

  • More recipes for fundamental monitoring tooling
  • Targeting different scales of application (beyond enterprise SaaS)
  • Multiple workspaces in an account to separate users

Return on Investment

  • Highly improved distributed compute debugging
  • Still waiting for Bits AI access - negative experience on new product rollouts
  • Spreadsheets have made reporting to external teams significantly easier

Usability

Alternatives Considered

ClickHouse, Grafana and Sentry

Other Software Used

AWS CloudTrail, Amazon CloudWatch, Google Workspace

Steep learning curve but totally worth it

Use Cases and Deployment Scope

We use Datadog for monitoring and observability across our org. It gives us visibility into what's really happening when something goes wrong via tracing, and its monitors have alerted us to issues countless times before a customer could complain. We monitor performance via metrics and can dig into specific calls using traces and flame graphs to see where bottlenecks are.

Pros

  • Setting up tracing is incredibly easy and powerful
  • Log search, especially with subqueries, makes it possible to find a needle in a haystack
  • Dashboards make it easy to compare data across dimensions

Cons

  • Building dashboards is often painful: the query syntax, especially for APM, is challenging to navigate. This feels like an area where an LLM integration would be incredibly helpful.
  • Specifically, the lack of wildcard search for APM resources makes it hard to gather or view data across a group of related endpoints
  • The query helper is often too eager to help, opening dropdowns when I don't want them and inserting extra query filters where they aren't wanted or needed.

Return on Investment

  • By using monitors for new errors, we've reduced the errors to near zero for a growing consumer-facing application. A meaningful percentage of visits were previously ending in failure, but we've been able to work through the errors with the increased visibility
  • Custom metrics have allowed us to provide excellent customer service to our business customers by analyzing usage patterns over time and helping them use our application more effectively. They really appreciate when we proactively reach out to them about ways to improve their success rate

Usability

Observability Done Right

Use Cases and Deployment Scope

In our environment, Datadog is a core part of our observability stack for application performance and infrastructure. We use RUM to monitor customer-facing performance in our web applications and identify issues. For our APIs, we have APM monitors that check latency and track all endpoints, status codes, and success rates. We have comprehensive AWS infrastructure monitoring, including EC2 instances, RDS MySQL clusters, NLB/ALB, VPC flows, and Lambda functions. Additionally, APM for our Java microservices covers tracing, JVM heap, GC, and thread pools. We also use it for log analysis and on-call services, including custom logs for settlement jobs and credit applications, log normalization for fast searching, and troubleshooting deadlocks and 5xx bursts. We are also using WAF for application endpoint security.

Pros

  • Log Management
  • APM (Application Performance Monitoring)
  • Infrastructure Monitoring
  • Security Monitoring
  • Dashboards, custom monitors, real time visibility
  • Alerting & On-Call Services - teams, email, phone, app popup

Cons

  • LLM
  • AI Observability
  • Endpoint Security
  • Automated QA Testing, Synthetic Testing

Return on Investment

  • Datadog has had a very positive ROI for us because it directly reduced downtime, improved customer experience, helped our team detect issues early, and let us operate more efficiently.
  • Engineers can quickly correlate logs, metrics, and traces in one place instead of spending hours searching across servers.
  • On the infrastructure side, visibility into CPU, memory, and RDS usage helped us right-size several EC2 instances, resulting in ~10–15% cloud cost savings.

Usability

Alternatives Considered

Grafana, Prometheus, Amazon CloudWatch and Dynatrace

Other Software Used

Grafana Loki, Dynatrace, Prometheus

Great platform for making your software better

Use Cases and Deployment Scope

We use Datadog to monitor our application with monitors and alerts, create dashboards that report system performance, and track logs and traces to make debugging easier. I've also used Datadog Workflows to automate creating Jira tickets, triaging bugs, and even producing automated bug fixes by pushing errors through to AI coding tools, which can diagnose issues and suggest solutions.

Pros

  • Dashboards and monitors make it easy to visualize application performance and track important metrics
  • Datadog workflows unlock a lot of possibilities for automation
  • It's easy to add critical attributes to logs and traces to make debugging easier

Cons

  • Working with powerpacks can be difficult. In particular, adding new variables to powerpacks is very difficult, and I've had to resort to time-consuming workarounds like detaching the powerpack instance and building a whole new powerpack with the updated variables
  • Flame graphs for large traces can be difficult to pull up. Since a recent update, I'm not really sure how to view the flame graph for large traces. It can be difficult to read through the SQL tab to find the source of an N+1 issue, whereas the old UI with flame graphs made it really easy to find the source of the issue
  • Rare performance issues can make it difficult to pull up important information. For high-priority issues this can cause delays in resolution, which has business impact

Return on Investment

  • Enables faster response to critical issues using monitors and alerts
  • Enhances understanding of system performance, making it easier to track progress towards performance improvements

Usability

Datadog is a Very Powerful and Comprehensive Performance Monitoring Platform.

Use Cases and Deployment Scope

We use Datadog as the main observability and application monitoring tool in our organization. It excels at:

1. Incident detection and response: We have real-time monitoring of app and service health, with alerting configured via integrations like PagerDuty. This helps us detect incidents quickly, reduces downtime, and ensures we meet our SLA/SLO targets.
2. End-to-end observability: We are able to trace requests, analyze logs, and monitor metrics to spot issues, particularly in our Envoy mesh architecture and coroutine-driven workloads.

Pros

  • Monitoring apps and server health
  • Alerting
  • Built-in dashboard gives great visualization
  • Visibility into traces, metrics, and logs all in one place
  • Observability needs and performance monitoring

Cons

  • Custom metrics get costly
  • There is a learning curve when building complex queries or nested monitors, which can require training or expert help.

Return on Investment

  • Helps us identify and resolve bugs in our systems and applications quickly and effectively.
  • The highly customizable dashboards allow us to tailor the analytics to fit our exact needs, which leads to better decision-making.
  • The ability to customize which metrics and logs we ingest helps us manage and control costs effectively.

Usability

Alternatives Considered

New Relic

Other Software Used

IBM Cloudability, Slack, Jira Service Management