Item: Datadog
Rating: 9
Author: Vinit Parakh

Overall Satisfaction with Datadog

Use Cases and Deployment Scope

In our environment, Datadog is a core part of our observability stack for application performance and infrastructure. We use RUM to monitor customer-facing performance in our web applications and to identify issues. For our APIs, we have APM monitors that check latency and track status codes and success rates across all endpoints. We have comprehensive AWS infrastructure monitoring covering EC2 instances, RDS MySQL clusters, NLB/ALB, VPC flow logs, and Lambda functions. APM for our Java microservices covers tracing, JVM heap, GC, and thread pools. We also use Datadog for log analysis and on-call services, including custom logs for settlement jobs and credit applications, log normalization for fast searching, resolving deadlocks, and handling 5xx bursts. Finally, we use the WAF for application endpoint security.
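To make the API checks above concrete, here is a minimal sketch of the per-endpoint numbers such a monitor tracks: success rate, p95 latency, and 5xx count. It is plain Python, not Datadog's API. The `Request` record, the `summarize` helper, and the choice to count any status below 400 as a success are assumptions made for illustration.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Request:
    """One observed API call (hypothetical record shape)."""
    endpoint: str
    status: int
    latency_ms: float


def summarize(requests):
    """Return success rate, p95 latency, and 5xx count per endpoint."""
    by_endpoint = {}
    for r in requests:
        by_endpoint.setdefault(r.endpoint, []).append(r)

    summary = {}
    for endpoint, items in by_endpoint.items():
        latencies = sorted(r.latency_ms for r in items)
        # Nearest-rank percentile: the value at position ceil(0.95 * n).
        rank = ceil(0.95 * len(latencies))
        # Assumption: any status below 400 counts as a success.
        successes = sum(1 for r in items if r.status < 400)
        summary[endpoint] = {
            "success_rate": successes / len(items),
            "p95_ms": latencies[rank - 1],
            "count_5xx": sum(1 for r in items if r.status >= 500),
        }
    return summary
```

A threshold on `count_5xx` over a short window is one simple way to flag the 5xx bursts mentioned above. The right threshold depends on traffic volume.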

Pros and Cons

Pros

Log Management
APM - Application Performance Monitoring
Infrastructure Monitoring
Security Monitoring
Dashboards, custom monitors, real-time visibility
Alerting & On-Call Services - Teams, email, phone, app pop-up

Cons

LLM
AI Observability
Endpoint Security
Automated QA Testing, Synthetic Testing

Return on Investment

Datadog has had a very positive ROI for us because it directly reduced downtime, improved customer experience, and helped our team detect issues early and operate more efficiently.
Engineers can quickly correlate logs, metrics, and traces in one place instead of spending hours searching across servers.
On the infrastructure side, visibility into CPU, memory, and RDS usage helped us right-size several EC2 instances, resulting in ~10–15% cloud cost savings.

Usability

Datadog saved our teams 8 to 12 hours per week in manual troubleshooting and log analysis. It also minimized infrastructure downtime by alerting us through various channels and prioritizing issues with the On-Call service. Since adopting Datadog, our mean time to resolution (MTTR) for production incidents has decreased by 35–45%, primarily because engineers can now quickly correlate logs, metrics, and traces in one place, rather than spending hours searching across servers.
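For readers who want to track the same metric, MTTR is the average time from an incident opening to its resolution, and the reduction is the relative drop between two periods. The sketch below shows that arithmetic. The `mttr` and `percent_reduction` helpers and the (opened, resolved) pair format are assumptions for illustration, not how Datadog computes or exports these figures.

```python
from datetime import timedelta


def mttr(incidents):
    """Mean time to resolution over (opened, resolved) datetime pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum(
        (resolved - opened for opened, resolved in incidents),
        timedelta(),  # start value, so the sum stays a timedelta
    )
    return total / len(incidents)


def percent_reduction(before, after):
    """Relative drop from `before` to `after`, as a percentage."""
    # Dividing one timedelta by another yields a plain float ratio.
    return 100 * (before - after) / before
```

Feeding `mttr` the incidents from a period before adoption and a period after, then passing both results to `percent_reduction`, gives a figure comparable to the 35–45% range quoted above.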

Alternatives Considered

Grafana, Prometheus, Amazon CloudWatch and Dynatrace

First things first: it's easy to use and very easy to implement in any infrastructure. It provides custom dashboards and monitors. I’ve used or evaluated Grafana, Prometheus, Amazon CloudWatch, and Dynatrace, and each tool has strong capabilities. Prometheus and Grafana provide solid open-source metric collection and visualization, but they require more maintenance and don’t offer native logs and traces out of the box. CloudWatch integrates well with AWS, but becomes difficult to use for deep APM, log correlation, or cross-service troubleshooting in large distributed systems. Dynatrace offers powerful automation and root-cause analysis but is significantly more complex to implement and manage.

Key Insights

Do you think Datadog delivers good value for the price?

Yes

Are you happy with Datadog's feature set?

Yes

Did Datadog live up to sales and marketing promises?

Yes

Did implementation of Datadog go as expected?

Yes

Would you buy Datadog again?

Yes

Other Software Used

Grafana Loki, Dynatrace, Prometheus

Likelihood to Recommend

Datadog works extremely well in distributed, Java microservices-heavy environments, providing unified visibility across logs, metrics, traces, and infrastructure. Datadog is less ideal for LLM or AI monitoring, QA automation testing, and endpoint security.

Observability Done Right