Observability Done Right
December 03, 2025

Vinit Parakh | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User

Overall Satisfaction with Datadog

In our environment, Datadog is a core part of our observability stack for application performance and infrastructure. We use RUM to monitor customer-facing performance in our web applications and to identify issues. For our APIs, we have APM monitors that check latency and track every endpoint, its status codes, and its success rate. We have comprehensive AWS infrastructure monitoring, including EC2 instances, RDS MySQL clusters, NLB/ALB, VPC flows, and Lambda functions. Additionally, APM for our Java microservices covers tracing, JVM heap, GC, and thread pools. We also use it for log analysis and on-call services, including custom logs for settlement jobs and credit applications, log normalization for fast searching, resolving deadlocks, and handling 5xx bursts. We also use the WAF for application endpoint security.
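
As a rough illustration of the per-endpoint figures mentioned above (latency, status codes, success rate), here is a minimal Python sketch. It does not use Datadog's API or agent. The sample records, the endpoint paths, the `summarize` helper, and the choice to count any non-5xx response as a success are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical request records: (endpoint, HTTP status, latency in ms).
# These values are made up for illustration and do not come from the review.
sample_requests = [
    ("/api/credit-applications", 200, 120.0),
    ("/api/credit-applications", 200, 95.0),
    ("/api/credit-applications", 503, 900.0),
    ("/api/settlements", 200, 210.0),
    ("/api/settlements", 201, 180.0),
]


def summarize(records):
    """Group records by endpoint and compute request count, success rate, and latency stats."""
    by_endpoint = defaultdict(list)
    for endpoint, status, latency_ms in records:
        by_endpoint[endpoint].append((status, latency_ms))

    summary = {}
    for endpoint, rows in by_endpoint.items():
        total = len(rows)
        # Assumption: a request counts as failed only when the status is 5xx.
        server_errors = sum(1 for status, _ in rows if status >= 500)
        latencies = [latency for _, latency in rows]
        summary[endpoint] = {
            "requests": total,
            "success_rate": (total - server_errors) / total,
            "avg_latency_ms": sum(latencies) / total,
            "max_latency_ms": max(latencies),
        }
    return summary


if __name__ == "__main__":
    for endpoint, stats in summarize(sample_requests).items():
        print(endpoint, stats)
```

In practice a monitoring tool would compute these figures continuously from traces rather than from an in-memory list. The sketch only shows what the numbers mean.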

Pros

  • Log Management
  • APM - Application Performance Monitoring
  • Infrastructure Monitoring
  • Security Monitoring
  • Dashboards, custom monitors, real time visibility
  • Alerting & On-Call Services - teams, email, phone, app popup

Cons

  • LLM
  • AI Observability
  • Endpoint Security
  • Automated QA Testing, Synthetic Testing

Return on Investment

  • Datadog has had a very positive ROI for us because it directly reduced downtime, improved customer experience, and helped our team detect issues early and operate more efficiently.
  • Engineers can quickly correlate logs, metrics, and traces in one place instead of spending hours searching across servers.
  • On the infrastructure side, visibility into CPU, memory, and RDS usage helped us right-size several EC2 instances, resulting in ~10–15% cloud cost savings.

Datadog saved our teams 8 to 12 hours per week in manual troubleshooting and log analysis. It also minimized infrastructure downtime by alerting us through various channels and prioritizing issues with the On-Call service. Since adopting Datadog, our mean time to resolution (MTTR) for production incidents has decreased by 35–45%, primarily because engineers can now quickly correlate logs, metrics, and traces in one place, rather than spending hours searching across servers.
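
To put the percentages above in concrete terms, here is a small worked calculation in Python. The 35–45% MTTR reduction and the 8 to 12 hours per week come from the review. The 60-minute baseline MTTR and the `reduced_range` helper are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def reduced_range(baseline, low_pct, high_pct):
    """Return (best, worst) values after reducing baseline by high_pct and low_pct percent."""
    best = baseline * (100 - high_pct) / 100
    worst = baseline * (100 - low_pct) / 100
    return best, worst


# Hypothetical baseline MTTR in minutes (an assumption, not a figure from the review).
BASELINE_MTTR_MIN = 60

# MTTR after the 35-45% reduction quoted in the review.
best_mttr, worst_mttr = reduced_range(BASELINE_MTTR_MIN, 35, 45)

# Yearly total of the 8 to 12 hours per week quoted in the review.
WEEKS_PER_YEAR = 52
hours_low = 8 * WEEKS_PER_YEAR
hours_high = 12 * WEEKS_PER_YEAR

if __name__ == "__main__":
    print(f"MTTR after reduction: {best_mttr} to {worst_mttr} minutes")
    print(f"Hours saved per year: {hours_low} to {hours_high}")
```

Under that assumed 60-minute baseline, the reduced MTTR falls between 33 and 39 minutes, and the weekly savings add up to between 416 and 624 hours per year.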

First things first: it's easy to use and very easy to implement in any infrastructure. It provides custom dashboards and monitors. I’ve used or evaluated Grafana, Prometheus, Amazon CloudWatch, and Dynatrace, and each tool has strong capabilities. Prometheus + Grafana provide solid open-source metric collection and visualization, but they require more maintenance and don’t offer native logs + traces out of the box. CloudWatch integrates well with AWS, but becomes difficult to use for deep APM, log correlation, or cross-service troubleshooting in large distributed systems. Dynatrace offers powerful automation and root-cause analysis but is significantly more complex to implement and manage.

Do you think Datadog delivers good value for the price?

Yes

Are you happy with Datadog's feature set?

Yes

Did Datadog live up to sales and marketing promises?

Yes

Did implementation of Datadog go as expected?

Yes

Would you buy Datadog again?

Yes

Datadog works extremely well in distributed, Java microservices-heavy environments, providing unified visibility across logs, metrics, traces, and infrastructure. It is less well suited to LLM or AI monitoring, QA automation testing, and security.
