Use Cases and Deployment Scope
We use Splunk Cloud as our primary APM and infrastructure monitoring tool for cloud-native AWS environments. Our stack relies heavily on EKS, ECS, and serverless Lambda functions, so by standardizing on OpenTelemetry we ingest metrics, traces, and logs directly into Splunk without proprietary agents. The biggest problem it solves is long Mean Time to Resolution (MTTR) during outages. Before Splunk, investigating a 502 error on an ALB meant manually checking CloudWatch logs and container metrics. Now distributed tracing correlates the infrastructure anomaly directly with the failing microservice's trace and the exact log line. It also helps with end-to-end monitoring across staging and production servers.
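To illustrate the correlation idea described above, here is a minimal sketch in plain Python. It is not Splunk's implementation or API. It only shows the principle that spans and log records sharing a trace ID can be joined to go from a 502 to the related log lines. The `Span` and `LogRecord` types, the `correlate_errors` function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical, simplified span: real spans carry many more fields.
    trace_id: str
    service: str
    status_code: int


@dataclass
class LogRecord:
    # Hypothetical log record that carries the trace ID it was emitted under.
    trace_id: str
    service: str
    message: str


def correlate_errors(spans, logs, status_code=502):
    """Map each trace containing a span with `status_code` to its log lines."""
    # Collect the IDs of traces in which at least one span returned the error.
    failing = {s.trace_id for s in spans if s.status_code == status_code}
    result = {}
    for record in logs:
        if record.trace_id in failing:
            line = f"{record.service}: {record.message}"
            result.setdefault(record.trace_id, []).append(line)
    return result


# Made-up sample data: trace t1 hit a 502 at the load balancer, t2 succeeded.
spans = [
    Span("t1", "alb", 502),
    Span("t1", "checkout", 500),
    Span("t2", "alb", 200),
]
logs = [
    LogRecord("t1", "checkout", "upstream connection refused"),
    LogRecord("t2", "checkout", "order placed"),
]

matches = correlate_errors(spans, logs)
print(matches)
```

In practice the OpenTelemetry instrumentation attaches the trace context to logs, and the monitoring backend performs this join, which is what removes the manual step of checking logs and metrics separately.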
Alternatives Considered
Datadog, Prometheus and Grafana
Other Software Used
Splunk Enterprise Security, Splunk Cloud Platform, Slack, Atlassian Jira, Medium, Jenkins, Atlassian Bitbucket, Amazon Elastic Kubernetes Service (EKS), Kubernetes, OpenTelemetry, AWS CloudFormation, PagerDuty, Datadog, GitLab, IBM Terraform