Try Splunk if you haven’t already, it’ll make devs and ops’ lives easier
May 02, 2021

Anonymous | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User

Software Version

SignalFx Microservices APM

Overall Satisfaction with Splunk Infrastructure Monitoring (formerly SignalFx)

We’re using Splunk [Infrastructure Monitoring (formerly SignalFx)] to report real-time metrics when the number or percentage of a specific event matters. For example, we use it to detect error cases and to monitor real-time throughput of the system. We also use the detectors to get alerts when a condition is met. It’s widely used by almost all engineering teams.
Pros
  • Metric breakdowns
  • Dashboards and charts
  • Detectors and alerting mechanism
Cons
  • Chart sharing needs improvement because it creates links that are only valid for 7 days.
  • Metric search could be improved: most of the time I want to see which charts use a specific metric. That would make it much easier to find the charts I’m looking for.
  • There is a 5,000 time series limit on each metric. If a metric has a breakdown with more than 5,000 combinations, only some of them are reported, which sometimes makes my charts unreliable. It would be nice to support more time series, at least via configuration.
Return on Investment
  • Definitely improved MTTR
  • Reduced downtime, because when we get a no-heartbeat alert, we jump in and resolve the issue ASAP
  • Increased monitoring costs; Splunk is one of the three monitoring tools we use
They’re not for the same purpose, but we also use NewRelic and Honeycomb for monitoring. NewRelic is used for HTTP client monitoring: system-related throughput, errors, database, and external client monitoring. Honeycomb is used to monitor actual HTTP request/response values. Splunk [Infrastructure Monitoring] is used for real-time, application-related throughput and error monitoring.
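To illustrate the kind of real-time error metrics described above, here is a minimal sketch of a datapoint payload in the JSON shape that a SignalFx-style `/v2/datapoint` ingest endpoint accepts. The metric and dimension names are hypothetical examples for illustration, not ones from our actual setup.

```python
import json

def build_payload(metric, value, dimensions):
    # Counter datapoint in the ingest API's JSON body shape; the
    # metric/dimension names here are made up for the example.
    return {
        "counter": [
            {
                "metric": metric,
                "value": value,
                "dimensions": dimensions,
            }
        ]
    }

payload = build_payload("http.errors", 3, {"service": "checkout", "code": "500"})
body = json.dumps(payload)
print(body)
```

In practice you would POST `body` to the ingest endpoint with your access token in a header; breaking errors down by dimensions like `service` is what powers the metric breakdowns and charts mentioned above.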

Do you think Splunk Infrastructure Monitoring (formerly SignalFx) delivers good value for the price?

Yes

Are you happy with Splunk Infrastructure Monitoring (formerly SignalFx)'s feature set?

Yes

Did Splunk Infrastructure Monitoring (formerly SignalFx) live up to sales and marketing promises?

I wasn't involved with the selection/purchase process

Did implementation of Splunk Infrastructure Monitoring (formerly SignalFx) go as expected?

I wasn't involved with the implementation phase

Would you buy Splunk Infrastructure Monitoring (formerly SignalFx) again?

Yes

We have external dependencies, and our revenue depends on the responses we get from these external parties. With the help of the alerting feature, whenever there is an outage on the external side, we can ping them to fix their system. This helps us keep outages shorter, which means less money lost.
Our team is the one that uses most of the Splunk quota. Most of the time, the operations team asks us to use fewer time series if possible.
We’re very careful when choosing the right alerting conditions to prevent false alerts or alert storms. There are plenty of options to choose from, but we mostly use heartbeat checks and thresholds. These are the most reliable ones in our experience.
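A threshold condition like the ones mentioned above can be written as a short SignalFlow program (the language Splunk Infrastructure Monitoring detectors use). This is a minimal sketch with a hypothetical metric name, not one of our actual detectors:

```
# Hypothetical metric; fires when the per-service error count
# stays above the threshold for five minutes.
errors = data('app.error.count').sum(by=['service'])
detect(when(errors > 100, lasting='5m')).publish('error spike')
```

The `lasting` duration is what helps avoid alert storms: transient spikes that recover within the window never fire.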
Error detection, outage detection, throughput monitoring, and percentage monitoring are the scenarios we currently use [Splunk Infrastructure Monitoring (formerly SignalFx)] for, and it is well suited to them. I believe it is less appropriate when a precise number is required, because discrepancies sometimes occur due to the metric collectors. For example, 100 data points are reported at time t, but on the charts they are split as 60 at time t and 40 at t+1.