Try Splunk if you haven’t already, it’ll make devs and ops’ lives easier
May 02, 2021

Anonymous | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User

Software Version

SignalFx Microservices APM

Overall Satisfaction with Splunk Infrastructure Monitoring (formerly SignalFx)

We’re using Splunk [Infrastructure Monitoring (formerly SignalFx)] to report real-time metrics when the number or percentage of a specific event matters. For example, we use it to detect error cases and to monitor real-time throughput of the system. We also use the detectors to get alerts when a condition is met. It’s widely used by almost all engineering teams.
Pros
  • Metric breakdowns
  • Dashboards and charts
  • Detectors and alerting mechanism
Cons
  • Chart sharing needs improvement because it creates links that are only valid for 7 days.
  • Metric search could be improved: most of the time I want to see which charts use a specific metric. That would make it much easier to find the charts I’m looking for.
  • There is a 5,000 time series limit on each metric. If a metric has a breakdown with more than 5,000 combinations, only some of them are reported, which sometimes makes my charts unreliable. It would be nice to support more time series, at least via configuration.
Return on Investment
  • Definitely improved MTTR
  • Reduced downtime, because when we get a no-heartbeat alert, we jump in and resolve the issue ASAP
  • Increased monitoring costs; Splunk is one of the three monitoring tools we use
They’re not for the same purpose, but we also use NewRelic and Honeycomb for monitoring. NewRelic is used for HTTP client monitoring: system-related throughput, errors, database, and external client monitoring. Honeycomb is used to monitor actual HTTP request/response values. Splunk [Infrastructure Monitoring] is used for real-time, application-related throughput and error monitoring.
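To illustrate the kind of real-time error metrics described above, here is a minimal sketch of a datapoint payload in the JSON shape that a SignalFx-style `/v2/datapoint` ingest endpoint accepts. The metric and dimension names are hypothetical examples for illustration, not ones from our actual setup.

```python
import json

def build_payload(metric, value, dimensions):
    # Counter datapoint in the ingest API's JSON body shape; the
    # metric/dimension names here are made up for the example.
    return {
        "counter": [
            {
                "metric": metric,
                "value": value,
                "dimensions": dimensions,
            }
        ]
    }

payload = build_payload("http.errors", 3, {"service": "checkout", "code": "500"})
body = json.dumps(payload)
print(body)
```

In practice you would POST `body` to the ingest endpoint with your access token in a header; breaking errors down by dimensions like `service` is what powers the metric breakdowns and charts mentioned above.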

Do you think Splunk Infrastructure Monitoring (formerly SignalFx) delivers good value for the price?

Yes

Are you happy with Splunk Infrastructure Monitoring (formerly SignalFx)'s feature set?

Yes

Did Splunk Infrastructure Monitoring (formerly SignalFx) live up to sales and marketing promises?

I wasn't involved with the selection/purchase process

Did implementation of Splunk Infrastructure Monitoring (formerly SignalFx) go as expected?

I wasn't involved with the implementation phase

Would you buy Splunk Infrastructure Monitoring (formerly SignalFx) again?

Yes

We have external dependencies, and our revenue depends on the responses we get from these external parties. With the help of the alerting feature, whenever there is an outage on the external side, we can ping them to fix their system. This helps us keep outages shorter, which means less money lost.
Our team is the one that uses most of the Splunk quota. Most of the time, the operations team asks us to use fewer time series if possible.
We’re very careful when choosing the right alerting conditions to prevent false alerts or alert storms. There are plenty of options to choose from, but we mostly use heartbeat checks and thresholds. These are the most reliable ones in our experience.
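A threshold condition like the ones mentioned above can be written as a short SignalFlow program (the language Splunk Infrastructure Monitoring detectors use). This is a minimal sketch with a hypothetical metric name, not one of our actual detectors:

```
# Hypothetical metric; fires when the per-service error count
# stays above the threshold for five minutes.
errors = data('app.error.count').sum(by=['service'])
detect(when(errors > 100, lasting='5m')).publish('error spike')
```

The `lasting` duration is what helps avoid alert storms: transient spikes that recover within the window never fire.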
Error detection, outage detection, throughput monitoring, and percentage monitoring are the scenarios we currently use [Splunk Infrastructure Monitoring (formerly SignalFx)] for, and it is well suited to them. I believe it is less appropriate when a precise number is required, because discrepancies sometimes occur due to the metric collectors. For example, 100 data points are reported at time t, but on the charts they are split as 60 at time t and 40 at t+1.