Splunk ITSI in Practice
April 05, 2022

Todd Kulick | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User

Overall Satisfaction with Splunk IT Service Intelligence (ITSI)

We are using Splunk IT Service Intelligence (ITSI) as the centerpiece of our Observability strategy for multiple product lines that provide interactive television services. It helps us ensure the proper functionality of our services and the surrounding ecosystem, and it reduces mean time to service restoration when outages occur. Our Splunk ITSI system observes telemetry from our data center and cloud infrastructure as well as telemetry collected from our customer media consumption endpoint software on set-top boxes, IPTV streamers, mobile devices, and web browsers.
Strengths:
  • Modeling low-level machine, device, and network metrics into high-level ecosystem services
  • Powerful adaptive thresholds for detecting anomalous service and KPI behavior
  • Powerful toolbox for canned and customized event analytics pipelines, providing true AI operations
  • Direct access to (integration with) all of the numerous and varied Splunk ecosystem data sources and types
Areas for improvement:
  • Better integrations with "infrastructure as code" workflows via tools like Terraform
  • More support for adaptive thresholding with numerous and changing dynamic entities
  • Better ability to surface details of unhappy or anomalous KPIs and entities that contributed to episode production
Results:
  • Splunk ITSI has reduced the number of alerts exposed to our Network Operations Center by a factor of 100 while increasing the context around outages.
  • Splunk ITSI has increased the accuracy of our incident detection by leveraging the Event Analytics system to weigh the behavior of the many characteristics of each component together instead of independently.
  • Splunk ITSI has reduced our incident MTTR (mean time to restore) by detecting issues faster, presenting them more clearly, and surfacing the salient details about the underlying issue.
Splunk ITSI provides a holistic methodology for collecting and utilizing telemetry data that most other "basic" monitoring technologies and products in this space do not. By allowing us to model our ecosystem's components and services, Splunk ITSI observes and reports at a level that matches our product domain, not our technology stack. We can observe the features, capabilities, and services that we deliver, not just the machines and networks. Additionally, it can act as a toolbox, allowing easy configuration and expansion to integrate with custom data sources (in our case, custom IoT device telemetry from millions of client endpoints).
Splunk ITSI's Event Analytics system can bring together many disparate sources of information that all relate to the same service or functionality in our ecosystem. Using all of this data and the multiple perspectives that it provides allows Splunk ITSI to distinguish real issues from false positives more accurately and to provide better context when issues arise. These centralized, combined views reduce alert fatigue and aid in the understanding of our components and their state. Additionally, by building our operations practice around standard methods of data collection, incident detection and reporting, and common incident response tools and processes, we have improved our service delivery. Our teams share and act on the same data and tools, whether an incident is a low-level network issue or a high-level application bug. These standardized tools and processes improve our communication and response at exactly the time when they are most valuable: in the heat of battle, while we are mitigating incidents.
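To make the idea of combining related alerts concrete, here is a minimal Python sketch of grouping alerts from different sources into one "episode" per service. It is an illustration only, not Splunk ITSI's Event Analytics implementation: the `Alert` and `Episode` classes, the 1 to 6 severity scale, and the 300-second gap rule are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    timestamp: float  # seconds since some reference point
    service: str      # service the alert relates to
    source: str       # where the alert came from (network, client telemetry, ...)
    severity: int     # hypothetical scale: 1 (info) to 6 (critical)


@dataclass
class Episode:
    service: str
    alerts: list = field(default_factory=list)

    @property
    def severity(self):
        # An episode is as severe as its worst alert.
        return max(a.severity for a in self.alerts)

    @property
    def sources(self):
        # Distinct sources that contributed to this episode.
        return sorted({a.source for a in self.alerts})


def group_into_episodes(alerts, gap_seconds=300):
    """Group alerts for the same service into one episode for as long as
    consecutive alerts arrive within gap_seconds of each other."""
    episodes = []
    open_by_service = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        episode = open_by_service.get(alert.service)
        too_old = (
            episode is not None
            and alert.timestamp - episode.alerts[-1].timestamp > gap_seconds
        )
        if episode is None or too_old:
            episode = Episode(service=alert.service)
            episodes.append(episode)
            open_by_service[alert.service] = episode
        episode.alerts.append(alert)
    return episodes
```

With a rule like this, an operator sees one item per affected service that lists every contributing source, rather than one notification per raw alert.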
We have liberally replaced older monitoring and alerting based on engineer-selected fixed thresholds with Splunk ITSI's adaptive thresholds and anomaly detection. No other single action in our monitoring and observability work has had a more significant impact on our success, in multiple ways, including: fewer false positives, less alert fatigue, and less operational maintenance and upkeep. We do not even use Splunk ITSI's Predictive Analytics yet, but we hope to on-board the feature soon!
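The difference between a fixed threshold and an adaptive one can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm Splunk ITSI uses: it assumes the KPI follows a daily pattern, builds one band per hour of day from historical values, and flags values more than `k` standard deviations from that hour's mean. The function names and the default `k=3.0` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_hourly_bands(history, k=3.0):
    """Build a (low, high) band per hour of day.

    history is an iterable of (hour_of_day, value) pairs.
    Each band is mean +/- k population standard deviations for that hour.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)

    bands = {}
    for hour, values in by_hour.items():
        mu = mean(values)
        sigma = pstdev(values)
        bands[hour] = (mu - k * sigma, mu + k * sigma)
    return bands


def is_anomalous(bands, hour, value):
    """Return True if value falls outside the band learned for this hour."""
    low, high = bands[hour]
    return value < low or value > high
```

A value that is normal during the evening peak can be flagged at 3 a.m., and the reverse. A single engineer-selected threshold cannot express that, which is one reason fixed thresholds tend to generate false positives and need ongoing tuning.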
Splunk ITSI summarizes numerous disparate data sources about the operation of our product lines. This information is useful minute by minute as we operate our services, but we also use it longitudinally, in monthly reports, to identify components and services that deserve additional investment because of lower SLO (Service Level Objective) scores.
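As a rough illustration of the monthly reporting idea, the Python sketch below computes SLO attainment per service from a stream of health samples and sorts the worst performers first. It is not how Splunk ITSI computes or reports SLOs: the input shape (one `(service, healthy)` pair per measurement interval), the function name `slo_report`, and the default target of 99.9% are assumptions made for this example.

```python
def slo_report(samples, target=99.9):
    """Summarize SLO attainment per service.

    samples is an iterable of (service, healthy) pairs, one per interval,
    where healthy is True if the service met its objective in that interval.
    Returns a list of (service, attainment_percent, met_target) tuples,
    sorted so the lowest-scoring services come first.
    """
    totals = {}
    for service, healthy in samples:
        good, count = totals.get(service, (0, 0))
        totals[service] = (good + (1 if healthy else 0), count + 1)

    report = []
    for service, (good, count) in totals.items():
        attainment = 100.0 * good / count
        report.append((service, attainment, attainment >= target))

    report.sort(key=lambda row: row[1])
    return report
```

Services at the top of the resulting list are the candidates for additional investment.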

Do you think Splunk IT Service Intelligence (ITSI) delivers good value for the price?

Yes

Are you happy with Splunk IT Service Intelligence (ITSI)'s feature set?

Yes

Did Splunk IT Service Intelligence (ITSI) live up to sales and marketing promises?

Yes

Did implementation of Splunk IT Service Intelligence (ITSI) go as expected?

Yes

Would you buy Splunk IT Service Intelligence (ITSI) again?

Yes

Splunk ITSI is a great tool (and toolbox) for combining numerous and varied monitoring regimes to enable more holistic analysis and reduce alert fatigue. By leveraging Splunk ITSI's service and KPI modeling, ecosystem telemetry can be turned into a more reliable, clearer, high-level perspective on the current state of your components and services.

Using Splunk IT Service Intelligence (ITSI)

30 - Engineering, Network Operations, Service Delivery
10 - DevOps engineering, Splunk administration
  • Detect outages in delivered services
  • Minimize noise and false positives in outage detection
  • Provide helpful context during incidents to minimize time to restore services
  • Monthly reports showing component and service level SLO
  • Predictive analytics: detect upcoming outages before they occur
It is providing good value as the centerpiece of our observability strategy for multiple product lines.