ITSI converts your underutilized Splunk data into powerful KPIs and visibility, once you master its complexities
December 19, 2020


Anonymous | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User

Overall Satisfaction with Splunk IT Service Intelligence (ITSI)

ITSI is used by our business unit to provide operational visibility into all our Splunk data. Splunk does a great job of aggregating data into a single searchable dataset, but Splunk Enterprise alerts are disconnected from one another, and there are too many metrics we would need to alert on in Splunk Enterprise for them to be manageable. ITSI provides a framework to define the IT services that matter to you (such as the health of a server, an application, or a client), constantly monitor the KPIs that make up the health of each service, and organize all of it so that it is operationally easy to view the health of all services yet easy to drill down to a specific server or process experiencing a problem.
  • Monitor hundreds of IT services by continuously tracking thousands of KPIs in a scalable way.
  • Quickly identify problem areas through a combination of default visualizations and the ability to create custom dashboards.
  • Extremely configurable to effectively monitor nearly any KPI imaginable from Splunk.
  • The extreme flexibility also makes it highly complex. Expert Splunk users are required to make full use of it.
  • Documentation is insufficient and does not cover advanced use cases that ITSI is capable of supporting.
  • Depending on how ITSI is configured, it can place heavy load on Splunk infrastructure. ITSI performance can be optimized in many ways but they are not always obvious.
  • ITSI Events/Alerts (AKA Episode Review) is flexible, but still not as flexible as desired. However, this can be compensated for by directly querying ITSI's result data in Splunk.
  • ITSI enabled the rollout of KPIs that improved both the breadth and depth of coverage, ultimately reducing MTTR by detecting issues and their root causes faster.
  • Moving from "eyes on glass" monitoring of dashboards to 100% automated alerting allowed a reduction in the number of operators required per shift.
  • Simplifying the role of an operations staff member from constant analysis of dashboard data to simply receiving an alert for triage reduced the amount of training required for new staff.
ITSI stands alone as a tool for converting Splunk data into constantly monitored and organized KPIs. The alternative is manual creation and management of Splunk Enterprise alerts, and that approach does not scale to thousands of alerts. If you are building a monitoring solution from scratch, APM solutions like AppDynamics or Dynatrace might be applicable tools for certain situations. However, if you are already invested in Splunk and are looking to unlock the value of the data already in Splunk, I'm not aware of alternatives.
Prior to ITSI, Splunk had all our critical data, but there was too much to effectively monitor. Configuring a few dozen alerts hardly dented the surface area of what needed to be monitored. Most critical data was monitored by operators watching Splunk dashboards, and you could only assess the health of the application whose dashboard you were currently watching. ITSI allowed us to convert every dashboard panel into a constantly monitored, alertable KPI. Further, it allowed us to organize those KPIs into a complete hierarchy of platforms, clients, tiers, and servers, so that the health of the entire business unit can be assessed from a single screen, with drill-down to a specific server having a specific problem on that same screen.
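To make the hierarchy idea concrete, here is a minimal Python sketch of rolling KPI severities up a tree of platforms, tiers, and servers, then drilling down to the worst leaf. This is only an illustration, not how ITSI computes health scores internally: the "worst severity wins" rule, the `Node` class, and all node and KPI names are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical severity levels; higher number means worse.
SEVERITY = {"normal": 0, "warning": 1, "critical": 2}


@dataclass
class Node:
    name: str
    kpis: dict = field(default_factory=dict)      # KPI name -> severity
    children: list = field(default_factory=list)  # child Node objects


def health(node):
    """Worst severity among a node's own KPIs and all of its descendants."""
    levels = list(node.kpis.values()) + [health(c) for c in node.children]
    return max(levels, key=SEVERITY.__getitem__, default="normal")


def worst_path(node):
    """Follow the unhealthiest child at each level to locate the problem."""
    path = [node.name]
    while node.children:
        worst = max(node.children, key=lambda c: SEVERITY[health(c)])
        if SEVERITY[health(worst)] == 0:
            break  # everything below this point is healthy
        node = worst
        path.append(node.name)
    return path


# Hypothetical hierarchy: one platform, two tiers, three servers.
web01 = Node("web01", kpis={"cpu": "normal"})
web02 = Node("web02", kpis={"cpu": "normal", "disk": "critical"})
db01 = Node("db01", kpis={"replication_lag": "normal"})
web_tier = Node("web-tier", children=[web01, web02])
db_tier = Node("db-tier", children=[db01])
platform = Node("platform", children=[web_tier, db_tier])

print(health(platform))      # single-screen status for the whole platform
print(worst_path(platform))  # drill-down route to the failing server
```

A weighted score per KPI would be a natural refinement, but the simple maximum is enough to show why one screen can answer both "is anything wrong?" and "where?".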
I have limited experience with ITSI's machine learning capabilities. In the limited time I spent testing adaptive thresholding, I found it to be overly sensitive and prone to generating a lot of noise. For instance, ITSI can look at historical data to understand how much traffic we should expect on a Wednesday, based on previous Wednesdays, but it doesn't understand that some Wednesdays are paydays, which means a completely different traffic profile. On the surface, the machine learning capabilities add complexity to ITSI that isn't offset by the value they provide. However, all of the machine learning capabilities are optional and not required to get powerful monitoring functionality from ITSI.
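The payday problem can be shown with a small Python sketch. It is not ITSI's adaptive thresholding algorithm; it assumes a simple band of mean plus or minus two standard deviations learned from same-weekday history, and every number and name in it is made up for illustration.

```python
from statistics import mean, stdev


def weekday_band(history, k=2.0):
    """Return a (low, high) band: mean +/- k sample standard deviations."""
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def is_anomalous(value, band):
    """True when the value falls outside the learned band."""
    low, high = band
    return value < low or value > high


# Hypothetical request counts from previous ordinary Wednesdays.
ordinary_wednesdays = [1000, 1040, 980, 1010, 990]
band = weekday_band(ordinary_wednesdays)

# Hypothetical request counts from previous payday Wednesdays.
payday_wednesdays = [1750, 1820, 1790]
payday_band = weekday_band(payday_wednesdays)

normal_traffic = 1020
payday_traffic = 1800

# A weekday-only baseline flags healthy payday traffic as an anomaly.
print(is_anomalous(normal_traffic, band))
print(is_anomalous(payday_traffic, band))

# A separate, calendar-aware baseline for paydays does not.
print(is_anomalous(payday_traffic, payday_band))
```

The point of the sketch is that the baseline is only as good as the grouping behind it: a model keyed on day of week alone has no way to know that a payday Wednesday belongs to a different population.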
From a business perspective, we want to maintain high client satisfaction. One factor in that is ensuring the platform is always available and fast. ITSI helps us detect issues quickly for fast resolution, limiting the impact of an incident on our clients.
ITSI is the obvious tool for continuously monitoring, at scale, the thousands of KPIs buried in Splunk. Any IT service question you might ask Splunk, such as "Is traffic dropping to one of my data centers?" or "Are all my critical processes running?" or "Is traffic balanced across my web farm?", can be implemented in ITSI. However, all that flexibility comes at the cost of complexity. ITSI is easy for a consumer to use but not easy to learn how to administer. Simple use cases are not overly difficult to implement, but it takes a combination of Splunk query expertise and patience to learn ITSI. Once mastered, though, you gain unbelievable operational awareness into the critical KPIs hiding in your Splunk data.