PagerDuty, the Guard that Never Sleeps
Overall Satisfaction with PagerDuty
We use PagerDuty in an automated process that links our monitoring platform, SL1, to the PagerDuty platform. PagerDuty then calls out the correct engineer from its shift schedule and reports to a Slack channel which Alert Event the engineer was called for. This solves out-of-hours monitoring for the whole company. Previously we had to rely on a shift worker from a different department, who was not IT literate, to notice a change in our monitoring dashboards.
Pros
- Slack channel reporting
- Integration with other products
- Calls the engineer
- Schedule shift patterns for on call staff
Cons
- For me this product is perfect as is. It calls our engineers when needed, and if they don't answer it goes on to call the secondary on-call engineer. It reports to Slack, which all our engineers have on their phones, and it can carry a payload of information in the description as well. I cannot think what more this product could do.
- Given IT staff more human-friendly working hours
- Reduced the cost of unsociable hours pay
- Created a fully automated escalation process
Our systems require 24/7 uptime, so we have our site replicated across a few datacentres. If for some reason we still incur an outage, it's imperative that we have the systems working correctly again as soon as possible, because we can face large business fines the longer an outage lasts. The worst thing that can happen is for the customer to ring up and tell us that they have an issue. As a major player in the media industry, we like to know about an issue as soon as it happens so that we can notify the customer that something has happened and that we are on it. We are able to tell them before they even notice anything has happened, and it is fixed before it has caused them a problem. We find this gives the client much more confidence in our ability to deliver their essential data by a deadline. It also allows us to put a workaround in place if the issue is more severe.
Our monitoring platform has a feature called runbooks, which can be written in Python. We have created runbooks that talk to the PagerDuty API. This is how our monitoring platform communicates with PagerDuty, and it is also how the payload data detailing what has happened is gathered and passed into the PagerDuty description message.
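To give a rough idea of what a runbook like this might look like, here is a minimal sketch that builds a trigger event for the PagerDuty Events API v2 and posts it using only the Python standard library. It is not our actual runbook and does not use SL1's runbook interface. The function names, the routing key, and the fields placed in `custom_details` are all assumptions made for illustration.

```python
import json
import urllib.request

# Public endpoint of the PagerDuty Events API v2.
EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

# Severity values accepted by the Events API v2.
ALLOWED_SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(routing_key, summary, source,
                        severity="critical", details=None, dedup_key=None):
    """Build the JSON body for a 'trigger' event (hypothetical helper)."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity!r}")
    event = {
        "routing_key": routing_key,       # integration key of the PagerDuty service
        "event_action": "trigger",
        "payload": {
            "summary": summary[:1024],    # the API limits summary length
            "source": source,             # e.g. the device that raised the alert
            "severity": severity,
            # Free-form information gathered by the automation (assumed fields).
            "custom_details": details or {},
        },
    }
    if dedup_key:
        # Repeated events with the same key are grouped into one incident.
        event["dedup_key"] = dedup_key
    return event


def send_event(event, timeout=10):
    """POST the event to PagerDuty and return the decoded JSON response."""
    request = urllib.request.Request(
        EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A runbook could then call `send_event(build_trigger_event(key, "Disk full on host1", "host1", details={...}))`, where `key` is a placeholder for the integration key of the relevant PagerDuty service.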
Monitoring -> Event Alert -> Automation Fix if possible (Runbook Script or Rundeck) -> Automation Information Gather -> PagerDuty Callout -> Ticket Creation
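The flow above could be sketched in Python roughly as follows. This is an illustration only, not our production code: `AlertEvent`, `handle_alert`, and the four callables are hypothetical names. Skipping the callout when the automated fix succeeds is an assumption on my part for the sake of the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AlertEvent:
    """Hypothetical container for one alert raised by the monitoring platform."""
    source: str
    message: str
    details: Dict[str, str] = field(default_factory=dict)
    trail: List[str] = field(default_factory=list)  # record of the steps taken


def handle_alert(event: AlertEvent,
                 try_fix: Callable[[AlertEvent], bool],
                 gather_info: Callable[[AlertEvent], Dict[str, str]],
                 call_out: Callable[[AlertEvent], None],
                 create_ticket: Callable[[AlertEvent], None]) -> AlertEvent:
    """Run one alert through the stages of the pipeline, in order."""
    event.trail.append("event alert")

    # Automation fix if possible (for example a runbook script or a Rundeck job).
    fixed = try_fix(event)
    event.trail.append("automation fix succeeded" if fixed else "automation fix not possible")

    # Automation information gather: collect context for the description payload.
    event.details.update(gather_info(event))
    event.trail.append("information gathered")

    # Assumption: the engineer is only called when the automated fix did not work.
    if not fixed:
        call_out(event)
        event.trail.append("pagerduty callout")

    # A ticket is created either way so the event is recorded.
    create_ticket(event)
    event.trail.append("ticket created")
    return event
```

Passing the stages in as callables keeps the ordering in one place and lets each stage be swapped or tested on its own.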
We do not use Analytics.
This is our first use of an automated callout platform. We had seen one working at a different company, liked what we saw, and adopted it ourselves.
Do you think PagerDuty delivers good value for the price?
Yes
Are you happy with PagerDuty's feature set?
Yes
Did PagerDuty live up to sales and marketing promises?
Yes
Did implementation of PagerDuty go as expected?
Yes
Would you buy PagerDuty again?
Yes