PagerDuty the Guard that Never Sleeps
July 23, 2022


Allan Allitt | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User

Overall Satisfaction with PagerDuty

We use PagerDuty in an automated process that links our monitoring platform, SL1, to the PagerDuty platform. PagerDuty then calls out the correct engineer from its shift schedule and reports to a Slack channel which alert event triggered the callout. This has solved out-of-hours monitoring for the whole company. Previously we had to rely on a shift worker from a different department, who was not IT literate, to notice a change on our monitoring dashboards.
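The review doesn't describe how PagerDuty works out "the correct engineer", but the basic idea of a rotating on-call schedule can be sketched as a lookup from a point in time to whoever's shift covers it. This is a minimal sketch under assumed inputs: the engineer names, the one-week shift length and the rotation start date are hypothetical, and it is not PagerDuty's own scheduling logic.

```python
from datetime import datetime, timedelta

# Hypothetical rota: engineers take one-week shifts in this order.
ROTATION = ["alice", "bob", "carol"]
# Hypothetical start of the first shift (Monday 4 July 2022, 09:00).
ROTATION_START = datetime(2022, 7, 4, 9, 0)
SHIFT_LENGTH = timedelta(weeks=1)


def on_call_at(when: datetime) -> str:
    """Return the engineer whose shift covers the moment `when`."""
    if when < ROTATION_START:
        raise ValueError("time is before the rotation starts")
    elapsed = when - ROTATION_START
    # timedelta // timedelta gives a whole number of completed shifts.
    shift_index = elapsed // SHIFT_LENGTH
    # Wrap around so the rota repeats indefinitely.
    return ROTATION[shift_index % len(ROTATION)]
```

For example, an alert at 02:00 on 23 July 2022 falls in the third week of this rota, so `on_call_at(datetime(2022, 7, 23, 2, 0))` returns `"carol"`.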
  • Slack channel reporting
  • Integration with other products
  • Calls the engineer
  • Schedule shift patterns for on-call staff
  • This product is perfect for me as it is. It calls our engineers when needed, and if they don't answer, it calls the secondary on-call engineer. It reports to Slack, which all our engineers have on their phones, and it can carry a payload of information in the alert description. I cannot think of anything more this product could do.
  • Gave IT staff more human-friendly working hours
  • Reduced the cost of unsociable-hours pay
  • Created a fully automated escalation process
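The escalation behaviour described in the review (call the primary, then the secondary if there is no answer) can be modelled as a loop over escalation levels. The sketch below is a toy model, not PagerDuty's escalation engine: `escalate` and its `acknowledges` callback are hypothetical, with the callback standing in for "phone this engineer and wait for an acknowledgement within the timeout".

```python
from typing import Callable, List, Optional, Tuple


def escalate(
    levels: List[str],
    acknowledges: Callable[[str], bool],
    repeat: int = 1,
) -> Tuple[Optional[str], List[str]]:
    """Page each escalation level in order until someone acknowledges.

    `levels` is the ordered list of on-call engineers (primary first).
    `acknowledges(engineer)` is a hypothetical stand-in for "phone this
    engineer and wait for an acknowledgement within the timeout".
    `repeat` is how many times to cycle through all levels before giving up.
    Returns the engineer who acknowledged (or None) and everyone paged.
    """
    paged: List[str] = []
    for _ in range(repeat):
        for engineer in levels:
            paged.append(engineer)
            if acknowledges(engineer):
                return engineer, paged
    return None, paged
```

For example, if the primary does not answer but the secondary does, `escalate(["primary", "secondary"], lambda e: e == "secondary")` returns `("secondary", ["primary", "secondary"])`.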
Our systems require 24/7 uptime, so our site is replicated across several datacentres. If we still suffer an outage, it is imperative that we get the systems working correctly again as soon as possible, because we can face larger business fines the longer the outage lasts. The worst thing that can happen is for a customer to ring us up to tell us they have an issue. As a major player in the media industry, we want to know about an issue as soon as it happens so that we can notify customers that something has happened and that we are on it. We are able to tell them before they even notice anything has happened, so it is fixed before it causes them a problem. We find this gives clients much more confidence in our ability to deliver their essential data by a deadline, and it also gives us time to put a workaround in place if the issue is more severe.
Our monitoring platform has a feature called runbooks, which can be written in Python. We have created runbooks that talk to the PagerDuty API. This is how our monitoring platform communicates with PagerDuty, and it is also how the payload data describing what happened is gathered and passed into the PagerDuty alert description.
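The review doesn't say which PagerDuty API the SL1 runbooks call. One plausible route is the PagerDuty Events API v2, where a JSON event is POSTed to `https://events.pagerduty.com/v2/enqueue`. The following is a minimal sketch of what such a runbook might send, assuming that API; the routing key, the helper names `build_trigger_event` and `send_event`, and the example details are hypothetical, and SL1's own runbook interface is not shown.

```python
import json
import urllib.request

# Events API v2 endpoint (assumed to be the API the runbook targets).
EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"
ALLOWED_SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(routing_key, summary, source, severity="critical",
                        details=None, dedup_key=None):
    """Build a 'trigger' event whose custom_details carry gathered data."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    event = {
        "routing_key": routing_key,  # hypothetical integration key
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,
            # Copy so later changes to `details` don't alter the event.
            "custom_details": dict(details) if details else {},
        },
    }
    if dedup_key is not None:
        # Lets repeat alerts for the same fault group into one incident.
        event["dedup_key"] = dedup_key
    return event


def send_event(event, timeout=10.0):
    """POST the event as JSON and return the decoded JSON response."""
    data = json.dumps(event).encode("utf-8")
    request = urllib.request.Request(
        EVENTS_URL,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

In a runbook this might be called as `send_event(build_trigger_event(routing_key, "Disk full on web-01", "web-01", details={"free_mb": "12"}))`. Putting the gathered diagnostics in `custom_details` is one way to carry them along with the alert.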
Monitoring -> Event Alert -> Automated Fix if possible (Runbook Script or Rundeck) -> Automated Information Gathering -> PagerDuty Callout -> Ticket Creation
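The flow above can be sketched as a single handler that runs each stage in order. This is a hypothetical outline: the `Alert` type and the stage callbacks (`try_fix`, `gather_info`, `page_engineer`, `create_ticket`) stand in for the runbook scripts, Rundeck jobs, PagerDuty call and ticketing integration, and it assumes that a successful automated fix ends the chain without paging anyone.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alert:
    name: str
    device: str
    details: Dict[str, str] = field(default_factory=dict)


def handle_alert(
    alert: Alert,
    try_fix: Callable[[Alert], bool],
    gather_info: Callable[[Alert], Dict[str, str]],
    page_engineer: Callable[[Alert], None],
    create_ticket: Callable[[Alert], None],
) -> List[str]:
    """Run one alert through the stages and return the stages that ran."""
    stages = ["event alert"]
    # Stage 1: attempt an automated fix (runbook script or Rundeck job).
    if try_fix(alert):
        stages.append("auto-fixed")
        return stages  # assumption: a successful fix needs no callout
    # Stage 2: collect diagnostic data to attach to the page.
    alert.details.update(gather_info(alert))
    stages.append("info gathered")
    # Stage 3: page the on-call engineer, carrying the gathered details.
    page_engineer(alert)
    stages.append("paged")
    # Stage 4: open a ticket with the same details for later sign-off.
    create_ticket(alert)
    stages.append("ticket created")
    return stages
```

Passing the stages in as callables keeps the ordering logic separate from the integrations, so each stage can be swapped or tested on its own.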
We do not use the Analytics feature.
This is our first use of an automated callout platform. We had seen one working at a different company, liked what we saw, and adopted it ourselves.
The product works well. Whenever we have had to engage support, our issue has been dealt with in a timely manner.

Do you think PagerDuty delivers good value for the price?

Yes

Are you happy with PagerDuty's feature set?

Yes

Did PagerDuty live up to sales and marketing promises?

Yes

Did implementation of PagerDuty go as expected?

Yes

Would you buy PagerDuty again?

Yes

PagerDuty is well suited to businesses that want an automated way to inform someone when an issue arises, which can be a problem for many companies running a 24/7 uptime service. For us, it feeds into our 24/7 business escalation procedure. We have full automation: monitoring fires off events that either fix the issue or gather data to attach to a PagerDuty alert. PagerDuty then engages the engineer and escalates if there is no reply, while an entry is raised in our ticketing system with the same data already embedded, ready for the engineer to sign off once they have logged in and done the relevant work. PagerDuty has reduced our staffing costs and has given our IT staff more human-friendly hours, replacing shifts that ran from 6 am until 10 pm. It also bridges the gap between 10 pm and 6 am: we no longer have to rely on somebody from a non-IT department noticing an issue on the alert monitor without understanding what they are seeing. It is now an essential part of our automation system and has made the out-of-hours escalation process much smoother.