Centralized alerting with PagerDuty
Overall Satisfaction with PagerDuty
We use PagerDuty extensively in the IT department. We roll out many projects, each managed by a small number of people, so it's important for us to be alerted as quickly and efficiently as possible when something goes wrong. Our projects also use different technical stacks and, as a result, different monitoring tools (AWS CloudWatch, Azure Monitor, Prometheus, Grafana, etc.). PagerDuty therefore acts as the single point where all alerts are gathered, regardless of where they come from.
Pros
- Integration with all of the monitoring tools we need
- Excellent mobile application to receive alerts
- Email, SMS, and phone alerts if needed
- Escalation policies are really great
Cons
- The web interface needs some love
- Initial setup can be a bit complicated
- A more advanced way to chart past alerts would be nice. It would give us a graphical view of where most issues occur. The existing report charts are not very customizable.
Return on Investment
- We monitor hundreds of APIs and websites, with alerts going to PagerDuty. This makes it easy for us to quickly tell the business that something is wrong and that we are aware of the issue and working on it. It helps even when the problem lies with one of our providers, as in a recent Azure global outage.
- Overall time to resolution across all of our apps has decreased by about 30% over the last two years.
- Having well-targeted alerts reach the right person at the right time has helped us maximize the availability of business-critical applications.
Absolutely! Getting alerts to the right person is the main reason we use PagerDuty. No matter which monitoring tool is used, all alerts are routed to PagerDuty, which takes care of sending them to the correct person depending on several factors. Some alerts are sent to teams during the night or on the weekend, and if the person alerted is not available, an escalation policy is triggered so the next person in line is alerted. Users also have many ways to configure how they receive alerts: by email or push notification, and if an alert is not acknowledged, by an automated phone call from PagerDuty.
This is definitely one of the main advantages of PagerDuty. As mentioned before, we have many technical stacks with different requirements and alerting tools. We have many websites on Azure and AWS, so we use their built-in monitoring tools. We also use a generic monitoring tool called Site24x7. Everything is then centralized in Grafana. All of these tools can forward alerts to PagerDuty easily, and these integrations let us gather all alerts in one place.
We often use the escalation feature of PagerDuty since we have some critical apps used by people around the world. Therefore, we need to send alerts to the right person depending on the day, hour, and location. We have not really used the automated incident response yet, since most of our issues require human interaction to be resolved.
The dashboards provided by PagerDuty are nice for getting a quick overview of the whole account. Sadly, they offer few customization options, and it is quite tricky to get more detailed information, for example, which team or service receives the most incidents, and how long it takes to resolve incidents per application or per team.
There is very little difference between OpsGenie and PagerDuty. Both tools are really great and do what they promise very well. If I had to choose, I'd go with PagerDuty, not because of any particular feature or because it is better, but because I've been using it for the last couple of years and it has served my team very well.
Do you think PagerDuty delivers good value for the price?
Yes
Are you happy with PagerDuty's feature set?
Yes
Did PagerDuty live up to sales and marketing promises?
Yes
Did implementation of PagerDuty go as expected?
Yes
Would you buy PagerDuty again?
Yes