Alert fatigue and the need for an alert inbox
Constant alerting from monitoring systems leads to a poor developer experience
The developer experience of alert management is broken
Monitoring setups across infrastructure, application, performance, data, and security systems generate hundreds of alert notifications daily. Production systems have monitoring and alerting configured to notify devs when certain metrics enter an alarm state. Responding to these alerts is time-critical: it keeps the customer experience undisrupted and limits the impact on the business.
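As a concrete illustration, here is a minimal sketch of the kind of threshold rule that typically sits behind these notifications. The metric, threshold, and helper names are hypothetical, not taken from any particular monitoring tool.

```python
import statistics

ERROR_RATE_THRESHOLD = 0.05  # alarm when more than 5% of requests fail

def notify_channel(message: str) -> None:
    # Stand-in for a Slack or pager integration.
    print(message)

def evaluate_alert(samples: list[float]) -> None:
    """Fire a notification if the averaged metric crosses its threshold."""
    error_rate = statistics.mean(samples)
    if error_rate > ERROR_RATE_THRESHOLD:
        notify_channel(f"[ALARM] error_rate={error_rate:.2%} "
                       f"exceeds {ERROR_RATE_THRESHOLD:.0%}")

evaluate_alert([0.02, 0.08, 0.09])  # -> [ALARM] error_rate=6.33% exceeds 5%
```

Multiply a rule like this across every component and team, and the notification volume adds up fast.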
Alerts with a low signal-to-noise ratio start flooding channels. Over time, mild warnings and informational notifications make up the bulk of alerts received, and the spammy channels get muted or ignored.
The wall of text in these channels is tough to consume, degrading the developer experience and producing excessive fatigue.
Symptoms of alert fatigue
Constant alert notifications increase context switching, leading to higher cognitive load. This is especially true when your channels ping non-stop and a large proportion of the notifications aren't relevant or actionable.
Excessive alert fatigue increases the chances of missing critical alerts that would have been spotted quickly in a noise-free environment. In busy, spam-heavy alert channels, tracing and debugging the right issues takes time.
Once a critical alert is identified, debugging it is another challenge. Alert notifications in Slack channels are not first-class entities: if an alert has fired and been resolved before, there is no history or context of who worked on it or how it was mitigated.
Identifying and debugging actionable alerts can also take far too much time, contributing to a high mean time to acknowledge (MTTA) and mean time to resolve (MTTR).
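For reference, MTTA is simply the average gap between an alert firing and someone acknowledging it (MTTR is the same idea, measured to resolution). A quick sketch with made-up timestamps:

```python
from datetime import datetime, timedelta

def mtta(events: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time to acknowledge: average of (acknowledged - fired)."""
    gaps = [acked - fired for fired, acked in events]
    return sum(gaps, timedelta()) / len(gaps)

events = [
    (datetime(2023, 1, 1, 9, 0), datetime(2023, 1, 1, 9, 12)),   # 12 min
    (datetime(2023, 1, 1, 14, 0), datetime(2023, 1, 1, 14, 30)), # 30 min
]
print(mtta(events))  # 0:21:00
```

Every minute spent scrolling past noise before the real alert is even seen goes straight into these numbers.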
How alert fatigue manifests in developer teams
Teams fatigued by alerts don't get there by making many mistakes. It is quite common to drift slowly from a manageable situation into a state of constant fatigue. Typically, teams move through four stages:
Good alerts
As engineering systems evolve, incidents and issues cause downtime. Root cause analysis of these incidents leads to monitoring and alerting being set up to warn the team if they recur. Over time, more alerts get added with each incident, and standard alerts get added across components and services.
Bad notifications
As more alerts are added, developers receive more notifications. Noise creeps in, and alert quality degrades over time. Notifications repeat, and different systems may fire concurrently when something goes down.
A large number of alerts are non-actionable, yet their notifications keep distracting devs. To avoid the distraction, alert notifications start being ignored.
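One common mitigation at this stage, sketched below with assumed alert fields (service, metric, severity), is to fingerprint notifications so repeats and concurrent duplicates collapse into a single entry with a count.

```python
from collections import defaultdict

def fingerprint(alert: dict) -> tuple:
    """Group alerts that describe the same underlying condition."""
    return (alert["service"], alert["metric"], alert["severity"])

def deduplicate(alerts: list[dict]) -> dict[tuple, int]:
    """Collapse a noisy stream into one entry per fingerprint, with a count."""
    counts: dict[tuple, int] = defaultdict(int)
    for alert in alerts:
        counts[fingerprint(alert)] += 1
    return dict(counts)

stream = [
    {"service": "api", "metric": "error_rate", "severity": "critical"},
    {"service": "api", "metric": "error_rate", "severity": "critical"},
    {"service": "db", "metric": "latency", "severity": "warning"},
]
print(deduplicate(stream))
# {('api', 'error_rate', 'critical'): 2, ('db', 'latency', 'warning'): 1}
```

Deduplication trims the volume, but it does not fix the deeper problem: the collapsed entry still lands in the same broken interface.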
Broken interfaces
Alert fatigue is amplified by the current notification flow. Devs receive alerts on Slack or other communication channels, where alert information is fragmented and buried under the volume of other notifications.
Current interfaces do not empower devs to resolve alerts quickly. They are not suited for tracing or collaboration because they lack important alert context, and there is no task management or in-place editing to modify alerts based on observed patterns.
As the volume of alert notifications increases, these channels compound the noise and make it nearly impossible to make sense of the current state of monitoring.
Missing feedback loop
Alerting is an iterative process, not a one-time set-up-and-consume flow.
When creating an alert, it is tough to forecast the usefulness, accuracy, and volume of the notifications it will generate. As systems evolve and their entropy increases, constant tweaks are needed to keep alerts relevant and actionable.
With the feedback loop missing, ignored alerts keep returning, dealing the final blow of fatigue.
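One way to build that feedback loop, sketched here with an assumed data shape and an arbitrary 20% floor, is to track what fraction of each rule's firings were actually acted on and flag low-signal rules as candidates for tuning or deletion:

```python
ACTIONABLE_FLOOR = 0.2  # flag rules acted on less than 20% of the time

def noisy_rules(firings: list[dict]) -> list[str]:
    """Return the alert rules whose firings were rarely acted on."""
    stats: dict[str, list[int]] = {}
    for f in firings:
        s = stats.setdefault(f["rule"], [0, 0])
        s[0] += f["acted_on"]  # 1 if someone acted on the firing, else 0
        s[1] += 1              # total firings for this rule
    return [rule for rule, (acted, total) in stats.items()
            if acted / total < ACTIONABLE_FLOOR]

history = [
    {"rule": "disk_warn", "acted_on": 0},
    {"rule": "disk_warn", "acted_on": 0},
    {"rule": "disk_warn", "acted_on": 0},
    {"rule": "api_5xx", "acted_on": 1},
]
print(noisy_rules(history))  # ['disk_warn']
```

Even a crude actionability signal like this surfaces which rules are earning their interruptions and which are just generating noise.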
We need a new approach that lets us breeze through alert notifications and keeps them actionable. We deserve a better inbox for the alert management experience.
If you or your team are currently working through noisy alert channels and looking for a smarter alternative, reach out to us and get a sneak peek of what we're working on!