Incident Management
Communicate with your users through a structured incident lifecycle — from investigation to resolution.
Creating incidents
Incidents can be created in two ways:
Automatic
When the alert evaluator detects a confirmed outage (the monitor is down in at least 2 of 3 regions), an incident is created automatically and linked to the affected monitor's status page components.
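For illustration only, the "2 of 3 regions" rule can be sketched as a small quorum predicate. The function name `is_confirmed_outage`, the region names, and the `quorum` parameter are hypothetical. The sketch assumes each region reports a simple up/down result for the monitor.

```python
from typing import Mapping


def is_confirmed_outage(region_down: Mapping[str, bool], quorum: int = 2) -> bool:
    """Return True when at least `quorum` regions report the monitor as down.

    Hypothetical helper: the default quorum of 2 mirrors the "2 of 3 regions"
    rule; the region names and input shape are assumptions.
    """
    down_count = sum(1 for is_down in region_down.values() if is_down)
    return down_count >= quorum


# A single failing region is treated as a possible regional blip, not an outage.
print(is_confirmed_outage({"us-east": True, "eu-west": False, "ap-south": False}))  # False
print(is_confirmed_outage({"us-east": True, "eu-west": True, "ap-south": False}))   # True
```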
Manual
Create an incident from the Incidents page in your dashboard. This is useful for planned announcements, known issues that monitors haven't caught, or pre-emptive communication about degraded services.
When creating a manual incident, you'll provide:
- Title — A clear, concise description of the issue
- Affected components — Which services are impacted
- Impact level — None, Minor, Major, or Critical
- Initial status — Usually "Investigating"
- Initial message — What you know so far
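The fields above map naturally onto a small record type. This is a hypothetical Python sketch, not the product's API: the `NewIncident` class, the enum values, and the validation rules are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    NONE = "none"
    MINOR = "minor"
    MAJOR = "major"
    CRITICAL = "critical"


class Status(Enum):
    INVESTIGATING = "investigating"
    IDENTIFIED = "identified"
    MONITORING = "monitoring"
    RESOLVED = "resolved"


@dataclass
class NewIncident:
    """Hypothetical shape of the fields entered for a manual incident."""
    title: str
    affected_components: list[str]
    impact: Impact
    initial_message: str
    initial_status: Status = Status.INVESTIGATING  # the usual starting point

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the draft looks complete.

        These rules are assumptions, not the product's actual validation.
        """
        problems = []
        if not self.title.strip():
            problems.append("title is required")
        if not self.affected_components:
            problems.append("select at least one affected component")
        if not self.initial_message.strip():
            problems.append("initial message is required")
        return problems


draft = NewIncident(
    title="Elevated API error rates",
    affected_components=["API"],
    impact=Impact.MAJOR,
    initial_message="We're seeing elevated 5xx errors and are investigating.",
)
print(draft.validate())  # []
```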
Status lifecycle
Every incident moves through a defined lifecycle. Each status transition is recorded as an update visible on your public status page.
Investigating
You're aware of the issue and are looking into it. Affected components are updated to reflect the incident impact.
Identified
The root cause has been found. Communicate what went wrong and what you're doing to fix it.
Monitoring
A fix has been deployed and you're watching for recurrence. Services should be recovering.
Resolved
The incident is over. Affected components are reset to 'Operational'. A resolved_at timestamp is recorded.
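One way to model this lifecycle is sketched below. The class and function names, the impact label text, and the forward-only rule (stages may be skipped or repeated but never revisited) are assumptions for illustration. The sketch only mirrors the documented behavior: each transition becomes a public update, and resolving resets components to 'Operational' and records a resolved_at time.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum
from typing import Optional


class Status(IntEnum):
    # Ordered so later stages compare greater than earlier ones.
    INVESTIGATING = 1
    IDENTIFIED = 2
    MONITORING = 3
    RESOLVED = 4


@dataclass
class Incident:
    """Hypothetical in-memory model of an incident and its public timeline."""
    components: dict[str, str]  # component name -> displayed state
    status: Status = Status.INVESTIGATING
    updates: list[tuple[datetime, Status, str]] = field(default_factory=list)
    resolved_at: Optional[datetime] = None

    def post_update(self, new_status: Status, message: str) -> None:
        if self.status is Status.RESOLVED:
            raise ValueError("incident is already resolved")
        # Assumption: the lifecycle only moves forward; stages may be skipped,
        # and repeated updates at the same stage are allowed.
        if new_status < self.status:
            raise ValueError(f"cannot move from {self.status.name} back to {new_status.name}")
        now = datetime.now(timezone.utc)
        self.status = new_status
        self.updates.append((now, new_status, message))  # each update is public
        if new_status is Status.RESOLVED:
            for name in self.components:
                self.components[name] = "Operational"
            self.resolved_at = now


def open_incident(component_names: list[str], impact_label: str, message: str) -> Incident:
    # Investigating: affected components take on the incident's impact label.
    # The label text (e.g. "Partial Outage") is an assumption.
    incident = Incident(components={name: impact_label for name in component_names})
    incident.updates.append((datetime.now(timezone.utc), Status.INVESTIGATING, message))
    return incident


inc = open_incident(["API", "Dashboard"], "Partial Outage", "Investigating elevated error rates.")
inc.post_update(Status.IDENTIFIED, "A config change exhausted the database connection pool.")
inc.post_update(Status.MONITORING, "The change was rolled back; watching error rates.")
inc.post_update(Status.RESOLVED, "Error rates are back to normal.")
print(inc.components)  # {'API': 'Operational', 'Dashboard': 'Operational'}
```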
Writing effective incident updates
Good incident communication builds trust. Follow these guidelines:
Be specific about impact
'Users in the EU region may experience 5-10 second delays when loading dashboards' is better than 'Some users may be affected'.
State what you know and don't know
'We've identified a database connection pool issue. We're still determining the root cause of the pool exhaustion.'
Provide an ETA when possible
'We expect to deploy a fix within the next 30 minutes' — even a rough estimate is better than silence.
Update regularly
Post an update at least every 30 minutes during active incidents, even if it's just 'Still investigating, no new information.'
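If you want a reminder for the 30-minute cadence, the check is simple date arithmetic. The helper names below are hypothetical and not part of the product; they only compute when the next update is due and whether that time has passed.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Cadence from the guideline above; a parameter so a team could tighten it.
UPDATE_INTERVAL = timedelta(minutes=30)


def next_update_due(last_update_at: datetime, interval: timedelta = UPDATE_INTERVAL) -> datetime:
    """Latest time the next public update should be posted (hypothetical helper)."""
    return last_update_at + interval


def is_update_overdue(last_update_at: datetime, now: Optional[datetime] = None,
                      interval: timedelta = UPDATE_INTERVAL) -> bool:
    if now is None:
        now = datetime.now(timezone.utc)
    return now > next_update_due(last_update_at, interval)


last = datetime(2024, 5, 1, 14, 0, tzinfo=timezone.utc)
print(next_update_due(last).isoformat())  # 2024-05-01T14:30:00+00:00
print(is_update_overdue(last, now=datetime(2024, 5, 1, 14, 20, tzinfo=timezone.utc)))  # False
print(is_update_overdue(last, now=datetime(2024, 5, 1, 14, 45, tzinfo=timezone.utc)))  # True
```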
Public incident timeline
Your status page displays incidents in two sections:
- Active incidents — Pinned at the top of the page with impact badges and the full update timeline (newest first)
- Past incidents — The last 14 days of resolved incidents, shown in a collapsed view at the bottom of the page
Each update in the timeline shows a status badge, message text, and timestamp. This gives your users a clear narrative of what happened and how you responded.
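The grouping rules above can be expressed as a small partition function. This is a hypothetical sketch: the data classes and `build_timeline` are invented for illustration, and listing the most recently resolved incident first is an assumption the text doesn't specify.

```python
from dataclasses import dataclass, field, replace
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Update:
    posted_at: datetime
    status: str
    message: str


@dataclass
class Incident:
    title: str
    updates: list[Update] = field(default_factory=list)
    resolved_at: Optional[datetime] = None


def build_timeline(incidents: list[Incident], now: datetime, history_days: int = 14):
    """Split incidents into (active, past) the way the status page groups them."""
    cutoff = now - timedelta(days=history_days)
    # Active incidents show their full timeline, newest update first.
    active = [
        replace(i, updates=sorted(i.updates, key=lambda u: u.posted_at, reverse=True))
        for i in incidents
        if i.resolved_at is None
    ]
    # Past incidents: resolved within the last `history_days` days.
    # Assumption: the most recently resolved incident is listed first.
    past = sorted(
        (i for i in incidents if i.resolved_at is not None and i.resolved_at >= cutoff),
        key=lambda i: i.resolved_at,
        reverse=True,
    )
    return active, past


now = datetime(2024, 5, 20, tzinfo=timezone.utc)
incidents = [
    Incident("API errors", [
        Update(now - timedelta(hours=1), "Investigating", "Looking into it."),
        Update(now - timedelta(minutes=10), "Identified", "Root cause found."),
    ]),
    Incident("Slow dashboards", resolved_at=now - timedelta(days=3)),
    Incident("Old DNS issue", resolved_at=now - timedelta(days=30)),
]
active, past = build_timeline(incidents, now)
print([i.title for i in active])    # ['API errors']
print(active[0].updates[0].status)  # Identified
print([i.title for i in past])      # ['Slow dashboards']
```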
Maintenance windows
Schedule planned maintenance to inform your users ahead of time and prevent false alerts during the window. Each maintenance window has:
- Title and description — What maintenance is being performed
- Affected components — Which services will be impacted
- Start and end time — The maintenance window duration
During a maintenance window:
- Linked monitors are automatically paused — no checks are executed and no alerts fire
- The status page displays the maintenance notice with a distinctive blue/neutral style
- When the window ends, monitors automatically resume
Upcoming maintenance windows are also displayed on your public status page so users can plan accordingly.
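The pause-and-resume behavior comes down to checking whether the current time falls inside a window that lists the monitor. In this hypothetical sketch, windows reference monitors by ID, and the start time is inclusive while the end time is exclusive. Those choices, the class, and the function names are all assumptions, not the product's implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MaintenanceWindow:
    title: str
    monitor_ids: frozenset[str]  # assumption: windows link monitors by ID
    starts_at: datetime
    ends_at: datetime

    def is_active(self, now: datetime) -> bool:
        # Assumption: start is inclusive, end is exclusive.
        return self.starts_at <= now < self.ends_at


def should_run_check(monitor_id: str, windows: list[MaintenanceWindow], now: datetime) -> bool:
    """False while any active window links this monitor: no check runs, no alert fires."""
    return not any(w.is_active(now) and monitor_id in w.monitor_ids for w in windows)


def upcoming(windows: list[MaintenanceWindow], now: datetime) -> list[MaintenanceWindow]:
    """Windows that haven't started yet, soonest first, for the public status page."""
    return sorted((w for w in windows if w.starts_at > now), key=lambda w: w.starts_at)


window = MaintenanceWindow(
    title="Database upgrade",
    monitor_ids=frozenset({"api-health"}),
    starts_at=datetime(2024, 6, 1, 2, 0, tzinfo=timezone.utc),
    ends_at=datetime(2024, 6, 1, 4, 0, tzinfo=timezone.utc),
)
during = datetime(2024, 6, 1, 3, 0, tzinfo=timezone.utc)
after = datetime(2024, 6, 1, 4, 0, tzinfo=timezone.utc)
before = datetime(2024, 5, 31, tzinfo=timezone.utc)
print(should_run_check("api-health", [window], during))  # False: paused
print(should_run_check("api-health", [window], after))   # True: resumed
print(should_run_check("web-health", [window], during))  # True: not linked
print([w.title for w in upcoming([window], before)])     # ['Database upgrade']
```

Because resuming is just the same time comparison evaluated after the window ends, no separate "resume" step is needed in this sketch.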