If your users are the ones telling you the site is down, your monitoring has failed. This guide covers what to alert on and a simple starter stack, so you catch disk, service, and error-spike problems early.
What to alert on
- Disk usage crossing a threshold, so you are warned before the root disk fills up
- Apache or PHP-FPM down
- Spikes of repeated 500/503 responses
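As a rough illustration of the disk threshold item, here is a minimal Python sketch that uses the standard library's `shutil.disk_usage`. The function names and the 85% default are assumptions made for the example, and the percentage may differ slightly from what `df` reports because reserved blocks are counted differently.

```python
import shutil


def disk_used_percent(path="/"):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a size of zero; treat them as empty.
        return 0.0
    return usage.used / usage.total * 100


def disk_alert(path="/", threshold=85.0):
    """Return an alert message when usage is at or above `threshold`, else None."""
    percent = disk_used_percent(path)
    if percent >= threshold:
        return f"DISK ALERT: {path} is {percent:.1f}% full (threshold {threshold:.0f}%)"
    return None


if __name__ == "__main__":
    # Hypothetical usage: run from cron and send the message to your alert channel.
    print(disk_alert("/") or "disk OK")
```

Run from cron every few minutes, a check like this could feed whatever notification channel you already use. Once Prometheus and Node Exporter are in place, an alert rule on the exported filesystem metrics would normally replace it.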
Easy starter stack
- UptimeRobot (or similar) — external availability and basic uptime
- Prometheus + Node Exporter — CPU, RAM, disk on the host
- Log-based alerts for recurring error patterns (e.g. tail the Apache error log and alert on proxy/backend failures)
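The log-based item can be sketched in a few lines of Python. The regular expression below is only an illustrative guess at what proxy/backend failures look like in an Apache error log, and the function names and the threshold of 5 are assumptions; check the patterns against the messages your own server writes.

```python
import re
from collections import deque

# Illustrative patterns only: adjust them to match your own error log.
BACKEND_FAILURE = re.compile(
    r"proxy(_\w+)?:error|failed to make connection to backend",
    re.IGNORECASE,
)


def tail_lines(path, count=500):
    """Return the last `count` lines of a text file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(deque(handle, maxlen=count))


def count_backend_failures(lines):
    """Count the lines that match one of the failure patterns."""
    return sum(1 for line in lines if BACKEND_FAILURE.search(line))


def should_alert(lines, min_matches=5):
    """True when the failure pattern repeats at least `min_matches` times."""
    return count_backend_failures(lines) >= min_matches
```

A hypothetical call would be `should_alert(tail_lines("/var/log/apache2/error.log"))`, where the path is the usual Ubuntu location and may differ on your server. Alerting on a repeat count rather than on a single match keeps one-off errors from paging anyone.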
FAQ
What’s the minimum useful alert?
Disk usage above a threshold (e.g. 85%) and Apache or PHP-FPM not running. That catches the two most common “site down” causes. Pair these alerts with the manual checklist in Routine service health checks for WordPress servers.
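For the service half of that minimum alert, one possible sketch asks systemd whether each unit is active. It assumes a systemd-based host, relies on `systemctl is-active --quiet` exiting with status 0 only when the unit is active, and uses function names invented for the example. The `run` parameter exists only so the check can be exercised without a real systemd.

```python
import subprocess


def is_active(unit, run=subprocess.run):
    """True when `systemctl is-active --quiet <unit>` exits with status 0."""
    result = run(["systemctl", "is-active", "--quiet", unit], check=False)
    return result.returncode == 0


def down_units(units, run=subprocess.run):
    """Return the units from `units` that systemd does not report as active."""
    return [unit for unit in units if not is_active(unit, run=run)]


# Hypothetical usage; "php8.1-fpm" is an assumed unit name, since the
# PHP-FPM unit is versioned and will vary with your install:
#   failed = down_units(["apache2", "php8.1-fpm"])
#   if failed:
#       print("SERVICE ALERT:", ", ".join(failed))
```

An active unit can still be serving errors, so this check works best alongside an external HTTP check rather than in place of one.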
How do I alert on 500 errors?
Option 1: external HTTP checks that alert when the homepage or a key URL returns 5xx. Option 2: parse Apache or PHP-FPM logs (or use an existing log aggregator) and trigger on repeated 500/503 in a short window.
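Option 2 could look roughly like the sketch below, which counts 500/503 responses in a sliding time window. It assumes access log lines in Apache's common or combined format (a bracketed timestamp, then the quoted request, then the status code). The limit of 10 errors per minute and the function names are placeholders, not recommendations.

```python
import re
from collections import deque
from datetime import datetime, timedelta

# Matches: [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 500 ...
LINE = re.compile(r'\[(?P<ts>[^\]]+)\] "[^"]*" (?P<status>\d{3}) ')


def parse(line):
    """Return (timestamp, status) for an access log line, or None if it does not match."""
    match = LINE.search(line)
    if not match:
        return None
    ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return ts, int(match.group("status"))


def spike_times(lines, statuses=(500, 503), limit=10, window=timedelta(minutes=1)):
    """Return the timestamps at which `limit` matching errors fell within `window`."""
    recent = deque()
    alerts = []
    for line in lines:
        parsed = parse(line)
        if parsed is None or parsed[1] not in statuses:
            continue
        ts = parsed[0]
        recent.append(ts)
        # Drop errors that are older than the window relative to this one.
        while recent and ts - recent[0] > window:
            recent.popleft()
        if len(recent) >= limit:
            alerts.append(ts)
            recent.clear()  # start counting again so one burst alerts once
    return alerts
```

Clearing the window after each alert is a deliberate simplification: a sustained outage then produces one alert per `limit` errors instead of one per request.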
Related
- Routine service health checks for WordPress servers — manual checklist
- Fix a full root disk on Ubuntu WordPress servers — disk-full recovery
- Apache log analysis for WordPress — log workflows and triage