
Observability of Pulse itself

Prometheus metrics, structured logs, loadtest harness.

The API exposes Prometheus metrics at /api/v1/metrics. Sample series:

pulse_scheduler_jobs_dispatched_total
pulse_scheduler_queue_depth
pulse_worker_check_duration_seconds_bucket{region="iad",kind="http"}
pulse_alerts_fired_total{severity="P1"}
pulse_http_request_duration_seconds_bucket{route="/api/v1/monitors",method="GET",status="200"}
pulse_websocket_connections

The endpoint is unauthenticated on loopback; if you expose it externally, put it behind a Traefik basic-auth middleware.
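As a rough illustration of consuming this endpoint, the sketch below parses Prometheus text exposition output into a dictionary keyed by series name and raw label string. It assumes the standard text format; the sample values, the helper names (`fetch_metrics`, `parse_metrics`), and any host or port you pass in are hypothetical and not taken from Pulse itself. The regex is deliberately simple and would not handle a `}` inside a quoted label value.

```python
import re
import urllib.request
from typing import Dict, Tuple

# Matches `name{labels} value` or `name value`; a trailing timestamp is ignored.
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')

# Illustrative payload only: the values are made up for this example.
SAMPLE = """\
# TYPE pulse_scheduler_queue_depth gauge
pulse_scheduler_queue_depth 3
pulse_alerts_fired_total{severity="P1"} 7
"""


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Fetch the raw exposition text. The caller supplies the full URL,
    since the host and port depend on the deployment (assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text: str) -> Dict[Tuple[str, str], float]:
    """Return {(series_name, raw_label_string): value}, skipping comments."""
    samples: Dict[Tuple[str, str], float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a # HELP / # TYPE comment
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, labels, value = match.group(1), match.group(2) or "", match.group(3)
        try:
            samples[(name, labels)] = float(value)
        except ValueError:
            continue  # not a numeric sample; skip rather than fail
    return samples
```

With a running instance, something like `parse_metrics(fetch_metrics("http://127.0.0.1:<port>/api/v1/metrics"))` would return the live series; the port is left open here because the source does not state it.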

All services log structured JSON to stdout. Ship the logs with Vector, Promtail, or fluent-bit. Each line has time, level, msg, and service fields, plus per-event fields (monitor_id, incident_id, etc.).
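Because each log line is a standalone JSON object, ad hoc filtering needs very little code. The sketch below is one way to do it, assuming one JSON object per line; the sample lines, the level values (`info`, `error`), the `worker` service name, and the monitor IDs are invented for illustration, and `iter_events` is a hypothetical helper rather than part of Pulse.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional

# Invented sample lines; real timestamps, levels, and IDs may look different.
SAMPLE_LOGS = """\
{"time":"2024-01-01T00:00:00Z","level":"info","msg":"check ok","service":"worker","monitor_id":"m-123"}
{"time":"2024-01-01T00:00:05Z","level":"error","msg":"check failed","service":"worker","monitor_id":"m-123"}
not json
{"time":"2024-01-01T00:00:06Z","level":"error","msg":"check failed","service":"worker","monitor_id":"m-456"}
"""


def iter_events(
    lines: Iterable[str],
    level: Optional[str] = None,
    **fields: Any,
) -> Iterator[Dict[str, Any]]:
    """Yield parsed log events matching `level` and every key/value in `fields`.

    Lines that are not JSON objects are skipped silently, since stdout may
    carry stray non-JSON output.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict):
            continue
        if level is not None and event.get("level") != level:
            continue
        if all(event.get(key) == value for key, value in fields.items()):
            yield event
```

For example, `iter_events(sys.stdin, level="error", monitor_id="m-123")` would stream only the error events for a single monitor when log output is piped into the script.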

Tracing is not implemented. It is an explicit non-goal; see Welcome.

scripts/loadtest.sh creates N monitors, samples the scheduler dispatch rate over 60s, and reports lag. It is used to verify the spec §13 P7 acceptance criterion (5,000 monitors at 1/min without lag).
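The arithmetic behind such a check can be sketched as follows. This is not the contents of scripts/loadtest.sh: it assumes the dispatch rate is derived from two readings of the pulse_scheduler_jobs_dispatched_total counter taken one window apart, and it treats "lag" as the shortfall between the expected and observed rates, which is a definition chosen for this example. The function name and the counter readings are hypothetical.

```python
from typing import Dict


def dispatch_report(
    count_start: float,
    count_end: float,
    window_s: float,
    monitors: int,
    interval_s: float = 60.0,
) -> Dict[str, float]:
    """Compare the observed dispatch rate against the rate the monitors require.

    count_start / count_end: counter readings at the start and end of the window.
    monitors: number of monitors, each assumed to run once per `interval_s`.
    """
    if window_s <= 0 or interval_s <= 0:
        raise ValueError("window_s and interval_s must be positive")
    if count_end < count_start:
        # A counter that moves backwards usually means the process restarted.
        raise ValueError("counter reset during the sampling window")

    observed = (count_end - count_start) / window_s  # dispatches per second
    expected = monitors / interval_s                 # dispatches per second
    return {
        "observed_per_s": observed,
        "expected_per_s": expected,
        # Zero when the scheduler keeps up; positive when it falls behind.
        "shortfall_per_s": max(0.0, expected - observed),
    }
```

For 5,000 monitors at 1/min the expected rate is 5000 / 60, roughly 83.3 dispatches per second, so a 60s window in which the counter advances by 5,000 yields a shortfall of zero, while an advance of only 4,500 yields a shortfall of about 8.3 per second.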