Self-hosted observability

Observability with a heartbeat, built for engineers who self-host.

Pulse is a modern, opinionated monitoring platform: uptime checks, real-time dashboards over WebSockets, full incident lifecycle, and branded status pages. One docker compose up on a single VM gets you running, and it scales to multi-region probes when you need them.

$ docker compose up -d
5,000+ monitors per instance · < 2s dashboard load · Go + Postgres + Redis
INC-104 · worker-iad down · 23m
14,820 checks/min · 8 workers
pulse.acme.com · live

Monitors up    17/20     ↑ 1 since 1h
Incidents      2         23m elapsed
p50 latency    87ms      ↓ 12%
24h uptime     99.61%    SLO 99.95%

api.acme.com         87ms
auth.acme.com        51ms
checkout.acme.com    612ms
cdn.acme.com         18ms
worker-iad           DOWN
worker-fra           67ms
queue.acme.com       32ms
ws.acme.com          38ms
billing.acme.com     198ms
webhooks             121ms
db-primary           4ms
redis.internal       2ms

Live check stream ~14,820/min

Features

Every signal your team needs,
no extra knobs.

Opinionated defaults wherever they're sensible. Configuration sprawl is a failure mode, not a feature.

HTTP, TCP, Ping & Heartbeat checks

One mental model for every kind of check. Tag, schedule, set the cadence, decide what "down" means. Monitor docs →

$ curl https://api.acme.com/health
→ 200 OK · 87ms · 2.4 KB
$ curl https://worker-iad/healthz
→ ETIMEDOUT · 10000ms
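As a rough illustration of what deciding what "down" means could look like for an HTTP check, here is a minimal Python sketch. It is not Pulse's implementation: the names CheckResult, run_http_check, and is_down, along with the 5000 ms latency threshold, are assumptions made for this example.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    status_code: Optional[int]  # None when no HTTP response was received
    latency_ms: float
    error: Optional[str] = None


def run_http_check(url: str, timeout_s: float = 10.0) -> CheckResult:
    """Issue one GET and record status, latency, and any transport error."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status, error = resp.status, None
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        status, error = exc.code, None
    except OSError as exc:
        # Covers URLError, timeouts, and refused connections.
        status, error = None, str(exc)
    latency_ms = (time.monotonic() - start) * 1000
    return CheckResult(status, latency_ms, error)


def is_down(result: CheckResult, max_latency_ms: float = 5000.0) -> bool:
    """Hypothetical 'down' rule: no response, non-2xx status, or too slow."""
    if result.error is not None or result.status_code is None:
        return True
    if not 200 <= result.status_code < 300:
        return True
    return result.latency_ms > max_latency_ms
```

Under this rule, the two sample results above would classify as up (200 in 87ms) and down (timeout after 10000ms).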

Real-time, WebSocket-driven

The dashboard updates the instant a worker reports. No polling, no stale tabs, no refresh dance during an incident. How it works →

stream [fra] queue → 200 · 32ms
stream [sin] cdn → 200 · 18ms
stream [iad] worker → ETIMEDOUT
event  incident.opened INC-104

Incident lifecycle, not just on/off

Detected → investigating → identified → monitoring → resolved. With timelines, comments, severity, and postmortems. Incident docs →

15:35  monitor down
15:42  investigating
15:54  severity → P1
16:07  resolved · 32m
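The lifecycle can be pictured as a small state machine. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes transitions only move forward (stages may be skipped, as in the sample timeline) and that resolved is terminal. Pulse's actual rules may differ.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Status(Enum):
    DETECTED = "detected"
    INVESTIGATING = "investigating"
    IDENTIFIED = "identified"
    MONITORING = "monitoring"
    RESOLVED = "resolved"


# Stage order as listed in the lifecycle description.
ORDER = list(Status)


@dataclass
class Incident:
    ident: str
    status: Status = Status.DETECTED
    timeline: List[Tuple[str, str]] = field(default_factory=list)

    def transition(self, new_status: Status, at: str) -> None:
        """Move to a later stage and append the change to the timeline."""
        if self.status is Status.RESOLVED:
            raise ValueError("incident is already resolved")
        if ORDER.index(new_status) <= ORDER.index(self.status):
            raise ValueError(
                f"cannot move from {self.status.value} to {new_status.value}"
            )
        self.timeline.append((at, f"{self.status.value} -> {new_status.value}"))
        self.status = new_status


# Replays the status changes from the sample timeline.
inc = Incident("INC-104")
inc.transition(Status.INVESTIGATING, at="15:42")
inc.transition(Status.RESOLVED, at="16:07")
```

Keeping every transition in the timeline is what makes a postmortem possible later: the record shows when each stage began, not only the final state.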

Scales when you do

One bundled worker out of the box. Enroll more in any region when you want multi-vantage probes. Pulse then aggregates regional results under a majority, all-regions, or any-region rule. Workers & regions →

iad → 87ms   sfo → 41ms
fra → 28ms   lhr → 33ms
sin → 142ms
nrt → TIMEOUT

Routing rules with intent

Slack, Email, Discord, Telegram, PagerDuty, webhooks. Combine with tag scopes, severity, time-of-day, mute windows. Alert routing docs →

rule tag:production → #ops-pager
when status: down · 3/3 windows
also oncall@acme.com
also PagerDuty (primary)
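One way to read the sample rule is "notify these channels when a production-tagged monitor has been down for 3 of the last 3 evaluation windows". The Python sketch below encodes that reading. The Rule type, the channels_to_notify function, and the interpretation of "3/3 windows" are assumptions for illustration, not Pulse's rule engine.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Rule:
    tag: str                    # tag scope, e.g. "production"
    required_down_windows: int  # consecutive "down" windows, e.g. 3 for "3/3"
    channels: List[str] = field(default_factory=list)


def channels_to_notify(
    rule: Rule, monitor_tags: Set[str], recent_windows: List[str]
) -> List[str]:
    """Return the rule's channels if the monitor is in scope and the
    most recent N windows were all "down"; otherwise return nothing."""
    n = rule.required_down_windows
    if n < 1:
        raise ValueError("required_down_windows must be at least 1")
    if rule.tag not in monitor_tags:
        return []
    last = recent_windows[-n:]
    if len(last) < n or any(w != "down" for w in last):
        return []
    return list(rule.channels)


rule = Rule(
    tag="production",
    required_down_windows=3,
    channels=["#ops-pager", "oncall@acme.com", "PagerDuty (primary)"],
)
```

Requiring several consecutive down windows trades a little detection latency for fewer pages on transient failures. Severity, time-of-day, and mute-window conditions would be further filters applied in the same function.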

Branded status pages

One toggle to publish. Custom domain, subscriber emails, RSS, components grouped by service. Status page docs →

API                Operational · 99.98%
Web App            Operational · 100.00%
Checkout           Degraded · 99.62%
Workers, US-East   Down

Architecture · Optional

Start on one box. Grow when you have to.

Most installs run a single worker on the same VM as the API — that's the default and it's perfectly fine. When you outgrow it, drop a stateless worker binary in any region and it joins the fleet automatically. Multi-region is an opt-in toggle in Settings; the schema, scheduler, and aggregator are ready when you need them.

SFO US West · San Francisco 41ms
FRA Europe · Frankfurt 28ms
SIN Asia · Singapore 142ms
IAD US East · Virginia TIMEOUT

Down-rule (once you're multi-region): majority of regions, all, or any. With the majority or all rule, Pulse does not alert on a single regional blip.
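The three down-rules can be sketched as a single function over per-region results. This is an assumption-laden illustration: the function name aggregate_down, the rule strings, and the choice to treat "no data" as not down are all hypothetical rather than taken from Pulse.

```python
from typing import Dict


def aggregate_down(region_down: Dict[str, bool], rule: str = "majority") -> bool:
    """Combine per-region results (True = region sees the target as down)
    into one verdict for the monitor."""
    if not region_down:
        return False  # assumption: with no regional data, do not report down
    failing = sum(region_down.values())
    total = len(region_down)
    if rule == "any":
        return failing >= 1
    if rule == "all":
        return failing == total
    if rule == "majority":
        return failing * 2 > total  # strictly more than half
    raise ValueError(f"unknown rule: {rule}")


# The four-region sample above: only IAD times out.
sample = {"sfo": False, "fra": False, "sin": False, "iad": True}
```

With this sample, the majority and all rules report the monitor as up, while the any rule reports it as down, which is why the rule choice matters for noisy regions.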

Regions: IAD · SFO · FRA · LHR · SIN · SYD · GRU · NRT

Incident management

A clean record of what broke,
why, and when.

Every incident is a timeline. Severity changes, status transitions, comments, channel deliveries, and customer-facing updates — all in one place.

P1 · US-East regional worker unreachable · Investigating
Started 2026-05-17 15:35 UTC  ·  elapsed 23m  ·  affecting worker-iad.acme.com

15:35:21 UTC · Monitor down
3 consecutive failures from sfo, fra, sin. Latency timeout (10s).

15:35:24 UTC · Incident opened
Auto-triggered by alert rule "workers-on-call".

15:35:25 UTC · Alerts sent
#ops-pager (Slack), oncall@acme.com (email), PagerDuty (acked 21m ago).

15:54:12 UTC · Severity → P1
Customer impact confirmed for ~12% of US traffic. Failover to other regions in progress.

Status pages

Honest by default.

Subscribe by email, Slack, or RSS. Components grouped by service. 90 days of uptime, visible to anyone.

Some systems experiencing issues
2 active incidents · last updated just now

API                Operational    99.98% · 90 days
Web Application    Operational    100.00% · 90 days
Authentication     Operational    100.00% · 90 days
Checkout           Degraded       99.62% · 90 days
US-East Workers    Down           96.41% · 90 days
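To make the 90-day percentages concrete, the arithmetic below converts an uptime figure into the downtime it implies. It is a plain calculation, not Pulse's uptime accounting, and the function name is hypothetical.

```python
def downtime_minutes(uptime_percent: float, window_days: int = 90) -> float:
    """Downtime implied by an uptime percentage over a window of whole days."""
    window_minutes = window_days * 24 * 60  # 90 days = 129,600 minutes
    return window_minutes * (100.0 - uptime_percent) / 100.0


checkout = downtime_minutes(99.62)  # about 492 minutes, a little over 8 hours
workers = downtime_minutes(96.41)   # about 4,653 minutes, roughly 77.5 hours
```

Seen this way, the gap between 99.62% and 99.98% is the difference between about eight hours and under half an hour of downtime per quarter.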

Spin it up. See it tick.

One docker compose up and you're monitoring. Self-hosted, MIT-licensed, your data stays on your hardware.

Read the quickstart
$ docker compose up -d