Self-hosted observability

Observability with a heartbeat, built for engineers who self-host.

Pulse is a modern, opinionated monitoring platform: uptime checks, real-time dashboards over WebSockets, full incident lifecycle management, and branded status pages. It starts with a single docker compose up on one VM and scales out to multi-region probes when you need them.

Get started Star on GitHub

$ docker compose up -d

● 5,000+ monitors per instance · < 2s dashboard load · Go + Postgres + Redis

INC-104 · worker-iad down · 23m · 14,820 checks/min across 8 workers

Dashboard preview (pulse.acme.com · live)

Monitors up: 17/20 (↑ 1 since 1h) · Incidents: 23m elapsed · p50 latency: 87ms (↓ 12%) · 24h uptime: 99.61% (SLO 99.95%)

api.acme.com 87ms · auth.acme.com 51ms · checkout.acme.com 612ms · cdn.acme.com 18ms · worker-iad DOWN · worker-fra 67ms

queue.acme.com 32ms · ws.acme.com 38ms · billing.acme.com 198ms · webhooks 121ms · db-primary 4ms · redis.internal 2ms

Live check stream ~14,820/min

Features

Every signal your team needs, no extra knobs.

Opinionated defaults wherever they're sensible. Configuration sprawl is a failure mode, not a feature.

HTTP, TCP, Ping & Heartbeat checks

One mental model for every kind of check: tag it, set its cadence, and decide what "down" means. Monitor docs →

$ curl https://api.acme.com/health

→ 200 OK · 87ms · 2.4 KB

$ curl https://worker-iad/healthz

→ ETIMEDOUT · 10000ms
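To make "decide what down means" concrete, here is a minimal Python sketch that classifies a single probe result. The CheckResult and DownPolicy types, the optional "degraded" latency threshold, and the interval-plus-grace rule for heartbeats are illustrative assumptions, not Pulse's actual data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    """Outcome of one probe (HTTP request, TCP connect, or ping). Hypothetical type."""
    status_code: Optional[int]  # None for TCP/ping, or when no response arrived
    latency_ms: float
    error: Optional[str] = None  # e.g. "ETIMEDOUT"


@dataclass
class DownPolicy:
    """Hypothetical per-monitor definition of "down"."""
    expected_status: Optional[int] = 200  # None skips the status check (TCP, ping)
    timeout_ms: float = 10_000
    degraded_above_ms: Optional[float] = None  # slow but still reachable


def evaluate(result: CheckResult, policy: DownPolicy) -> str:
    """Classify a probe result as "up", "degraded", or "down"."""
    if result.error is not None or result.latency_ms >= policy.timeout_ms:
        return "down"
    if policy.expected_status is not None and result.status_code != policy.expected_status:
        return "down"
    if policy.degraded_above_ms is not None and result.latency_ms > policy.degraded_above_ms:
        return "degraded"
    return "up"


def heartbeat_status(last_ping_s: float, now_s: float, interval_s: float, grace_s: float) -> str:
    """A heartbeat monitor is down once no ping has arrived within interval + grace."""
    return "up" if now_s - last_ping_s <= interval_s + grace_s else "down"


# The two curl results above, classified under the default policy.
print(evaluate(CheckResult(200, 87), DownPolicy()))                    # up
print(evaluate(CheckResult(None, 10_000, "ETIMEDOUT"), DownPolicy()))  # down
```

Treating any transport error or timeout as down before looking at the status code keeps a case like ETIMEDOUT from slipping through as a merely missing status.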

Real-time, WebSocket-driven

The dashboard updates the instant a worker reports. No polling, no stale tabs, no refresh dance during an incident. How it works →

stream ▎ ● [fra] queue → 200 · 32ms

stream ▎ ● [sin] cdn → 200 · 18ms

stream ▎ ● [iad] worker → ETIMEDOUT

event ▎ incident.opened INC-104
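The push model can be sketched as a fan-out hub: each open dashboard session gets its own queue, and every worker report is serialized once and pushed to all of them. This minimal Python sketch uses asyncio queues in place of real WebSocket connections; the Hub class and the JSON message shape are assumptions, not Pulse's wire protocol.

```python
import asyncio
import json


class Hub:
    """Hypothetical fan-out hub: one queue per connected dashboard session.

    A real server would drain each queue into that session's WebSocket.
    """

    def __init__(self) -> None:
        self._sessions: set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._sessions.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._sessions.discard(queue)

    def publish(self, kind: str, payload: dict) -> None:
        # Serialize once; put_nowait never blocks on an unbounded queue,
        # so one slow dashboard cannot stall delivery to the others.
        message = json.dumps({"type": kind, **payload})
        for queue in self._sessions:
            queue.put_nowait(message)


async def demo() -> list[str]:
    hub = Hub()
    first, second = hub.subscribe(), hub.subscribe()
    hub.publish("stream", {"region": "fra", "monitor": "queue", "status": 200, "latency_ms": 32})
    hub.publish("event", {"name": "incident.opened", "id": "INC-104"})
    # Each session sees every message, in publish order.
    return [await first.get(), await first.get(), await second.get()]


if __name__ == "__main__":
    for line in asyncio.run(demo()):
        print(line)
```

Unbounded queues keep the sketch short; a production server would more likely bound them and drop or disconnect a dashboard that falls too far behind.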

Incident lifecycle, not just on/off

Every incident moves through detected → investigating → identified → monitoring → resolved, with timelines, comments, severity levels, and postmortems along the way. Incident docs →

15:35 ● monitor down

15:42 ● investigating

15:54 ● severity → P1

16:07 ● resolved · 32m
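The lifecycle behaves like a forward-only state machine. This Python sketch assumes stages may be skipped but never reversed (the sample timeline goes from investigating straight to resolved) and stores severity changes and comments as plain timeline entries; the Incident class and its methods are hypothetical, not Pulse's data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

STAGES = ["detected", "investigating", "identified", "monitoring", "resolved"]


@dataclass
class Incident:
    """Hypothetical incident record with an append-only timeline."""
    id: str
    opened_at: datetime
    stage: str = "detected"
    timeline: list[tuple[datetime, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.timeline.append((self.opened_at, "detected"))

    def advance(self, stage: str, at: datetime) -> None:
        """Move to a later stage; skipping is allowed, going backwards is not."""
        if STAGES.index(stage) <= STAGES.index(self.stage):
            raise ValueError(f"cannot move from {self.stage} to {stage}")
        self.stage = stage
        self.timeline.append((at, stage))

    def note(self, text: str, at: datetime) -> None:
        """Record a non-stage event such as a severity change or a comment."""
        self.timeline.append((at, text))

    def duration(self) -> timedelta:
        """Time from detection to resolution."""
        if self.stage != "resolved":
            raise ValueError("incident is still open")
        resolved_at = next(t for t, entry in self.timeline if entry == "resolved")
        return resolved_at - self.opened_at


# Replaying the sample timeline above (the date is arbitrary).
day = datetime(2026, 5, 17)
inc = Incident("INC-example", day.replace(hour=15, minute=35))
inc.advance("investigating", day.replace(hour=15, minute=42))
inc.note("severity → P1", day.replace(hour=15, minute=54))
inc.advance("resolved", day.replace(hour=16, minute=7))
print(inc.duration())  # 0:32:00
```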

Scales when you do

One bundled worker ships out of the box. Enroll more in any region when you want multi-vantage probes, and Pulse combines their verdicts with a majority, all-regions, or any-region rule. Workers & regions →

● iad → 87ms ● sfo → 41ms

● fra → 28ms ● lhr → 33ms

● sin → 142ms

● nrt → TIMEOUT
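Combining regional verdicts comes down to counting failing regions. In this Python sketch, "majority" means strictly more than half of the reporting regions, so a 2-of-4 split counts as up; that tie-breaking choice is an assumption, and the function and enum names are hypothetical.

```python
from enum import Enum


class DownRule(Enum):
    MAJORITY = "majority"
    ALL = "all"
    ANY = "any"


def is_down(region_ok: dict[str, bool], rule: DownRule) -> bool:
    """Decide whether a monitor is down from per-region results (True = probe succeeded)."""
    if not region_ok:
        raise ValueError("no regional results")
    failing = sum(1 for ok in region_ok.values() if not ok)
    total = len(region_ok)
    if rule is DownRule.ANY:
        return failing >= 1
    if rule is DownRule.ALL:
        return failing == total
    return failing * 2 > total  # strict majority; a tie counts as up


# The six probes shown above: only nrt timed out.
probes = {"iad": True, "sfo": True, "fra": True, "lhr": True, "sin": True, "nrt": False}
for rule in DownRule:
    print(rule.value, is_down(probes, rule))
```

With those six probes, only the any-region rule reports the monitor down; majority and all both treat the nrt timeout as a regional blip.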

Routing rules with intent

Route alerts to Slack, email, Discord, Telegram, PagerDuty, or webhooks, and scope them by tag, severity, time of day, and mute windows. Alert routing docs →

rule tag:production → #ops-pager

when status: down · 3/3 windows

also oncall@acme.com

also PagerDuty (primary)
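Evaluating a rule like the one above means checking its tag scope, its mute window, and whether enough recent check windows failed. This Python sketch assumes a daily mute window in UTC that may wrap past midnight and uses made-up channel identifiers such as slack:#ops-pager; severity and time-of-day filters would be further predicates of the same shape. None of these names come from Pulse's configuration format.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass
class Rule:
    """Hypothetical routing rule mirroring "tag:production → #ops-pager, 3/3 windows"."""
    tag: str
    channels: list[str]
    failures_required: int = 3  # "3/3 windows": failures needed...
    window_count: int = 3       # ...among the most recent N check windows
    mute_start: Optional[time] = None  # daily mute window, UTC
    mute_end: Optional[time] = None


def is_muted(rule: Rule, now: time) -> bool:
    if rule.mute_start is None or rule.mute_end is None:
        return False
    if rule.mute_start <= rule.mute_end:
        return rule.mute_start <= now < rule.mute_end
    # The window wraps past midnight, e.g. 22:00-06:00.
    return now >= rule.mute_start or now < rule.mute_end


def route(rule: Rule, monitor_tags: set[str], recent_down: list[bool], now: time) -> list[str]:
    """Return the channels to notify, or an empty list if the rule does not fire."""
    if rule.tag not in monitor_tags or is_muted(rule, now):
        return []
    window = recent_down[-rule.window_count:]
    if len(window) < rule.window_count or sum(window) < rule.failures_required:
        return []
    return list(rule.channels)


pager = Rule(
    tag="production",
    channels=["slack:#ops-pager", "email:oncall@acme.com", "pagerduty:primary"],
)
print(route(pager, {"production", "workers"}, [False, True, True, True], time(15, 35)))
```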

Branded status pages

One toggle to publish. Custom domain, subscriber emails, RSS, components grouped by service. Status page docs →

● API Operational · 99.98%

● Web App Operational · 100.00%

● Checkout Degraded · 99.62%

● Workers · US-East · Down
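RSS subscriptions only need a standard RSS 2.0 document with one item per status update. This sketch builds one with Python's standard library; the feed title, the status.acme.com URL, and the shape of the update records are illustrative assumptions, not what Pulse actually emits.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_feed(site_url: str, updates: list[dict]) -> str:
    """Render status updates as an RSS 2.0 document; items keep the order given."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Acme status"
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Incident and maintenance updates"
    for update in updates:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = update["title"]
        ET.SubElement(item, "description").text = update["body"]
        # RSS 2.0 uses RFC 822-style dates; usegmt=True requires a UTC datetime.
        ET.SubElement(item, "pubDate").text = format_datetime(update["at"], usegmt=True)
        ET.SubElement(item, "guid", isPermaLink="false").text = update["id"]
    return ET.tostring(rss, encoding="unicode")


feed = build_feed(
    "https://status.acme.com",
    [{
        "id": "INC-104-1",
        "title": "US-East Workers: investigating",
        "body": "US-East regional worker unreachable.",
        "at": datetime(2026, 5, 17, 15, 35, 24, tzinfo=timezone.utc),
    }],
)
print(feed)
```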

Architecture · Optional

Start on one box. Grow when you have to.

Most installs run a single worker on the same VM as the API. That's the default, and it's perfectly fine. When you outgrow it, drop a stateless worker binary into any region and it joins the fleet automatically. Multi-region is an opt-in toggle in Settings; the schema, scheduler, and aggregator are ready when you need them.

SFO US West · San Francisco 41ms

FRA Europe · Frankfurt 28ms

SIN Asia · Singapore 142ms

IAD US East · Virginia TIMEOUT

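Down rule (once you're multi-region): a monitor is down when a majority of regions, all of them, or any one of them fails, depending on the rule you choose. Under the majority or all rule, a blip in a single region never triggers an alert.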
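One way a worker can "join the fleet automatically" is a heartbeat-based registry: the first report from a worker enrolls it, and a region counts as live only while at least one of its workers has reported recently. This Python sketch assumes a 30-second liveness TTL and uses hypothetical names throughout; Pulse's real enrollment protocol may differ.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    id: str
    region: str
    last_seen: float  # seconds, on the API server's clock


class Fleet:
    """Hypothetical worker registry keyed by worker id."""

    def __init__(self, ttl_s: float = 30.0) -> None:
        self.ttl_s = ttl_s
        self._workers: dict[str, Worker] = {}

    def report(self, worker_id: str, region: str, now: float) -> None:
        # The first report enrolls the worker; later ones refresh last_seen.
        self._workers[worker_id] = Worker(worker_id, region, now)

    def live_regions(self, now: float) -> set[str]:
        """Regions with at least one worker heard from within the TTL."""
        return {w.region for w in self._workers.values() if now - w.last_seen <= self.ttl_s}


fleet = Fleet(ttl_s=30)
for worker_id, region in [("w-sfo", "sfo"), ("w-fra", "fra"), ("w-sin", "sin"), ("w-iad", "iad")]:
    fleet.report(worker_id, region, now=0)
# Every worker except w-iad keeps reporting.
for worker_id, region in [("w-sfo", "sfo"), ("w-fra", "fra"), ("w-sin", "sin")]:
    fleet.report(worker_id, region, now=40)
print(sorted(fleet.live_regions(now=50)))  # ['fra', 'sfo', 'sin']
```

Because the registry holds only a last-seen timestamp per worker, the workers themselves can stay stateless: restarting one simply re-enrolls it on its next report.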

Incident management

A clean record of what broke, why, and when.

Every incident is a timeline. Severity changes, status transitions, comments, channel deliveries, and customer-facing updates — all in one place.

P1 · US-East regional worker unreachable · Investigating

Started 2026-05-17 15:35 UTC · elapsed 23m · affecting worker-iad.acme.com

15:35:21 UTC

Monitor down

3 consecutive failures from sfo, fra, and sin. Requests timed out after 10s.

15:35:24 UTC

Incident opened

Auto-triggered by alert rule "workers-on-call".

15:35:25 UTC

Alerts sent

#ops-pager (Slack), oncall@acme.com (email), PagerDuty (acked 21m ago).

15:54:12 UTC

Severity → P1

Customer impact confirmed for ~12% of US traffic. Failover to other regions in progress.

Status pages

Honest by default.

Subscribe by email, Slack, or RSS. Components grouped by service. 90 days of uptime, visible to anyone.

API · 99.98% over 90 days · Operational

Web Application · 100.00% over 90 days · Operational

Authentication · 100.00% over 90 days · Operational

Checkout · 99.62% over 90 days · Degraded

US-East Workers · 96.41% over 90 days · Down
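Each 90-day figure is a ratio of successful checks to total checks. This Python sketch assumes one check per minute (129,600 checks over 90 days) and truncates rather than rounds, so a window with even one failure never displays as 100.00%; both choices are assumptions in the spirit of "honest by default", not Pulse's documented behavior.

```python
def uptime_percent(up_checks: int, total_checks: int) -> str:
    """Uptime as a percentage, truncated (not rounded) to two decimal places."""
    if total_checks <= 0:
        raise ValueError("no checks in window")
    # Integer math in hundredths of a percent avoids float rounding surprises.
    hundredths = up_checks * 10_000 // total_checks
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


CHECKS_90_DAYS = 90 * 24 * 60  # assumed one-minute cadence: 129,600 checks
print(uptime_percent(CHECKS_90_DAYS - 492, CHECKS_90_DAYS))  # 99.62%
print(uptime_percent(CHECKS_90_DAYS - 1, CHECKS_90_DAYS))    # 99.99%, not 100.00%
```

Under these assumptions, 492 failed checks in the window would show as 99.62%; rounding instead of truncating would let a single failure display as 100.00%.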

Spin it up. See it tick.

One docker compose up and you're monitoring. Self-hosted and MIT-licensed, so your data stays on your hardware.

Read the quickstart

$ docker compose up -d
