Last updated 2026-05-28
Observability
notify writes structured JSON logs to stderr and exposes a /healthz
endpoint for liveness and a /metrics endpoint for
Prometheus scrapers. The metrics endpoint always returns a parseable
body so a scraper hitting it never sees a 404.
When you'd care
Setting up dashboards, debugging a slow tenant, wiring an alerting rule, or chasing an intermittent timeout in production.
Logging
Logs are emitted via Go's log/slog with a JSON handler
writing to stderr at the level set by NOTIFY_LOG_LEVEL
(default info). Each event is a single line with no header, so the output
can be ingested directly into Loki, Datadog, Splunk, or CloudWatch.
What gets logged
- server_listen: one line per listener (client / internal / metrics).
- notifyd_starting: boot banner with version, commit, store driver, live-connections toggle, listener ports.
- stream_open / stream_close: every StreamEvents session, including connection_id, user_id, tenant_id, device_type.
- event_queue_full: one log per dropped event when a client buffer overflows.
- retry_failed: per-attempt error from the at-least-once retry tracker (key, connection_id, attempt, error).
- Per-RPC logging interceptor: one line per Connect call with the procedure name and duration.
- server_shutdown_signal / server_shutdown_error: graceful shutdown.
Sample log line
{"time": "2026-05-27T05:14:32.108Z", "level": "INFO", "msg": "stream_open", "connection_id": "8b9f...", "user_id": "user-alice", "tenant_id": "acme", "device_type": "browser"}
Setting the level
# verbose: shows the per-RPC log lines from the logging interceptor
-e NOTIFY_LOG_LEVEL=debug
# default: boot, stream lifecycle, shutdown, errors
-e NOTIFY_LOG_LEVEL=info
# warnings + errors only: recommended for high-traffic production
-e NOTIFY_LOG_LEVEL=warn
# errors only
-e NOTIFY_LOG_LEVEL=error
Health checks
curl -s http://localhost:9090/healthz
# {"status":"ok"}
- HTTP 200 with {"status":"ok"} once Server.Run has bound all listeners.
- HTTP 503 with {"status":"not_ready"} after Shutdown begins and before the process exits.
An alias, /health, is registered for tools and habits that expect that path.
Kubernetes probes
livenessProbe:
  httpGet:
    path: /healthz
    port: 9090
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /healthz
    port: 9090
  initialDelaySeconds: 2
  periodSeconds: 5
The readiness probe starts failing as soon as Shutdown begins.
During a rolling update, Kubernetes stops routing new traffic to
the pod once the probe has failed failureThreshold consecutive
checks (3 by default), while Shutdown drains in-flight requests.
Metrics
/metrics on the metrics port returns
text/plain; version=0.0.4 — the standard Prometheus
exposition. v0.1 ships a placeholder body so scrapers don't 404;
real metric series land in a follow-up wave without changing the
route.
curl -s http://localhost:9090/metrics
# # notify metrics — placeholder
Prometheus scrape config
scrape_configs:
  - job_name: notify
    metrics_path: /metrics
    static_configs:
      - targets: ['notify:9090']
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
Counter shapes the follow-up wave will ship
Once the registry is wired, expect these series. Labels are intentionally cardinality-bounded — no per-tenant labels.
| Metric | Type | Labels |
|---|---|---|
| notify_notify_requests_total | Counter | code |
| notify_send_total | Counter | channel, provider, status |
| notify_send_duration_seconds | Histogram | channel, provider |
| notify_store_op_total | Counter | op, code |
| notify_store_op_duration_seconds | Histogram | op |
| notify_live_connections | Gauge | device_type |
| notify_stream_events_total | Counter | kind |
Tracing (future)
notify does not emit OpenTelemetry spans today. The Connect
interceptors are the right place to add them; once the
follow-up lands, set OTEL_* env vars per the standard
Go OTel SDK auto-configuration and traces will surface in the
backend of your choice.
Recipe: structured-logs pipeline
Loki + Grafana is the cheapest path for JSON logs. Drop this in your promtail config:
scrape_configs:
  - job_name: notify
    static_configs:
      - targets: [localhost]
        labels:
          job: notify
          __path__: /var/log/notify/*.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            msg: msg
            user_id: user_id
            tenant_id: tenant_id
      - labels:
          level: