Last updated 2026-05-28

Observability

notify writes structured JSON logs to stderr and serves two HTTP endpoints: /healthz for liveness and /metrics for Prometheus scrapers. The metrics endpoint always returns a parseable body, so a scraper hitting it never sees a 404.

When you'd care

Setting up dashboards, debugging a slow tenant, wiring an alerting rule, or chasing an intermittent timeout in production.

Logging

Logs are emitted via Go's log/slog with a JSON handler that writes to stderr at the level set by NOTIFY_LOG_LEVEL (default info). Each event is a single line with no header, so the output can be ingested directly into Loki, Datadog, Splunk, or CloudWatch.
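
For orientation, here is a minimal sketch of how a level string such as the one in NOTIFY_LOG_LEVEL can be mapped onto a slog JSON handler. It is not notify's actual startup code: the helper name levelFromEnv, the fallback to info on unrecognised input, and the debug_example message are assumptions made for illustration.

```go
package main

import (
	"log/slog"
	"os"
)

// levelFromEnv maps a NOTIFY_LOG_LEVEL value to a slog.Level.
// Hypothetical helper: falling back to info on empty or unknown
// input is an assumption, not documented notify behaviour.
func levelFromEnv(raw string) slog.Level {
	var lvl slog.Level
	// UnmarshalText accepts "debug", "info", "warn", "error" in any case.
	if err := lvl.UnmarshalText([]byte(raw)); err != nil {
		return slog.LevelInfo
	}
	return lvl
}

func main() {
	level := levelFromEnv(os.Getenv("NOTIFY_LOG_LEVEL"))

	// JSON handler on stderr: one event per line, filtered by level.
	handler := slog.NewJSONHandler(os.Stderr, &slog.HandlerOptions{Level: level})
	logger := slog.New(handler)

	logger.Debug("debug_example") // only emitted when the level is debug
	logger.Info("notifyd_starting", "version", "dev")
}
```

Run it with NOTIFY_LOG_LEVEL=warn and neither line appears; with debug, both do.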

What gets logged

server_listen — one line per listener (client / internal / metrics).
notifyd_starting — boot banner with version, commit, store driver, live-connections toggle, listener ports.
stream_open / stream_close — every StreamEvents session, including connection_id, user_id, tenant_id, device_type.
event_queue_full — one log per dropped event when a client buffer overflows.
retry_failed — per-attempt error from the at-least-once retry tracker (key, connection_id, attempt, error).
Per-RPC logging interceptor — one line per Connect call with the procedure name and duration.
server_shutdown_signal / server_shutdown_error — graceful shutdown.

Sample log line

Wrapped here for readability; notify emits each event on a single line.

{
  "time": "2026-05-27T05:14:32.108Z",
  "level": "INFO",
  "msg": "stream_open",
  "connection_id": "8b9f...",
  "user_id": "user-alice",
  "tenant_id": "acme",
  "device_type": "browser"
}
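
A line of this shape can be produced and read back with the standard library alone. The sketch below is illustrative rather than taken from notify's source: renderStreamOpen and decodeLine are hypothetical helpers, and the field values are made up.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log/slog"
)

// renderStreamOpen returns one stream_open event as a single JSON line,
// using the field names from the sample above. Hypothetical helper.
func renderStreamOpen(connectionID, userID, tenantID, deviceType string) string {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	logger.Info("stream_open",
		slog.String("connection_id", connectionID),
		slog.String("user_id", userID),
		slog.String("tenant_id", tenantID),
		slog.String("device_type", deviceType),
	)
	return buf.String()
}

// decodeLine parses one log line into a generic map, the way a
// downstream consumer might before filtering on msg or tenant_id.
func decodeLine(line string) (map[string]any, error) {
	var fields map[string]any
	err := json.Unmarshal([]byte(line), &fields)
	return fields, err
}

func main() {
	line := renderStreamOpen("conn-1", "user-alice", "acme", "browser")
	fmt.Print(line) // the handler already terminates the line with \n

	fields, err := decodeLine(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(fields["msg"], fields["tenant_id"])
}
```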

Setting the level

# verbose — shows the per-RPC log lines from the logging interceptor
-e NOTIFY_LOG_LEVEL=debug

# default — boot, stream lifecycle, shutdown, errors
-e NOTIFY_LOG_LEVEL=info

# warnings + errors only — recommended for high-traffic production
-e NOTIFY_LOG_LEVEL=warn

# errors only
-e NOTIFY_LOG_LEVEL=error

Health checks

curl -s http://localhost:9090/healthz
# {"status":"ok"}

HTTP 200 with {"status":"ok"} once Server.Run has bound all listeners.
HTTP 503 with {"status":"not_ready"} after Shutdown begins and before the process exits.

/health is registered as an alias for tools that default to that path.
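
The status contract above can be sketched as a handler guarded by a readiness flag. This is an assumption about the shape, not notify's implementation: the health type, newHealthMux, the probe helper, and the application/json content type are all hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// health is a hypothetical readiness flag. In this sketch it would be set
// to true once all listeners are bound and back to false when Shutdown begins.
type health struct {
	ready atomic.Bool
}

func (h *health) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json") // assumed content type
	if h.ready.Load() {
		w.WriteHeader(http.StatusOK)
		fmt.Fprint(w, `{"status":"ok"}`)
		return
	}
	w.WriteHeader(http.StatusServiceUnavailable)
	fmt.Fprint(w, `{"status":"not_ready"}`)
}

// newHealthMux registers the handler on /healthz and the /health alias.
func newHealthMux(h *health) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/healthz", h)
	mux.Handle("/health", h)
	return mux
}

// probe issues an in-memory GET and returns the status code and body.
func probe(handler http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	h := &health{}
	mux := newHealthMux(h)

	h.ready.Store(true) // listeners bound
	fmt.Println(probe(mux, "/healthz"))

	h.ready.Store(false) // Shutdown has begun
	fmt.Println(probe(mux, "/health"))
}
```

An atomic flag keeps the handler lock-free, which matters little at probe rates but avoids any chance of the health check blocking behind shutdown work.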

Kubernetes probes

livenessProbe:
  httpGet:
    path: /healthz
    port: 9090
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /healthz
    port: 9090
  initialDelaySeconds: 2
  periodSeconds: 5

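/healthz starts returning 503 as soon as Shutdown begins, but Kubernetes marks the pod unready only after failureThreshold consecutive failed probes (default 3). With periodSeconds: 5 that can take up to about 15 seconds; set failureThreshold: 1 on the readiness probe if new traffic should stop sooner. Shutdown drains in-flight requests in the meantime.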

Metrics

/metrics on the metrics port returns Content-Type text/plain; version=0.0.4, the standard Prometheus text exposition format. v0.1 ships a placeholder body so scrapers don't 404; real metric series land in a follow-up wave without changing the route.

curl -s http://localhost:9090/metrics
# # notify metrics — placeholder
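
A placeholder endpoint like this needs very little code. The following is a rough sketch, not notify's handler: metricsHandler, newMetricsMux, and scrape are hypothetical names, and the exact body text is illustrative. What matters is that a line starting with # that is not a HELP or TYPE line is treated as a comment in the text exposition format, so a scraper parses the response as zero series.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// placeholderBody is illustrative text. A '#' line that is neither
// HELP nor TYPE is a comment, so the body parses as an empty scrape.
const placeholderBody = "# notify metrics placeholder\n"

// metricsHandler serves the placeholder with the exposition content type.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprint(w, placeholderBody)
}

// newMetricsMux registers the handler on /metrics.
func newMetricsMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", metricsHandler)
	return mux
}

// scrape performs an in-memory GET and returns status, content type, and body.
func scrape(h http.Handler, path string) (int, string, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Header().Get("Content-Type"), rec.Body.String()
}

func main() {
	code, contentType, body := scrape(newMetricsMux(), "/metrics")
	fmt.Println(code, contentType)
	fmt.Print(body)
}
```

Keeping the route and content type stable means real series can be added later without touching scrape configs.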

Prometheus scrape config

scrape_configs:
  - job_name: notify
    metrics_path: /metrics
    static_configs:
      - targets: ['notify:9090']
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance

Counter shapes the follow-up wave will ship

Once the registry is wired, expect these series. Labels are intentionally cardinality-bounded — no per-tenant labels.

Metric	Type	Labels
`notify_notify_requests_total`	Counter	`code`
`notify_send_total`	Counter	`channel`, `provider`, `status`
`notify_send_duration_seconds`	Histogram	`channel`, `provider`
`notify_store_op_total`	Counter	`op`, `code`
`notify_store_op_duration_seconds`	Histogram	`op`
`notify_live_connections`	Gauge	`device_type`
`notify_stream_events_total`	Counter	`kind`

Tracing (future)

notify does not emit OpenTelemetry spans today. The Connect interceptors are the right place to add them; once the follow-up lands, set OTEL_* env vars per the standard Go OTel SDK auto-configuration and traces will surface in the backend of your choice.

Recipe: structured-logs pipeline

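Loki + Grafana is a low-cost path for JSON logs. notify writes to stderr, so the config below assumes your runtime or supervisor redirects that stream to files under /var/log/notify/. Add this to your promtail config: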

scrape_configs:
  - job_name: notify
    static_configs:
      - targets: [localhost]
        labels:
          job: notify
          __path__: /var/log/notify/*.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            msg: msg
            user_id: user_id
            tenant_id: tenant_id
      - labels:
          level: