Architecture overview

This is the real topology: the 60-second model with the actual service names, ports, and arrows filled in. The canonical version lives in ARCHITECTURE.md in the repo root; this page is the annotated tour.

The one big idea

Sudo terminates everything at one VPS and spawns a separate agent container for each family account on demand. There is no shared agent. sudo-api is both the web server and the process that runs docker run to create those per-user containers; it talks to the host's Docker socket directly. (Older docs mention a separate sudo-provisioner service; that was inlined into sudo-api during the pivot.)
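As a rough illustration of the on-demand spawn, the sketch below builds the docker run command line for one family's container. The container name pattern, the image tag, and the gateway run command come from the services tables on this page; the volume name and the /data mount path are hypothetical placeholders, and the real provisioner in sudo-api may pass different flags (the image is also retagged in production, so the local tag may differ).

```python
import uuid

# Image tag as listed in the per-user services table; production uses a retagged copy.
HERMES_IMAGE = "nousresearch/hermes-agent:v2026.5.7"


def container_name(user_id: uuid.UUID) -> str:
    """Derive the per-user container name, following the hermes-user-<uuid-hex> pattern."""
    return f"hermes-user-{user_id.hex}"


def docker_run_argv(user_id: uuid.UUID) -> list[str]:
    """Build the argv for a detached per-user agent container.

    The volume name and mount path are assumptions made for this sketch.
    """
    name = container_name(user_id)
    return [
        "docker", "run", "--detach",
        "--name", name,
        "--volume", f"{name}-data:/data",  # assumed: one named volume per family
        HERMES_IMAGE,
        "gateway", "run",
    ]
```

Keeping the argv construction in a pure function makes it easy to inspect or unit-test before anything is handed to the Docker socket.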

Full topology

[Figure: Sudo topology, full stack]

Legend: 🟦 long-lived container · 🟧 per-user container · 🟩 native process · 🟪 external service.

The services, one line each

Long-lived (declared in compose.prod.yaml):

| Service | Image | What it does |
| --- | --- | --- |
| sudo-api | build: cloud/api/Dockerfile | The website, the /v1/* API, and the inline provisioner that spawns per-user hermes via the docker socket. |
| livekit-server | livekit/livekit-server | WebRTC media server. One room_<user_id> per user; Pi + voice-bridge join the same room. |
| voice-bridge | build: cloud/voice_bridge/Dockerfile | livekit-agents worker: STT → hermes → TTS. |
| grafana / loki / promtail | off-the-shelf | Dashboards, log storage, log shipping. See Observability. |
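To make the voice path concrete, here is a minimal sketch of the room naming and of the STT → hermes → TTS chain as plain function composition. It deliberately does not use the livekit-agents API; the three stage callables and the make_voice_turn helper are hypothetical stand-ins, and only the room_<user_id> pattern is taken from the table.

```python
from typing import Callable


def room_name(user_id: str) -> str:
    """Room naming from the table: one room_<user_id> per user."""
    return f"room_{user_id}"


def make_voice_turn(
    stt: Callable[[bytes], str],
    agent: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Compose the three stages into a single audio-in, audio-out turn."""

    def turn(audio: bytes) -> bytes:
        transcript = stt(audio)    # speech to text
        reply = agent(transcript)  # the per-user agent answers
        return tts(reply)          # text back to speech

    return turn
```

With stub stages (for example, decoding bytes as the "STT" and encoding them again as the "TTS"), a turn can be exercised without any media server, which is handy for testing the ordering of the stages in isolation.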

Per-user (spawned at runtime):

| Service | Image | What it does |
| --- | --- | --- |
| hermes-user-<uuid-hex> | nousresearch/hermes-agent:v2026.5.7 (retagged) | The agent for one family. Runs gateway run, exposing four platform ports. Spawned on first use, kept warm by the Pi's heartbeat, reaped when idle. |
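The spawn, keep-warm, and reap lifecycle can be sketched as a small bookkeeping structure: each heartbeat refreshes a timestamp, and a periodic sweep returns the containers that have been silent for too long. The WarmSet class, its method names, and the idle timeout are all assumptions made for illustration; this page does not say how sudo-api tracks heartbeats or what the real timeout is.

```python
import time
from typing import Optional


class WarmSet:
    """Tracks the last heartbeat per container and reports idle ones."""

    def __init__(self, idle_timeout_s: float) -> None:
        self.idle_timeout_s = idle_timeout_s  # assumed value, chosen by the caller
        self._last_seen: dict[str, float] = {}

    def heartbeat(self, name: str, now: Optional[float] = None) -> None:
        """Record that a heartbeat arrived for this container."""
        self._last_seen[name] = time.monotonic() if now is None else now

    def reap_idle(self, now: Optional[float] = None) -> list[str]:
        """Forget and return every container idle for longer than the timeout."""
        now = time.monotonic() if now is None else now
        idle = [
            name
            for name, seen in self._last_seen.items()
            if now - seen > self.idle_timeout_s
        ]
        for name in idle:
            del self._last_seen[name]
        return idle
```

The caller would stop and remove whatever reap_idle returns. Accepting an explicit now argument keeps the logic deterministic under test, while production code can rely on the monotonic clock default.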

Native (not containers):

| Process | Where | Why native |
| --- | --- | --- |
| caddy | VPS host | TLS + host-network reverse proxy; simpler as a system service. |
| sudoedge | the Pi | Needs direct mic/speaker access. |

Why it's shaped this way

A few deliberate choices that surprise people:

  • One container per family, not per request. This gives each family isolated memory, sessions, and skills on its own volume, and lets the agent be stateful and proactive.
  • We don't fork hermes. The image is upstream, unmodified. All our customization is (a) three bind-mounted plugins and (b) a boot script that writes config. This keeps us on the upstream upgrade path. See Per-user provisioning.
  • sudo-api is the provisioner. No separate orchestration service; it just has the docker socket mounted. Simpler to operate, one fewer moving part.
  • Caddy is on the host, not in compose. It owns :443 and TLS certs and is managed by systemd. It's deployed by a separate, tightly-scoped path. See Deploy.
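The "no fork" choice above relies on bind mounts rather than a custom image. As a hedged sketch, the helper below turns a list of plugin directory names into docker run --mount flags. The helper name, the directory layout, and the read-only setting are assumptions; the source only states that three plugins are bind-mounted into the unmodified upstream image.

```python
from pathlib import PurePosixPath


def bind_mount_flags(
    host_plugin_dir: str,
    container_plugin_dir: str,
    plugins: list[str],
) -> list[str]:
    """Build one read-only --mount flag pair per plugin directory.

    All paths are hypothetical; read-only mounting is an assumption of this sketch.
    """
    flags: list[str] = []
    for plugin in plugins:
        source = PurePosixPath(host_plugin_dir) / plugin
        target = PurePosixPath(container_plugin_dir) / plugin
        flags += ["--mount", f"type=bind,source={source},target={target},readonly"]
    return flags
```

Because the customization lives entirely in mounts and a boot-time config script, moving to a newer upstream image should only require changing the image tag, not rebuilding anything.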

Next: the three surfaces explains how voice, chat, and WhatsApp each reach the agent.