Per-user provisioning

Each family gets its own hermes container. This page explains how that container is born, configured, kept alive, and reaped. The logic lives inside sudo-api at cloud/api/provisioner.py — there is no separate provisioner service.

Why per-user containers at all

The agent needs to be stateful: persistent memory, ongoing sessions, scheduled crons, and per-family skills. Giving each family its own container with its own data volume keeps all of that isolated and durable, and lets the agent be proactive (it has a long-lived home to run crons in). The cost is orchestration — which is exactly what the provisioner handles.

The lifecycle

Per-user hermes lifecycle — trigger to healthy

1. Trigger

Anything that needs the agent calls ensure_runtime(user_id) first: a voice turn (from voice-bridge), a chat send, a WhatsApp inbound, or even just opening /integrations.

2. Spawn

If the container isn't running, _spawn() issues a docker run with:

A named volume hermes-user-<uuid-hex>-data mounted at /opt/data — this is where sessions, memories, and skills survive restarts.
A read-only bind-mount of our plugins directory at /opt/data/plugins.
An entrypoint override to /bin/sh -c <_SEED_AND_EXEC> — our boot script.
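
As a rough illustration of the three points above, the argument list for such a docker run might be assembled as follows. This is a minimal sketch, not the actual _spawn(): the function name build_run_args, the -d flag, and the image, plugins_dir, and seed_script parameters are assumptions made for the example.

```python
import uuid


def build_run_args(user_id: str, image: str, plugins_dir: str, seed_script: str) -> list[str]:
    """Assemble a `docker run` argv for one user's container (illustrative only)."""
    hex_id = uuid.UUID(user_id).hex  # 32 hex characters, no dashes
    name = f"hermes-user-{hex_id}"
    return [
        "docker", "run", "-d",  # detached mode is an assumption
        "--name", name,
        # Named volume: sessions, memories, and skills survive restarts.
        "-v", f"{name}-data:/opt/data",
        # Plugins directory is bind-mounted read-only.
        "-v", f"{plugins_dir}:/opt/data/plugins:ro",
        # Entrypoint is overridden at run time only, never in the image.
        "--entrypoint", "/bin/sh",
        image,
        # Everything after the image is passed to /bin/sh as its arguments.
        "-c", seed_script,
    ]
```

Building the argv as a list (rather than a single shell string) avoids quoting problems when the boot script itself contains quotes or newlines.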

Container naming

Containers are named hermes-user-<32-char uuid hex> (uuid.UUID(user_id).hex, no dashes). Any hermes-user-usr_* containers on the VPS are orphans from an old deploy and are safe to docker rm -f.
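
The naming rule can be checked mechanically. The sketch below derives the current name from a user ID and flags legacy hermes-user-usr_* names; the helper names container_name, is_current_name, and is_legacy_orphan are hypothetical and not taken from provisioner.py.

```python
import re
import uuid

# Current scheme: "hermes-user-" followed by exactly 32 lowercase hex characters.
_CURRENT_NAME = re.compile(r"hermes-user-[0-9a-f]{32}")


def container_name(user_id: str) -> str:
    """Derive the container name from a user's UUID (dashes removed)."""
    return f"hermes-user-{uuid.UUID(user_id).hex}"


def is_current_name(name: str) -> bool:
    """True if the name follows the current naming scheme."""
    return _CURRENT_NAME.fullmatch(name) is not None


def is_legacy_orphan(name: str) -> bool:
    """True for containers left over from the old `usr_` naming scheme."""
    return name.startswith("hermes-user-usr_")
```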

3. Seed config on every boot

_SEED_AND_EXEC is the heart of "we don't fork hermes." On every start it:

Writes a Sudo-managed /opt/data/config.yaml (overwriting user edits; we are intentionally opinionated). This enables our platforms (api_server, twilio_whatsapp, sudo_chat, sudo_voice), the toolsets, and memory settings, with the LLM provider pulled from global_settings.llm_provider.
Seeds /opt/data/memories/MEMORY.md with the account identity block.
Heals root-owned .env files from older builds; purges the legacy Baileys WhatsApp dir.
Execs upstream's tini and entrypoint with gateway run.
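
The "overwrite config, then exec upstream" shape of such a wrapper can be sketched as a Python function that renders the shell string. This is only an illustration of the pattern: render_seed_script is a hypothetical name, the config contents and the upstream command path are supplied by the caller (the real ones are not shown on this page), and the memory-seeding and healing steps are omitted.

```python
import shlex


def render_seed_script(config_yaml: str, upstream_argv: list[str]) -> str:
    """Render a /bin/sh script that rewrites config.yaml, then execs upstream.

    `upstream_argv` is whatever command starts upstream's tini and entrypoint;
    its real value is an assumption left to the caller.
    """
    return "\n".join([
        "set -e",
        # Overwrite the managed config on every boot (user edits are discarded).
        f"printf '%s' {shlex.quote(config_yaml)} > /opt/data/config.yaml",
        # exec replaces the shell, so the upstream process receives signals directly.
        "exec " + shlex.join([*upstream_argv, "gateway", "run"]),
    ])
```

Quoting the config with shlex.quote keeps multi-line YAML and embedded quotes intact when the script is passed to /bin/sh -c.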

Don't override ENTRYPOINT in the Dockerfile

Upstream's entrypoint activates the venv that puts hermes on $PATH. We override only at docker run time (--entrypoint /bin/sh) and then call upstream's entrypoint from the wrapper. Overriding it in the image breaks the agent.

4. Wait for healthy

ensure_runtime() blocks while polling GET /health (default timeout 120s), so callers only get the URL once the agent can actually serve.
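
A poll-until-healthy loop of this kind might look like the sketch below. It assumes a 200 response means healthy and a 1-second poll interval; neither detail is confirmed by this page, and the function names http_health_ok and wait_healthy are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_health_ok(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not healthy yet.
        return False


def wait_healthy(
    probe: Callable[[], bool],
    timeout_s: float = 120.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` until it returns True or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would pass something like `lambda: http_health_ok(url)` as the probe; injecting the probe, clock, and sleep keeps the loop testable without a running container.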

Keeping it warm, and reaping it

Heartbeat keeps the agent warm

A paired, powered Pi sends a heartbeat roughly every 30s. Each heartbeat touches last_active_at and calls ensure_runtime in fire-and-forget fashion, so in practice the agent stays warm whenever the home device is on.
The idle reaper stops containers idle for RUNTIME_IDLE_SEC (default 1800s). In practice this only bites when the box is off or unpaired.
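
The reaper's selection step can be sketched as a pure function over last_active_at timestamps. The function name idle_user_ids and the dictionary input are assumptions for illustration, as is the strict "older than the cutoff" comparison; the real evict_idle() may read this state differently.

```python
from datetime import datetime, timedelta, timezone

RUNTIME_IDLE_SEC = 1800  # default from this page


def idle_user_ids(
    last_active: dict[str, datetime],
    now: datetime,
    idle_sec: int = RUNTIME_IDLE_SEC,
) -> list[str]:
    """Return user IDs whose last activity is older than the idle window."""
    cutoff = now - timedelta(seconds=idle_sec)
    return sorted(uid for uid, ts in last_active.items() if ts < cutoff)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    seen = {"a": now - timedelta(seconds=60), "b": now - timedelta(hours=1)}
    print(idle_user_ids(seen, now))  # only "b" has been idle longer than 1800s
```

With a heartbeat arriving about every 30s, last_active_at never falls behind the cutoff, which is why the reaper only acts when the device is off or unpaired.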

Config changes are fleet-wide

When an admin saves new LLM credentials at /admin/settings, restart_all_runtimes() does a docker rm -f + respawn + warm-up probe for every active container, so the new key takes effect immediately across all families. LLM config is global and admin-only — there is no per-user key. See Auth model and Secrets.
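
The remove, respawn, and probe sequence can be sketched with the three steps injected as callables. This is a hedged outline rather than the real restart_all_runtimes(): the name restart_all, the sequential loop, and the choice to record a failure and continue with the next user are all assumptions.

```python
from typing import Callable, Iterable


def restart_all(
    user_ids: Iterable[str],
    remove: Callable[[str], None],
    spawn: Callable[[str], None],
    warm_up: Callable[[str], bool],
) -> dict[str, bool]:
    """Recreate each user's container and report whether it came back healthy."""
    results: dict[str, bool] = {}
    for uid in user_ids:
        try:
            remove(uid)   # e.g. docker rm -f on the user's container
            spawn(uid)    # fresh container picks up the new global config
            results[uid] = warm_up(uid)
        except Exception:
            # One failing container should not block the rest of the fleet.
            results[uid] = False
    return results
```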

Where to look in the code

Function	Role
`ensure_runtime()`	The entry point everything calls. Spawn-if-needed + wait-healthy.
`_spawn()`	The actual `docker run` with volume, plugins, entrypoint override.
`_SEED_AND_EXEC`	The boot script (a shell string) injected as the entrypoint.
`api_key_for()`	Derives the per-user `API_SERVER_KEY` (HMAC, no storage).
`evict_idle()`	The idle reaper.
`restart_all_runtimes()`	Fleet-wide recreate on admin config save.

All in cloud/api/provisioner.py.
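
To illustrate what "HMAC, no storage" means for api_key_for(), here is a minimal sketch of deriving a per-user key from a single server-side secret. The function name derive_api_key, the use of SHA-256, the hex encoding, and using the user ID as the message are assumptions; the actual derivation in provisioner.py may differ.

```python
import hashlib
import hmac


def derive_api_key(master_secret: bytes, user_id: str) -> str:
    """Derive a deterministic per-user key; nothing needs to be stored per user."""
    return hmac.new(master_secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def key_matches(master_secret: bytes, user_id: str, presented: str) -> bool:
    """Constant-time check of a presented key against the derived one."""
    return hmac.compare_digest(derive_api_key(master_secret, user_id), presented)
```

Because the key is recomputed from the secret and the user ID on demand, both the API and the container can agree on it without a database column, and rotating the secret rotates every user's key at once.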