A request, end to end
Let's follow real requests through every hop, with the actual endpoints and tokens. If you understand these three traces, you understand the system's runtime.
Voice (reactive): "hey sudo, what's the weather?"
Key points:
- The Pi never talks to hermes directly. It only does audio, over WebRTC, to LiveKit.
- voice-bridge is the orchestrator for voice: STT, the hermes call, TTS.
  ensure_runtime is what makes the family's agent exist before the first call. See Per-user provisioning.
- The bearer to hermes is API_SERVER_KEY = HMAC(JWT_SECRET, user_id): derived, never stored. See Auth model.
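To make the "derived, never stored" point concrete, here is a minimal sketch of how such a bearer could be computed and checked. The source only states HMAC(JWT_SECRET, user_id); the SHA-256 digest, the hex encoding, and both function names are assumptions for illustration, not the project's actual code.

```python
import hashlib
import hmac


def derive_api_server_key(jwt_secret: bytes, user_id: str) -> str:
    """Recompute the per-user bearer from the shared secret on every call."""
    # Assumption: SHA-256 and hex output. The source does not name the digest.
    return hmac.new(jwt_secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def bearer_matches(jwt_secret: bytes, user_id: str, presented: str) -> bool:
    """Check a presented bearer without ever looking anything up in storage."""
    expected = derive_api_server_key(jwt_secret, user_id)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), presented.encode("utf-8"))
```

Because both sides can recompute the key from JWT_SECRET and the user id, there is no per-user secret to persist or rotate separately.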
Chat (reactive): typing in the browser
Key points:
- Chat uses two connections: one long-lived SSE stream for output, and short
  fire-and-forget POSTs for input. The token rides on the query string because
  EventSource can't set headers.
- The plugin pushes tokens back to sudo-api, which fans them out to every open tab
  for that user, so multi-tab just works, and so do proactive messages (they appear
  without you typing first).
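The fan-out described above can be sketched as a small in-memory hub: each open tab's SSE connection subscribes a queue under its user id, and anything pushed for that user is copied to every queue. The class and method names are hypothetical, and the real sudo-api may well do this differently.

```python
import asyncio
from collections import defaultdict


class FanOut:
    """Hypothetical per-user hub: one queue per open SSE connection (tab)."""

    def __init__(self) -> None:
        self._tabs = defaultdict(set)  # user_id -> set of asyncio.Queue

    def subscribe(self, user_id: str) -> asyncio.Queue:
        # Called when a tab opens its SSE stream.
        queue = asyncio.Queue()
        self._tabs[user_id].add(queue)
        return queue

    def unsubscribe(self, user_id: str, queue: asyncio.Queue) -> None:
        # Called when the SSE stream closes.
        self._tabs[user_id].discard(queue)
        if not self._tabs[user_id]:
            del self._tabs[user_id]

    def publish(self, user_id: str, chunk: str) -> int:
        # Copy the chunk to every open tab for this user; returns the tab count.
        queues = self._tabs.get(user_id, ())
        for queue in queues:
            queue.put_nowait(chunk)
        return len(queues)
```

Nothing in publish depends on the user having sent a message first, which is why proactive messages fall out of the same path.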
WhatsApp (reactive): messaging the shared number
Key points:
- One shared Twilio number serves every family. The mapping from a phone number to a
  family account is public.family_members.
- The webhook is public but authenticated by Twilio's signature, verified inside
  sudo-api before anything is forwarded.
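For reference, the signature check can be sketched with the standard library alone. Twilio's documented scheme for form-encoded webhooks is: append each POST parameter (sorted by name, key then value, no separators) to the full request URL, HMAC-SHA1 the result with the account's auth token, Base64-encode it, and compare against the X-Twilio-Signature header. The function names here are illustrative; how sudo-api actually performs the check is not shown in this document.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def twilio_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Compute the value Twilio would send in X-Twilio-Signature."""
    # Parameters are concatenated as key+value in sorted key order.
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(
    auth_token: str, url: str, params: Mapping[str, str], header_signature: str
) -> bool:
    """Reject the webhook unless the header matches the recomputed signature."""
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected.encode("utf-8"), header_signature.encode("utf-8"))
```

The URL must be exactly the one Twilio called, including scheme and query string, so a service behind a proxy has to reconstruct the public URL before verifying.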
Proactive (any surface): the agent reaches out
When a cronjob fires or the agent calls send_message, the flow originates inside
hermes and goes outward through the matching plugin.
Voice proactive can fail gracefully
If sudo_voice POSTs to voice-bridge and there's no active session (the device is
offline), voice-bridge returns 404. The agent can see that and fall back to another
surface — e.g. send the reminder by WhatsApp instead.
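A minimal sketch of that fallback, assuming each surface is represented by a callable that returns the HTTP status of its delivery attempt. The names and the "try the next surface on any non-2xx status" policy are illustrative assumptions; the source only says that voice-bridge answers 404 when no session is active.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical sender signature: (user_id, text) -> HTTP status code.
Sender = Callable[[str, str], int]


def deliver_with_fallback(
    user_id: str, text: str, surfaces: Sequence[Tuple[str, Sender]]
) -> Optional[str]:
    """Try each surface in order; return the name of the first that accepts."""
    for name, send in surfaces:
        status = send(user_id, text)
        if 200 <= status < 300:
            return name
        # Assumption: any non-2xx (e.g. the 404 for "no active session")
        # means this surface cannot deliver right now, so move on.
    return None
```

Called with voice first and WhatsApp second, an offline device yields a 404 from the first sender and the reminder goes out through the second.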
Next: Per-user provisioning — how a family's agent container comes into existence in the first place.