
The three surfaces

Everything a user can do passes through one of three doorways. They look different on the outside but converge on the same per-user hermes agent. This page is the map; each surface has its own deep-dive under The surfaces.

Three surfaces, one agent

Reactive vs proactive — the key distinction

Every surface works in two directions, and it's worth getting this straight early:

  • Reactive — a user sends something; the agent replies. (You talk, it answers.)
  • Proactive — the agent starts the conversation. A scheduled cronjob(deliver=…) fires, or the agent decides to send_message(target=…). (It pings you.)

Because hermes treats each surface as a first-class platform, the agent can choose which surface to speak through. It might answer a WhatsApp question on WhatsApp, but deliver a morning reminder by voice on the Pi.
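The two directions can be sketched as a pair of small functions. This is only an illustration of the distinction, not hermes code: `Surface`, `Outbound`, `reply`, and `notify` are hypothetical names, and defaulting a reactive reply to the inbound surface is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Surface(Enum):
    VOICE = "voice"
    CHAT = "chat"
    WHATSAPP = "whatsapp"


@dataclass(frozen=True)
class Outbound:
    surface: Surface
    text: str
    proactive: bool


def reply(inbound_surface: Surface, text: str) -> Outbound:
    """Reactive: a user message arrived, so answer on that same surface (assumed default)."""
    return Outbound(surface=inbound_surface, text=text, proactive=False)


def notify(target: Surface, text: str) -> Outbound:
    """Proactive: no inbound message exists, so the agent names the target surface itself."""
    return Outbound(surface=target, text=text, proactive=True)


# A WhatsApp question answered on WhatsApp, and a reminder spoken on the Pi.
answer = reply(Surface.WHATSAPP, "Your appointment is at 3 pm.")
reminder = notify(Surface.VOICE, "Good morning, time for your walk.")
```

The only structural difference is where the surface comes from: the inbound message in the reactive case, the agent's own choice in the proactive case.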

At a glance

| | Voice | Chat | WhatsApp |
| --- | --- | --- | --- |
| User device | Pi in the home | Web browser | Phone (WhatsApp app) |
| Transport in | WebRTC (LiveKit) → voice-bridge | POST /v1/me/chat/turn | Twilio webhook → sudo-api |
| Transport out | TTS audio over LiveKit | SSE stream to browser | Twilio Messages REST |
| Reactive path hits | api_server :8642 (via voice-bridge) | sudo_chat plugin :8652 | twilio_whatsapp plugin :8651 |
| Proactive path | sudo_voice plugin → voice-bridge :18087 | sudo_chat → SSE fan-out | twilio_whatsapp → Twilio REST |
| Who authenticates the user | LiveKit token + device JWT | Supabase JWT | X-Twilio-Signature + phone lookup |
| Deep dive | Voice | Chat | WhatsApp |
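As an illustration of the WhatsApp authentication entry above, here is a minimal sketch of checking an `X-Twilio-Signature` header. It follows Twilio's documented scheme for form-encoded webhooks: append the POST parameters, sorted by name, to the full request URL, sign with HMAC-SHA1 using the account auth token, and Base64-encode the result. The function names are hypothetical, this is not the code sudo-api runs, and the phone lookup that follows is not shown.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def twilio_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Compute the signature Twilio would send for this URL and form body."""
    # Concatenate the URL with each key and value, keys in sorted order.
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_request(
    auth_token: str, url: str, params: Mapping[str, str], header_value: str
) -> bool:
    """Compare the expected signature with the header in constant time."""
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))
```

The URL must be the exact one Twilio called, including scheme and query string, so a service behind a reverse proxy usually has to reconstruct the public URL before validating.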

How each one reaches the agent (reactive)

A reactive turn — the same shape across all three surfaces

The middle three steps are identical in spirit across surfaces; only the first and last "shapes" differ (audio vs SSE vs WhatsApp message). That's the whole trick — one brain, three skins.
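The "one brain, three skins" idea can be sketched as a shared core wrapped by per-surface decode and encode functions. Everything here is hypothetical: `agent_turn` stands in for the per-user agent, and the `transcript`, `message`, and `tts_text` field names are invented for illustration. Only the SSE `data:` framing and Twilio's `Body` parameter follow real conventions.

```python
from typing import Any, Callable, Dict, Tuple


def agent_turn(user_id: str, text: str) -> str:
    """Stand-in for the per-user agent; a real turn would call hermes."""
    return f"echo for {user_id}: {text}"


def encode_sse(reply: str) -> str:
    """Frame a reply as one SSE event; each line gets its own data: field."""
    return "".join(f"data: {line}\n" for line in reply.split("\n")) + "\n"


# Each skin is a (decode inbound, encode outbound) pair around the same core.
Skin = Tuple[Callable[[Dict[str, Any]], str], Callable[[str], Any]]

SKINS: Dict[str, Skin] = {
    "voice": (lambda raw: raw["transcript"], lambda reply: {"tts_text": reply}),
    "chat": (lambda raw: raw["message"], encode_sse),
    "whatsapp": (lambda raw: raw["Body"], lambda reply: {"Body": reply}),
}


def handle_turn(surface: str, user_id: str, raw: Dict[str, Any]) -> Any:
    decode, encode = SKINS[surface]
    text = decode(raw)                 # surface-specific shape in
    reply = agent_turn(user_id, text)  # identical for every surface
    return encode(reply)               # surface-specific shape out
```

Under this framing, supporting another surface means adding one more entry to `SKINS` while `agent_turn` stays untouched.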

Why plugins instead of forking hermes

hermes has a documented plugin path: drop code in /opt/data/plugins/ and the loader discovers it. We bind-mount three plugins into every per-user container:

  • sudo_chat — the browser chat adapter.
  • sudo_voice — the proactive-voice adapter (lets the agent speak unprompted).
  • twilio_whatsapp — the WhatsApp adapter.

Because each is a real hermes platform, the agent gets native send_message(target=…) and cronjob(deliver=…) for it, with no special-casing in our code. See Adding a platform plugin for how this extension point works.
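As a rough sketch of the bind-mount step, the helper below builds `--mount` flags for the three plugins. It rests on several assumptions: that the container runtime is Docker, that each plugin lives in its own directory under /opt/data/plugins/, that the mounts are read-only, and that the host path shown in the usage line exists. None of these are confirmed by this page.

```python
from pathlib import PurePosixPath
from typing import List

PLUGINS = ("sudo_chat", "sudo_voice", "twilio_whatsapp")
CONTAINER_PLUGIN_DIR = PurePosixPath("/opt/data/plugins")


def plugin_mount_args(host_plugin_root: str) -> List[str]:
    """Return docker-style --mount flags, one bind mount per plugin."""
    args: List[str] = []
    for name in PLUGINS:
        source = PurePosixPath(host_plugin_root) / name  # hypothetical host layout
        target = CONTAINER_PLUGIN_DIR / name             # assumed per-plugin subdirectory
        args += ["--mount", f"type=bind,source={source},target={target},readonly"]
    return args


# Hypothetical host directory; the flags would be spliced into a `docker run` command.
mount_args = plugin_mount_args("/srv/hermes-plugins")
```

Mounting the same host directories into every per-user container would keep the plugin code in one place, so an update reaches all users without rebuilding an image.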

Next: A request, end to end traces one voice turn through every hop with the real endpoints.