Voice¶
Voice is the flagship surface: a family talks to the Pi in their home and hears a reply. This page covers the pipeline, the wake word, and the proactive path. For the hop-by-hop trace see A request, end to end.
The pipeline¶
The Pi runs sudoedge, which does only audio + wake detection. All STT/LLM/TTS
happens in the cloud, in voice-bridge (a livekit-agents worker). The Pi and
voice-bridge meet inside a LiveKit room named room_<user_id>.
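The room naming convention can be captured in a pair of small helpers. This is a minimal sketch: only the `room_<user_id>` pattern comes from this page, while the function names `room_name_for` and `user_id_from_room` are hypothetical and may not match what the repo actually uses.

```python
ROOM_PREFIX = "room_"  # from the documented convention room_<user_id>


def room_name_for(user_id: str) -> str:
    """Build the LiveKit room name that the Pi and voice-bridge both join."""
    if not user_id:
        raise ValueError("user_id must be non-empty")
    return f"{ROOM_PREFIX}{user_id}"


def user_id_from_room(room_name: str) -> str:
    """Recover the user id from a room name, rejecting anything off-pattern."""
    if not room_name.startswith(ROOM_PREFIX) or len(room_name) == len(ROOM_PREFIX):
        raise ValueError(f"not a per-user room name: {room_name!r}")
    return room_name[len(ROOM_PREFIX):]
```

Keeping both directions in one place means the Pi side and the cloud side cannot drift apart on the naming scheme.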
Wake word: "hey sudo"¶
- Detected on-device by a small ONNX model at sudoedge/models/hey_sudo.onnx.
- The detector keeps running even while the agent is speaking, so you can interrupt it. That creates two hazards the code handles: not self-triggering on the agent's own audio, and not letting the voice-activity detector trip on speaker bleed-through.
- Training the model is its own topic — see Wake-word training.
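One common way to avoid self-triggering while still allowing interruption is to require a higher wake score while the agent's audio is playing. The sketch below illustrates that idea only. It is an assumption, not a description of what sudoedge/wake.py does, and the `WakeGate` class and both threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WakeGate:
    """Decides whether a wake-word score should count as a real wake.

    Hypothetical: the thresholds are illustrative, not tuned values.
    """

    idle_threshold: float = 0.5      # used when the speaker is silent
    speaking_threshold: float = 0.8  # stricter while the agent is talking
    agent_speaking: bool = False

    def should_wake(self, score: float) -> bool:
        # While the agent is speaking, its own audio can bleed into the mic,
        # so demand a more confident detection before interrupting.
        if self.agent_speaking:
            threshold = self.speaking_threshold
        else:
            threshold = self.idle_threshold
        return score >= threshold
```

With this gate, a borderline score that would wake the device in a quiet room is ignored during playback, while a clear "hey sudo" still interrupts.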
Tuned for kids in a noisy home
The device is for families, including children, in real living rooms — not a quiet developer's desk. Wake sensitivity and turn-taking are tuned for that, not for clean studio speech.
The voice is Indic by default¶
In production the voice stack is Hindi via Sarvam (STT saaras:v3, TTS bulbul:v3),
not English. This matters: any language-specific component (endpointing models,
turn-detection, wake tuning) must match the configured language.
Don't default-enable English turn-detection/endpointing
Turn-detection and endpointing models are opt-in and language-matched. An English end-of-utterance model was once enabled by default and broke the live Indic setup. If you touch turn-taking, gate it behind config and match the language.
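The opt-in, language-matched rule can be expressed as a small guard. This is a sketch under stated assumptions: the `VoiceConfig` fields and the `turn_detection_allowed` function are hypothetical names, and "hi" is used as the language code for the Hindi production setup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceConfig:
    """Hypothetical config shape; field names are illustrative."""

    language: str = "hi"                      # language of the STT/TTS stack
    turn_detection_enabled: bool = False      # opt-in: off unless configured
    turn_detection_language: Optional[str] = None


def turn_detection_allowed(cfg: VoiceConfig) -> bool:
    """Allow a turn-detection model only if it is enabled AND language-matched."""
    if not cfg.turn_detection_enabled:
        return False
    return cfg.turn_detection_language == cfg.language
```

The default config leaves turn detection off, so an English model cannot end up active on a Hindi deployment unless someone enables it and sets a matching language explicitly.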
Proactive voice¶
The agent can speak unprompted, for example on a cron reminder or a send_message. That goes through
the sudo_voice plugin:
voice-bridge looks up the active AgentSession by room name and cross-thread-dispatches
session.say(text) onto the agents event loop. If the device is offline there's no session, so
voice-bridge returns 404 and the agent can choose WhatsApp instead.
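The lookup-then-dispatch flow can be sketched with the standard library alone. Everything here is an assumption for illustration: `speak_proactively`, the `sessions` dict, and `FakeSession` are hypothetical stand-ins, and the real AgentSession may behave differently (for instance, `say` may return a handle rather than a coroutine, which the sketch tolerates by awaiting the result only when it is awaitable).

```python
import asyncio
import inspect
from typing import Any, Dict


class FakeSession:
    """Stand-in for an AgentSession; only say() is modelled."""

    def __init__(self) -> None:
        self.spoken = []

    def say(self, text: str) -> None:
        self.spoken.append(text)


async def _say_on_loop(session: Any, text: str) -> None:
    # Runs on the agents event loop, so session.say is called on its own thread.
    result = session.say(text)
    if inspect.isawaitable(result):
        await result


def speak_proactively(
    sessions: Dict[str, Any],
    loop: asyncio.AbstractEventLoop,
    room_name: str,
    text: str,
    timeout: float = 5.0,
) -> int:
    """Return an HTTP-style status: 200 if dispatched, 404 if no live session."""
    session = sessions.get(room_name)
    if session is None:
        # Device offline: no session exists for this room.
        return 404
    # Hand the call over to the loop's thread and wait for it to finish.
    future = asyncio.run_coroutine_threadsafe(_say_on_loop(session, text), loop)
    future.result(timeout=timeout)
    return 200
```

A caller on another thread (such as an HTTP handler) would invoke `speak_proactively(sessions, loop, "room_<user_id>", "Time for homework")` and fall back to another channel when it sees 404.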
Where to look¶
| Concern | File |
|---|---|
| Voice worker (STT/hermes/TTS, session mgmt) | cloud/voice_bridge/main.py |
| Pi-side LiveKit client + cues | sudoedge/lk_client.py |
| Wake detection | sudoedge/wake.py, sudoedge/models/hey_sudo.onnx |
| Audio device selection | sudoedge/audio_devices.py (see Audio devices) |
For the original engineering notes, see docs/livekit-setup.md, docs/wake-interrupt.md,
and docs/voice-bridge-sse.md in the repo.