Wake-word training
The "hey sudo" wake word is detected on-device by a small ONNX classifier. This page
explains how that model is trained and why it's tuned the way it is. Engineering notes:
docs/wake-training.md.
What ships on the device
A trained .onnx model lives in the repo at sudoedge/models/hey_sudo.onnx. The Pi runs
it continuously against the mic to spot the wake phrase locally, with no network round
trip. (The detector keeps running even during TTS so users can interrupt; see
Voice.)
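As a rough illustration of what "runs it continuously against the mic" can look like, here is a minimal sketch of a sliding-window detection loop with a cooldown. It is not the code in the repo: `detect_wake`, `score_window`, the window size, the threshold, and the cooldown are all hypothetical names and values, and `score_window` stands in for whatever call actually runs the .onnx classifier.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, Sequence

Frame = Sequence[float]  # one chunk of mic samples (hypothetical representation)


def detect_wake(
    frames: Iterable[Frame],
    score_window: Callable[[list[Frame]], float],  # stand-in for the ONNX classifier
    window_frames: int = 4,      # assumed window length, not the real setting
    threshold: float = 0.5,      # assumed decision threshold
    cooldown_frames: int = 8,    # assumed refractory period after a hit
) -> Iterator[int]:
    """Yield the index of each frame at which the wake phrase is detected."""
    window: deque = deque(maxlen=window_frames)
    cooldown = 0
    for index, frame in enumerate(frames):
        # Always keep the window current, even while cooling down.
        window.append(frame)
        if cooldown > 0:
            cooldown -= 1
            continue
        if len(window) < window_frames:
            continue  # not enough audio buffered yet
        if score_window(list(window)) >= threshold:
            cooldown = cooldown_frames  # suppress repeat triggers for one utterance
            yield index
```

The cooldown is one simple way to keep a single utterance from firing several detections in a row; the real detector may handle this differently.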
How it's trained
Training uses livekit-wakeword (Apache-2.0). On the Pi, Wake.py forces the ONNX
backend: the Pi venv runs on a uv-managed Python 3.12, and tflite-runtime is dead past
cp39 (CPython 3.9), so the TFLite path is not an option there.
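The reasoning behind forcing the backend can be sketched as a version check. This is only an illustration of the constraint described above, not the contents of Wake.py, which may simply hard-code the choice; `choose_backend` and `LAST_TFLITE_PYTHON` are hypothetical names.

```python
import sys

# Assumption taken from the note above: tflite-runtime is unusable past cp39.
LAST_TFLITE_PYTHON = (3, 9)


def choose_backend(python_version: tuple = sys.version_info[:2]) -> str:
    """Return "tflite" only where tflite-runtime is still usable, else "onnx"."""
    major_minor = tuple(python_version[:2])
    return "tflite" if major_minor <= LAST_TFLITE_PYTHON else "onnx"
```

On the Pi's Python 3.12 venv, this sketch would always select the ONNX path, which matches the behaviour described above.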
Tuned for kids in a noisy home
The training data and thresholds target the real environment: children's voices, living-room noise, and TVs in the background, not a quiet desk. A model that scores well on clean adult speech can still be useless in a family living room, so evaluation uses realistic negatives.
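One common way to evaluate a wake-word threshold against realistic negatives is to report the false reject rate on true wake phrases alongside false accepts per hour of negative audio. The sketch below assumes that framing; the project's actual metrics, function names, and data layout are not documented here, so `evaluate` and `EvalResult` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EvalResult:
    false_reject_rate: float       # fraction of real wake phrases missed
    false_accepts_per_hour: float  # spurious triggers per hour of negative audio


def evaluate(
    positive_scores: Sequence[float],  # classifier score per real "hey sudo" clip
    negative_scores: Sequence[float],  # peak score per negative clip (TV, chatter, ...)
    negative_hours: float,             # total duration of the negative audio
    threshold: float,
) -> EvalResult:
    """Score one threshold against positive clips and realistic negatives."""
    if not positive_scores or negative_hours <= 0:
        raise ValueError("need at least one positive clip and some negative audio")
    missed = sum(1 for s in positive_scores if s < threshold)
    false_accepts = sum(1 for s in negative_scores if s >= threshold)
    return EvalResult(
        false_reject_rate=missed / len(positive_scores),
        false_accepts_per_hour=false_accepts / negative_hours,
    )
```

Sweeping `threshold` over a range and comparing the two numbers makes the trade-off visible: a lower threshold catches more quiet or child-pitched utterances but triggers more often on background TV.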
Training infrastructure
Heavy synthesis/training runs on a dedicated GPU box (a 7900XTX/ROCm machine), which is shared with other workloads. ROCm-based synthesis is roughly an order of magnitude slower than on Apple Silicon, so plan runs accordingly. (Specific host details are in the team's runbooks/memory, not here.)
Why on-device matters
Wake detection has to be instant and work offline, and it shouldn't stream the whole room's audio to the cloud just to listen for one phrase. Privacy and latency both point to running it locally.