Wake-word training

The "hey sudo" wake word is detected on-device by a small ONNX classifier. This page explains how that model is trained and why it's tuned the way it is. Engineering notes: docs/wake-training.md.

What ships on the device

A trained .onnx model lives in the repo at sudoedge/models/hey_sudo.onnx. The Pi runs it continuously against the mic input and spots the wake phrase locally, with no network round trip. (The detector keeps running even during TTS so users can interrupt; see Voice.)
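As a rough illustration of what "running continuously against the mic" can look like, here is a minimal sketch of a sliding-window detection loop. Everything in it is an assumption for illustration: the function name detect_wake_word, the window size, the threshold, and the refractory period are hypothetical, and score_window stands in for whatever call actually runs hey_sudo.onnx. It is not the code in the repo.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Sequence


def detect_wake_word(
    frames: Iterable[Sequence[float]],
    score_window: Callable[[List[float]], float],  # hypothetical stand-in for the ONNX model call
    window_frames: int = 4,       # assumed window length, in frames
    threshold: float = 0.5,       # assumed decision threshold
    refractory_frames: int = 4,   # assumed frames to skip after a detection
) -> Iterator[int]:
    """Yield the index of each frame at which the wake word fires."""
    window: Deque[Sequence[float]] = deque(maxlen=window_frames)
    cooldown = 0
    for index, frame in enumerate(frames):
        window.append(frame)  # the buffer keeps filling even during cooldown
        if cooldown > 0:
            cooldown -= 1
            continue
        if len(window) < window_frames:
            continue  # not enough audio buffered yet
        samples = [sample for buffered in window for sample in buffered]
        if score_window(samples) >= threshold:
            cooldown = refractory_frames  # avoid firing repeatedly on one utterance
            yield index
```

Because the scorer is passed in, the loop can be exercised with a fake scorer in tests and with the real model on the device. The refractory period is one simple way to keep a single utterance from triggering several detections in a row.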

How it's trained

Figure: Wake-word training pipeline

We train with livekit-wakeword (Apache-2.0). On the Pi, Wake.py forces the ONNX backend: tflite-runtime is effectively unsupported past cp39 (CPython 3.9), and the Pi venv runs on a uv-managed Python 3.12.
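The reasoning behind the backend choice can be written down as a small version check. This is a sketch only: choose_backend and LAST_TFLITE_PYTHON are hypothetical names, the cutoff is taken from the statement above rather than verified independently, and the real Wake.py simply forces ONNX rather than choosing at runtime.

```python
import sys
from typing import Tuple

# Assumption taken from the text above: tflite-runtime is not usable past CPython 3.9.
LAST_TFLITE_PYTHON: Tuple[int, int] = (3, 9)


def choose_backend(python_version: Tuple[int, int] = sys.version_info[:2]) -> str:
    """Return "tflite" only where tflite-runtime is assumed usable, else "onnx"."""
    return "tflite" if python_version <= LAST_TFLITE_PYTHON else "onnx"
```

On the Pi's Python 3.12 venv this check would always come out as "onnx", which is why hard-coding the ONNX backend is the simpler option there.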

Tuned for kids in a noisy home

The training data and thresholds target the real environment: children's voices, living-room noise, TVs in the background — not a quiet desk. A model that scores well on clean adult speech can still be useless in a family living room, so evaluation uses realistic negatives.
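One common way to make "evaluation uses realistic negatives" concrete is to report recall on wake-phrase clips alongside false accepts per hour on negative audio, at a given threshold. The sketch below assumes that each clip has already been reduced to a single peak score; the function name, the metric choice, and the input layout are illustrative assumptions, not the project's actual evaluation code.

```python
from typing import Dict, Sequence


def evaluate_threshold(
    positive_scores: Sequence[float],  # assumed: one peak score per wake-phrase clip
    negative_scores: Sequence[float],  # assumed: one peak score per negative clip (TV, chatter, ...)
    negative_hours: float,             # total duration of the negative audio, in hours
    threshold: float,
) -> Dict[str, float]:
    """Compute recall and false accepts per hour at one decision threshold."""
    if negative_hours <= 0:
        raise ValueError("negative_hours must be positive")
    hits = sum(score >= threshold for score in positive_scores)
    false_accepts = sum(score >= threshold for score in negative_scores)
    recall = hits / len(positive_scores) if positive_scores else 0.0
    return {
        "recall": recall,
        "false_accepts_per_hour": false_accepts / negative_hours,
    }
```

Sweeping the threshold over a range and comparing these two numbers shows the trade-off directly: a threshold that looks fine on clean adult speech may produce far too many false accepts once the negatives include living-room noise.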

Training infrastructure

Heavy synthesis and training runs execute on a dedicated GPU box (a 7900XTX/ROCm machine), which is shared with other workloads. Synthesis on ROCm is roughly an order of magnitude slower than on Apple Silicon, so budget run time accordingly. (Specific host details are in the team's runbooks/memory, not here.)

Why on-device matters

Wake detection has to be instant and work offline, and it shouldn't stream the whole room's audio to the cloud just to listen for one phrase — privacy and latency both point to running it locally.