Whisper

Local speech-to-text (STT) service using OpenAI's Whisper model.


Overview

Whisper provides automatic speech recognition for the Voice Platform. It transcribes audio from voice calls and WebRTC sessions into text for processing by Norns.

Container: whisper
Image: onerahmet/openai-whisper-asr-webservice:latest
Port: 9000 (internal)


Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Voice Agent   │────▶│     Whisper     │────▶│   Norns Agent   │
│  (Audio Input)  │     │  (STT Engine)   │     │  (Text Output)  │
└─────────────────┘     └─────────────────┘     └─────────────────┘

Configuration

Variable     Value            Description
ASR_ENGINE   faster_whisper   Uses faster-whisper for improved performance
ASR_MODEL    base.en          English-optimized base model

Model Options

Model       Parameters   Speed     Accuracy   Use Case
tiny.en     39M          Fastest   Good       Real-time, low latency
base.en     74M          Fast      Better     Current (balanced)
small.en    244M         Medium    High       Accurate transcription
medium.en   769M         Slow      Higher     High accuracy needs
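
The two variables and the model list above can be checked at startup by whatever launches or talks to the container. The following is a minimal sketch, not part of the service: `read_asr_config` and `KNOWN_MODELS` are hypothetical names, the fallback values mirror this page's configuration rather than the image's own defaults, and the allowed-model list is only the four models in the table (the image may accept others).

```python
import os
from typing import Mapping, Tuple

# Models listed in the table above; an assumption, not the image's full list.
KNOWN_MODELS = ("tiny.en", "base.en", "small.en", "medium.en")


def read_asr_config(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (engine, model), falling back to the values documented above."""
    engine = env.get("ASR_ENGINE", "faster_whisper")
    model = env.get("ASR_MODEL", "base.en")
    if model not in KNOWN_MODELS:
        raise ValueError(f"ASR_MODEL={model!r} is not one of {KNOWN_MODELS}")
    return engine, model
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise, for example `read_asr_config({"ASR_MODEL": "small.en"})`.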

API Usage

Transcribe Audio

curl -X POST "http://whisper:9000/asr?task=transcribe&language=en&output=json" \
  -H "Content-Type: multipart/form-data" \
  -F "audio_file=@audio.wav"

Response

{
  "text": "Hello, this is a test transcription."
}

Parameters

Parameter    Type     Description
audio_file   file     Audio file (WAV, MP3, FLAC, etc.)
task         string   transcribe or translate
language     string   Language code (e.g., en)
output       string   json, text, srt, vtt
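
For callers that are not shell scripts, the same request can be made from Python with only the standard library. This is a sketch under assumptions: the helper names (`build_asr_url`, `encode_multipart`, `transcribe`) are hypothetical, and it assumes the service reads `task`, `language`, and `output` from the query string and takes `audio_file` as a multipart upload. Adjust it if your deployment behaves differently.

```python
import json
import os
import urllib.parse
import urllib.request
import uuid
from typing import Tuple


def build_asr_url(base_url: str, task: str = "transcribe",
                  language: str = "en", output: str = "json") -> str:
    """Build the /asr URL with the options passed as query parameters."""
    query = urllib.parse.urlencode(
        {"task": task, "language": language, "output": output}
    )
    return f"{base_url.rstrip('/')}/asr?{query}"


def encode_multipart(field: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, base_url: str = "http://whisper:9000",
               timeout: float = 30.0) -> str:
    """POST an audio file to /asr and return the "text" field of the reply."""
    with open(path, "rb") as f:
        body, content_type = encode_multipart(
            "audio_file", os.path.basename(path), f.read()
        )
    request = urllib.request.Request(
        build_asr_url(base_url),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["text"]
```

A call such as `transcribe("audio.wav")` only works from a host that can resolve `whisper`, which in practice means another container on the same Docker network.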

Integration with Voice Platform

Whisper is called by the Voice Agent for:

  1. Phone Calls - Transcribes caller speech via Telephony
  2. WebRTC - Transcribes browser-based voice input
  3. Voice Commands - Converts speech to actionable text

Flow

User speaks → LiveKit/Telephony → Audio chunk
        ↓
Whisper (STT)
        ↓
Text transcript
        ↓
Norns Agent → Response
        ↓
Piper (TTS) → Audio response
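
The flow above amounts to three calls chained per audio chunk. The sketch below shows that shape only; `handle_utterance` and its `stt`, `agent`, and `tts` arguments are hypothetical stand-ins for the Whisper, Norns, and Piper clients, and skipping the agent when the transcript is empty is an assumption, not documented Voice Agent behavior.

```python
from typing import Callable


def handle_utterance(
    audio_chunk: bytes,
    stt: Callable[[bytes], str],
    agent: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """Run one audio chunk through STT -> agent -> TTS and return reply audio."""
    transcript = stt(audio_chunk).strip()
    if not transcript:
        # Assumed behavior: nothing was recognized, so skip the agent and TTS.
        return b""
    reply_text = agent(transcript)
    return tts(reply_text)
```

Injecting the three stages as callables keeps the pipeline testable with stubs, for example `handle_utterance(b"...", lambda a: "hello", str.upper, str.encode)`.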

Performance

  • Latency: ~200-500ms for short utterances (base.en model)
  • Memory: ~1GB RAM
  • GPU: Not required (CPU inference on M4)
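
To check the latency figure on your own hardware, one option is to time repeated calls and look at the spread. This is a generic timing sketch with hypothetical helper names (`measure_latency_ms`, `summarize`); `call` is any zero-argument function that performs one transcription request.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency_ms(call: Callable[[], object], runs: int = 5) -> List[float]:
    """Time `call` `runs` times and return each duration in milliseconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def summarize(durations_ms: List[float]) -> Dict[str, float]:
    """Return the minimum, median, and maximum of the measured durations."""
    return {
        "min_ms": min(durations_ms),
        "median_ms": statistics.median(durations_ms),
        "max_ms": max(durations_ms),
    }
```

The first request after a container start may include model loading, so it can be worth discarding one warm-up run before summarizing.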

Quick Commands

# Check service health
docker exec whisper curl -s http://localhost:9000/health

# View logs
docker logs -f whisper

# Test transcription
docker exec whisper curl -s http://localhost:9000/asr \
-F "audio_file=@/tmp/test.wav"
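
The test command above expects `/tmp/test.wav` to exist inside the container. If no recording is at hand, a placeholder file can be generated with Python's standard `wave` module. This is a sketch: `write_test_wav` is a hypothetical helper, and the 16 kHz mono 16-bit format is an assumption chosen for small file size.

```python
import math
import struct
import wave


def write_test_wav(path: str, seconds: float = 1.0, sample_rate: int = 16000,
                   frequency_hz: float = 440.0) -> int:
    """Write a mono 16-bit PCM sine tone to `path`; return the frame count."""
    n_frames = int(seconds * sample_rate)
    amplitude = 0.3 * 32767  # stay well below full scale to avoid clipping
    frames = b"".join(
        struct.pack(
            "<h",
            int(amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate)),
        )
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return n_frames
```

A tone contains no speech, so this only confirms that the endpoint accepts audio; use a real recording to judge transcription quality. A file created on the host can be copied in with `docker cp test.wav whisper:/tmp/test.wav`.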
