Piper
Local text-to-speech (TTS) service for voice synthesis.
Overview
Piper provides neural text-to-speech synthesis for the Voice Platform. It converts text responses from Norns into natural-sounding speech for phone calls and WebRTC sessions.
Container: piper
Image: lscr.io/linuxserver/piper:latest
Port: 10200 (internal)
Architecture
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Norns Agent │────▶│ Piper │────▶│ Voice Agent │
│ (Text Input) │ │ (TTS Engine) │ │ (Audio Output) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Configuration
| Variable | Value | Description |
|---|---|---|
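| PIPER_VOICE | en_US-amy-medium | Voice model |
| PIPER_LENGTH | 1.0 | Length scale for speech rate (1.0 = normal; lower is faster, higher is slower) |
| PIPER_NOISE | 0.667 | Noise scale (variation in the generated audio) |
| PIPER_NOISEW | 0.333 | Phoneme width noise (variation in phoneme duration) |
| TZ | America/Chicago | Timezone |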
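As an illustration of how a client or wrapper script might read these variables, here is a minimal Python sketch. `PiperSettings` and `load_settings` are hypothetical names, not part of Piper or the container image; the defaults simply mirror the values in the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PiperSettings:
    """Hypothetical holder for the PIPER_* variables documented above."""
    voice: str = "en_US-amy-medium"   # PIPER_VOICE
    length: float = 1.0               # PIPER_LENGTH
    noise: float = 0.667              # PIPER_NOISE
    noise_w: float = 0.333            # PIPER_NOISEW


def load_settings(env=None) -> PiperSettings:
    """Read PIPER_* values from a mapping, falling back to the documented values."""
    env = os.environ if env is None else env
    defaults = PiperSettings()
    settings = PiperSettings(
        voice=env.get("PIPER_VOICE", defaults.voice),
        length=float(env.get("PIPER_LENGTH", defaults.length)),
        noise=float(env.get("PIPER_NOISE", defaults.noise)),
        noise_w=float(env.get("PIPER_NOISEW", defaults.noise_w)),
    )
    # A length scale of zero or below has no meaningful interpretation.
    if settings.length <= 0:
        raise ValueError("PIPER_LENGTH must be positive")
    return settings
```

Calling `load_settings()` with no argument reads the process environment; passing a plain dict makes the function easy to exercise in tests.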
Voice Models
Current Voice
en_US-amy-medium - American English female voice, medium quality.
Available Voices
| Voice | Language | Gender | Quality |
|---|---|---|---|
| en_US-amy-medium | English (US) | Female | Medium |
| en_US-lessac-medium | English (US) | Female | Medium |
| en_GB-alba-medium | English (UK) | Female | Medium |
| en_US-libritts-high | English (US) | Various | High |
To change the voice, set PIPER_VOICE in docker-compose.yml and recreate the container so the new value takes effect.
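The voice names listed here appear to follow a `<language>_<REGION>-<speaker>-<quality>` pattern. A small sketch that splits a name into those parts, useful for validating a PIPER_VOICE value before deploying; `VoiceName` and `parse_voice` are hypothetical helpers, and the pattern is inferred from the names in this document rather than from a formal specification.

```python
from typing import NamedTuple


class VoiceName(NamedTuple):
    language: str  # e.g. "en"
    region: str    # e.g. "US"
    speaker: str   # e.g. "amy"
    quality: str   # e.g. "medium"


def parse_voice(voice: str) -> VoiceName:
    """Split '<language>_<REGION>-<speaker>-<quality>' into its four parts."""
    locale, sep1, rest = voice.partition("-")
    speaker, sep2, quality = rest.rpartition("-")
    language, sep3, region = locale.partition("_")
    parts = (sep1, sep2, sep3, language, region, speaker, quality)
    if not all(parts):
        raise ValueError(f"unexpected voice name: {voice!r}")
    return VoiceName(language, region, speaker, quality)
```

For example, `parse_voice("en_GB-alba-medium").quality` evaluates to `"medium"`, while a malformed name such as `"amy"` raises `ValueError`.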
API Usage
Synthesize Speech
curl -X POST http://piper:10200/api/tts \
-H "Content-Type: application/json" \
-d '{"text": "Hello, how can I help you today?"}' \
--output speech.wav
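The same request can be issued from Python with only the standard library. This is a sketch that mirrors the curl call above: the `/api/tts` path and the `piper:10200` host come from that example, while `build_tts_request` and `synthesize_to_file` are hypothetical names introduced for illustration.

```python
import json
import urllib.request


def build_tts_request(text: str, base_url: str = "http://piper:10200") -> urllib.request.Request:
    """Build the POST request shown in the curl example (endpoint assumed from it)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, path: str, timeout: float = 10.0) -> int:
    """Send the request and write the response body to `path`; returns bytes written."""
    with urllib.request.urlopen(build_tts_request(text), timeout=timeout) as resp:
        audio = resp.read()
    with open(path, "wb") as fh:
        fh.write(audio)
    return len(audio)
```

Keeping request construction separate from the network call lets the payload be checked without a running service.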
Wyoming Protocol
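Piper speaks the Wyoming protocol, which Home Assistant uses for its voice services. Wyoming exchanges JSON event headers, each optionally followed by a binary payload, so the reply to a synthesize event is a stream of audio events carrying raw PCM rather than a finished WAV file:
# Direct synthesis (captures the raw event stream, not a playable WAV)
echo '{"type": "synthesize", "data": {"text": "Hello world"}}' | nc piper 10200 > output.events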
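The framing can be sketched in Python as follows. This reflects a simplified reading of the Wyoming wire format (a JSON header line, optionally `data_length` bytes of extra JSON, then `payload_length` bytes of binary payload) and should be checked against the protocol documentation; `encode_event`, `decode_events`, and `collect_pcm` are hypothetical helper names, and the sketch works on an already-buffered byte string rather than a live socket.

```python
import json


def encode_event(event_type: str, data: dict) -> bytes:
    """Serialize a payload-free event as a single JSON line."""
    return json.dumps({"type": event_type, "data": data}).encode("utf-8") + b"\n"


def decode_events(stream: bytes):
    """Yield (type, data, payload) tuples from a fully buffered event stream."""
    pos = 0
    while pos < len(stream):
        end = stream.index(b"\n", pos)          # header is one JSON line
        header = json.loads(stream[pos:end])
        pos = end + 1
        data = header.get("data") or {}
        data_len = header.get("data_length") or 0
        if data_len:                            # optional extra JSON data block
            data = {**data, **json.loads(stream[pos:pos + data_len])}
            pos += data_len
        payload_len = header.get("payload_length") or 0
        payload = stream[pos:pos + payload_len]  # raw bytes, e.g. PCM audio
        pos += payload_len
        yield header["type"], data, payload


def collect_pcm(stream: bytes) -> bytes:
    """Concatenate the payloads of all audio-chunk events."""
    return b"".join(p for t, _, p in decode_events(stream) if t == "audio-chunk")
```

Because payloads are skipped by length rather than by searching for newlines, binary audio that happens to contain a newline byte does not confuse the parser.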
Integration with Voice Platform
Piper is called by the Voice Agent for:
- Phone Responses - Synthesizes Norns responses for callers
- WebRTC Audio - Generates audio for browser playback
- Notifications - Voice alerts and confirmations
Flow
Norns response (text)
↓
Piper (TTS)
↓
Audio (WAV/PCM)
↓
LiveKit/Telephony → User hears response
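Where the flow above hands raw PCM to a consumer that expects WAV, the samples need a WAV header. A minimal sketch using Python's standard `wave` module; the defaults (22050 Hz, mono, 16-bit) are an assumption about the medium-quality voice and should be checked against the audio format the service actually reports. `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, rate: int = 22050, channels: int = 1, width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)   # 1 = mono (assumed)
        wav.setsampwidth(width)      # bytes per sample; 2 = 16-bit (assumed)
        wav.setframerate(rate)       # samples per second (assumed)
        wav.writeframes(pcm)
    return buf.getvalue()
```

The result can be written straight to disk or passed to any player that accepts WAV data.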
Performance
- Latency: ~100-300ms for typical responses
- Memory: ~500MB RAM
- Quality: Neural network synthesis (natural sounding)
- CPU: Optimized for ARM64 (M4)
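To check the latency figure on a given host, a small timing harness can wrap whatever function performs synthesis. This is a sketch: `measure_latency_ms` is a hypothetical helper, and `synthesize` stands for any callable that takes the text and blocks until the audio is returned.

```python
import statistics
import time


def measure_latency_ms(synthesize, text: str, runs: int = 5) -> dict:
    """Time `synthesize(text)` several times and summarize the results in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        synthesize(text)
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "median": statistics.median(samples),
        "max": max(samples),
        "runs": runs,
    }
```

The median is reported rather than the mean so that a single slow first request, for example while the model loads, does not dominate the result.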
Tuning Parameters
Speech Rate
PIPER_LENGTH=0.8 # Faster speech
PIPER_LENGTH=1.0 # Normal (default)
PIPER_LENGTH=1.2 # Slower speech
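As a rough guide, the length scale stretches or compresses the spoken duration roughly in proportion, so an utterance that takes 2 seconds at 1.0 takes about 1.6 seconds at 0.8 and about 2.4 seconds at 1.2. The sketch below encodes that approximation; `scaled_duration` is a hypothetical helper and the linear relationship is an estimate, not a guarantee.

```python
def scaled_duration(base_seconds: float, length_scale: float) -> float:
    """Estimate utterance duration for a given PIPER_LENGTH value.

    Assumes duration scales linearly with the length scale, which is an
    approximation of how the model applies it.
    """
    if length_scale <= 0:
        raise ValueError("length_scale must be positive")
    return base_seconds * length_scale


if __name__ == "__main__":
    for scale in (0.8, 1.0, 1.2):
        print(f"PIPER_LENGTH={scale}: ~{scaled_duration(2.0, scale):.1f}s")
```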
Voice Variation
# More robotic/consistent
PIPER_NOISE=0.3
PIPER_NOISEW=0.1
# More natural/variable (default)
PIPER_NOISE=0.667
PIPER_NOISEW=0.333
# Very expressive
PIPER_NOISE=0.9
PIPER_NOISEW=0.5
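If these combinations are switched often, they can be kept as named presets and rendered into environment lines. A minimal sketch; the preset names and the `env_lines` helper are hypothetical, and the values are the ones listed above.

```python
# Named presets using the values listed above (names are illustrative).
PRESETS = {
    "consistent": {"PIPER_NOISE": 0.3, "PIPER_NOISEW": 0.1},
    "natural": {"PIPER_NOISE": 0.667, "PIPER_NOISEW": 0.333},
    "expressive": {"PIPER_NOISE": 0.9, "PIPER_NOISEW": 0.5},
}


def env_lines(preset: str) -> list[str]:
    """Render a preset as KEY=value lines suitable for an env file."""
    if preset not in PRESETS:
        raise KeyError(f"unknown preset {preset!r}; choose from {sorted(PRESETS)}")
    return [f"{key}={value}" for key, value in PRESETS[preset].items()]
```

For instance, `env_lines("natural")` returns `["PIPER_NOISE=0.667", "PIPER_NOISEW=0.333"]`.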
Quick Commands
# Check service health (port 10200 is internal, so run this from a container on the same Docker network)
curl -s http://piper:10200/api/voices | head
# View logs
docker logs -f piper
# Test synthesis
curl -X POST http://piper:10200/api/tts \
-H "Content-Type: application/json" \
-d '{"text": "Test message"}' \
--output /tmp/test.wav
See Also
- Whisper - Speech-to-text (STT)
- Voice Gateway - Voice processing
- Telephony - Phone integration
- LiveKit - WebRTC infrastructure