Piper

Local text-to-speech (TTS) service for voice synthesis.

Overview

Piper provides neural text-to-speech synthesis for the Voice Platform. It converts text responses from Norns into natural-sounding speech for phone calls and WebRTC sessions.

Container: piper
Image: lscr.io/linuxserver/piper:latest
Port: 10200 (internal)

Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Norns Agent   │────▶│      Piper      │────▶│   Voice Agent   │
│  (Text Input)   │     │  (TTS Engine)   │     │ (Audio Output)  │
└─────────────────┘     └─────────────────┘     └─────────────────┘

Configuration

Variable	Value	Description
`PIPER_VOICE`	`en_US-amy-medium`	Voice model
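`PIPER_LENGTH`	`1.0`	Length scale controlling speech rate (1.0 = normal; lower is faster, higher is slower)
`PIPER_NOISE`	`0.667`	Phoneme noise (variation in the generated audio)
`PIPER_NOISEW`	`0.333`	Phoneme width noise (variation in phoneme duration)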
`TZ`	`America/Chicago`	Timezone
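As an illustration of how these variables fit together, the following is a minimal sketch that reads them from an environment mapping and falls back to the values in the table. The `PiperSettings` class and `load_settings` function are hypothetical names introduced here; they are not part of Piper or the container image.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PiperSettings:
    """Hypothetical container for the variables in the table above."""
    voice: str = "en_US-amy-medium"
    length: float = 1.0
    noise: float = 0.667
    noise_w: float = 0.333


def load_settings(env: Mapping[str, str]) -> PiperSettings:
    """Build settings from an environment mapping, using the table's values as defaults."""
    defaults = PiperSettings()
    return PiperSettings(
        voice=env.get("PIPER_VOICE", defaults.voice),
        length=float(env.get("PIPER_LENGTH", defaults.length)),
        noise=float(env.get("PIPER_NOISE", defaults.noise)),
        noise_w=float(env.get("PIPER_NOISEW", defaults.noise_w)),
    )
```

Calling `load_settings(os.environ)` inside a helper script would give the effective values, which can be handy when checking what a compose file actually passes through.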

Voice Models

Current Voice

en_US-amy-medium - American English female voice, medium quality.

Available Voices

Voice	Language	Gender	Quality
`en_US-amy-medium`	English (US)	Female	Medium
`en_US-lessac-medium`	English (US)	Female	Medium
`en_GB-alba-medium`	English (UK)	Female	Medium
`en_US-libritts-high`	English (US)	Various	High

To change the voice, update PIPER_VOICE in docker-compose.yml and recreate the container so the new value takes effect.
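The voice names in the table appear to follow a `<locale>-<speaker>-<quality>` pattern. A small sketch, assuming that pattern holds for any value placed in PIPER_VOICE, can catch typos before a deploy. `VoiceName` and `parse_voice` are hypothetical helpers, not Piper functions.

```python
from typing import NamedTuple


class VoiceName(NamedTuple):
    locale: str
    speaker: str
    quality: str


def parse_voice(name: str) -> VoiceName:
    """Split a voice name such as 'en_US-amy-medium' into its three parts."""
    parts = name.split("-")
    # Assumption: exactly three non-empty, hyphen-separated fields.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <locale>-<speaker>-<quality>, got {name!r}")
    return VoiceName(*parts)
```

For example, `parse_voice("en_US-libritts-high").quality` evaluates to `"high"`.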

API Usage

Synthesize Speech

curl -X POST http://piper:10200/api/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, how can I help you today?"}' \
  --output speech.wav
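The same request can be issued from Python using only the standard library. This is a sketch that mirrors the curl command above; it assumes the `/api/tts` endpoint and JSON body exactly as shown there, and the function names are illustrative.

```python
import json
import urllib.request


def build_tts_request(text: str, base_url: str = "http://piper:10200") -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, path: str, timeout: float = 10.0) -> int:
    """Send the request and write the response body to `path`; returns bytes written."""
    with urllib.request.urlopen(build_tts_request(text), timeout=timeout) as resp:
        audio = resp.read()
    with open(path, "wb") as fh:
        fh.write(audio)
    return len(audio)
```

Separating request construction from sending keeps the payload easy to inspect without a running service.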

Wyoming Protocol

Piper uses the Wyoming protocol for Home Assistant integration:

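# Direct synthesis: Wyoming expects a JSON event line, not raw text.
# The reply is a stream of events (JSON headers plus raw PCM chunks), not a WAV file.
printf '{"type": "synthesize", "data": {"text": "Hello world"}}\n' | nc -w 5 piper 10200 > output.events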
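For readers unfamiliar with Wyoming, the sketch below shows the framing as commonly described: each event is one JSON header line, optionally followed by an extra JSON data block (`data_length` bytes) and a binary payload (`payload_length` bytes). The event names (`synthesize`, `audio-start`, `audio-chunk`, `audio-stop`) and field names reflect that understanding of the protocol and should be checked against the Wyoming specification; the helper functions are hypothetical and operate on bytes only, without opening a socket.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

Event = Tuple[str, Dict[str, Any], bytes]  # (type, data, payload)


def encode_event(event_type: str, data: Optional[Dict[str, Any]] = None,
                 payload: bytes = b"") -> bytes:
    """Serialize one event: JSON header line, then the raw payload (if any)."""
    header: Dict[str, Any] = {"type": event_type, "data": data or {}}
    if payload:
        header["payload_length"] = len(payload)
    return json.dumps(header).encode("utf-8") + b"\n" + payload


def decode_events(stream: bytes) -> List[Event]:
    """Parse a byte string containing zero or more complete events."""
    events: List[Event] = []
    pos = 0
    while pos < len(stream):
        newline = stream.index(b"\n", pos)
        header = json.loads(stream[pos:newline])
        pos = newline + 1
        data = dict(header.get("data") or {})
        # Some senders place the data in a separate JSON block after the header.
        data_length = header.get("data_length") or 0
        if data_length:
            data.update(json.loads(stream[pos:pos + data_length]))
            pos += data_length
        payload_length = header.get("payload_length") or 0
        payload = stream[pos:pos + payload_length]
        pos += payload_length
        events.append((header["type"], data, payload))
    return events


def collect_pcm(events: List[Event]) -> bytes:
    """Concatenate the payloads of all audio-chunk events."""
    return b"".join(payload for kind, _, payload in events if kind == "audio-chunk")
```

A client would send `encode_event("synthesize", {"text": "Hello world"})` over the TCP connection and feed the bytes it receives to `decode_events`, reading until an `audio-stop` event arrives.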

Integration with Voice Platform

Piper is called by the Voice Agent for:

Phone Responses - Synthesizes Norns responses for callers
WebRTC Audio - Generates audio for browser playback
Notifications - Voice alerts and confirmations

Flow

Norns response (text)
        ↓
    Piper (TTS)
        ↓
   Audio (WAV/PCM)
        ↓
LiveKit/Telephony → User hears response
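The WAV/PCM step in the flow above can be illustrated with the standard `wave` module. This sketch wraps raw PCM samples in a WAV container; the defaults of 22050 Hz, 16-bit, mono are an assumption about the voice model's output format and should be replaced with whatever the service actually reports.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, rate: int = 22050, width: int = 2, channels: int = 1) -> bytes:
    """Wrap raw PCM samples in a WAV header and return the file contents."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(width)   # bytes per sample
        wav.setframerate(rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the playback duration of an in-memory WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Telephony paths often need a different sample rate than browser playback, so keeping the raw PCM and its format together until the last step avoids guessing later.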

Performance

Latency: ~100-300ms for typical responses
Memory: ~500MB RAM
Quality: Neural network synthesis (natural sounding)
CPU: Optimized for ARM64 (M4)
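To check the latency figure on a given host, a simple timing harness is enough. The sketch below times any callable that takes text and returns audio bytes; `measure_latency_ms` is a hypothetical helper, and the synthesis function is supplied by the caller.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def measure_latency_ms(synthesize: Callable[[str], bytes],
                       texts: Iterable[str]) -> Dict[str, float]:
    """Time each synthesis call and summarize the results in milliseconds."""
    samples = []
    for text in texts:
        start = time.perf_counter()
        synthesize(text)
        samples.append((time.perf_counter() - start) * 1000.0)
    if not samples:
        raise ValueError("no texts to measure")
    return {
        "count": len(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }
```

Using sentences of realistic length matters here, since synthesis time generally grows with the amount of text.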

Tuning Parameters

Speech Rate

PIPER_LENGTH=0.8   # Faster speech
PIPER_LENGTH=1.0   # Normal (default)
PIPER_LENGTH=1.2   # Slower speech
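As a rough worked example, assume PIPER_LENGTH acts as a multiplier on phoneme durations, so total utterance length scales approximately linearly with it. Under that assumption, a response that takes 2.0 seconds at the default would take about 1.6 seconds at 0.8 and about 2.4 seconds at 1.2. The helper name is illustrative.

```python
def scaled_duration(base_seconds: float, length_scale: float) -> float:
    """Estimate utterance duration for a given length scale (linear approximation)."""
    if length_scale <= 0:
        raise ValueError("length_scale must be positive")
    return base_seconds * length_scale
```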

Voice Variation

# More robotic/consistent
PIPER_NOISE=0.3
PIPER_NOISEW=0.1

# More natural/variable (default)
PIPER_NOISE=0.667
PIPER_NOISEW=0.333

# Very expressive
PIPER_NOISE=0.9
PIPER_NOISEW=0.5
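The three combinations above can be kept as named presets so that the pair of values always changes together. This is a sketch only; the preset names are labels chosen here, not Piper options.

```python
# (PIPER_NOISE, PIPER_NOISEW) pairs taken from the settings listed above.
PRESETS = {
    "consistent": (0.3, 0.1),
    "default": (0.667, 0.333),
    "expressive": (0.9, 0.5),
}


def env_lines(preset: str) -> str:
    """Render a preset as environment lines suitable for an env file."""
    noise, noise_w = PRESETS[preset]
    return f"PIPER_NOISE={noise}\nPIPER_NOISEW={noise_w}"
```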

Quick Commands

# Check service health
curl -s http://localhost:10200/api/voices | head

# View logs
docker logs -f piper

# Test synthesis
curl -X POST http://piper:10200/api/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Test message"}' \
  --output /tmp/test.wav
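When the HTTP check is inconclusive, a plain TCP probe at least confirms that something is listening on the port. A minimal sketch using the standard library follows; `port_is_open` is a hypothetical helper, and the host name depends on where the script runs relative to the container network.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

From another container on the same network this would be called as `port_is_open("piper", 10200)`.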
