Piper

Local text-to-speech (TTS) service for voice synthesis.


Overview

Piper provides neural text-to-speech synthesis for the Voice Platform. It converts text responses from Norns into natural-sounding speech for phone calls and WebRTC sessions.

Container: piper
Image: lscr.io/linuxserver/piper:latest
Port: 10200 (internal)


Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Norns Agent   │────▶│      Piper      │────▶│   Voice Agent   │
│  (Text Input)   │     │  (TTS Engine)   │     │ (Audio Output)  │
└─────────────────┘     └─────────────────┘     └─────────────────┘

Configuration

| Variable | Value | Description |
| --- | --- | --- |
| PIPER_VOICE | en_US-amy-medium | Voice model |
| PIPER_LENGTH | 1.0 | Speech rate (1.0 = normal) |
| PIPER_NOISE | 0.667 | Phoneme noise (variation) |
| PIPER_NOISEW | 0.333 | Phoneme width noise |
| TZ | America/Chicago | Timezone |

Voice Models

Current Voice

en_US-amy-medium - American English female voice, medium quality.

Available Voices

| Voice | Language | Gender | Quality |
| --- | --- | --- | --- |
| en_US-amy-medium | English (US) | Female | Medium |
| en_US-lessac-medium | English (US) | Female | Medium |
| en_GB-alba-medium | English (UK) | Female | Medium |
| en_US-libritts-high | English (US) | Various | High |

To change voice, update PIPER_VOICE in docker-compose.yml.
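
The voice identifiers listed above follow a language-name-quality pattern. As a minimal sketch (the `parse_voice` helper and the `KNOWN_VOICES` set are hypothetical, not part of Piper), a value could be checked before it is written to PIPER_VOICE:

```python
from typing import NamedTuple

# Voices copied from the table above; extend this set if more models are installed.
KNOWN_VOICES = {
    "en_US-amy-medium",
    "en_US-lessac-medium",
    "en_GB-alba-medium",
    "en_US-libritts-high",
}


class Voice(NamedTuple):
    language: str  # e.g. "en_US"
    name: str      # e.g. "amy"
    quality: str   # e.g. "medium"


def parse_voice(voice_id: str) -> Voice:
    """Split an identifier such as 'en_US-amy-medium' into its three parts."""
    if voice_id not in KNOWN_VOICES:
        raise ValueError(f"unknown voice: {voice_id!r}")
    language, name, quality = voice_id.split("-")
    return Voice(language, name, quality)


print(parse_voice("en_GB-alba-medium"))
```

Rejecting unknown identifiers early avoids restarting the container with a voice that is not in the list.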


API Usage

Synthesize Speech

curl -X POST http://piper:10200/api/tts \
-H "Content-Type: application/json" \
-d '{"text": "Hello, how can I help you today?"}' \
--output speech.wav
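
The same request can be issued from Python with only the standard library. This is a sketch that assumes the `/api/tts` endpoint and JSON body shown in the curl command; the helper names `build_tts_request` and `synthesize_to_file` are illustrative, and the response is written to disk without inspecting its format.

```python
import json
import urllib.request


def build_tts_request(text: str, base_url: str = "http://piper:10200") -> urllib.request.Request:
    """Build the same POST as the curl command above, without sending it."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, path: str, timeout: float = 10.0) -> int:
    """Send the request, write the response body to path, return bytes written."""
    with urllib.request.urlopen(build_tts_request(text), timeout=timeout) as resp:
        audio = resp.read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

Calling `synthesize_to_file("Hello, how can I help you today?", "speech.wav")` from a container on the same Docker network mirrors the curl example.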

Wyoming Protocol

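Piper speaks the Wyoming protocol, which Home Assistant uses for voice services. Wyoming is event-based: each message is a JSON header line, optionally followed by a binary payload, so the server replies with a stream of audio events rather than a finished WAV file:

# Direct synthesis: send a synthesize event and capture the raw event stream
printf '{"type": "synthesize", "data": {"text": "Hello world"}}\n' | nc -w 5 piper 10200 > output.events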
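
Wyoming frames each message as a JSON header line followed by an optional binary payload, and a synthesis reply arrives as audio-start, audio-chunk, and audio-stop events. The sketch below shows one way a client might encode a synthesize event and reassemble the PCM chunks into a WAV file. It assumes the server follows the standard Wyoming framing; all function names are illustrative, and the host and port are taken from this page.

```python
import json
import socket
import wave


def encode_event(event_type, data=None, payload=b""):
    """Serialize one event: a JSON header line, then the optional raw payload."""
    header = {"type": event_type, "data": data or {}}
    if payload:
        header["payload_length"] = len(payload)
    return json.dumps(header).encode("utf-8") + b"\n" + payload


def read_event(stream):
    """Read one event from a binary file-like object; return None at end of stream."""
    line = stream.readline()
    if not line:
        return None
    header = json.loads(line)
    data = header.get("data") or {}
    if header.get("data_length"):
        # Some servers send the data dict as separate bytes after the header line.
        data.update(json.loads(stream.read(header["data_length"])))
    payload = stream.read(header.get("payload_length") or 0)
    return header["type"], data, payload


def collect_audio(stream):
    """Gather PCM from audio-chunk events until audio-stop; return (format, pcm)."""
    fmt, pcm = {}, bytearray()
    while (event := read_event(stream)) is not None:
        event_type, data, payload = event
        if event_type == "audio-chunk":
            fmt = fmt or data  # rate, width, channels from the first chunk
            pcm += payload
        elif event_type == "audio-stop":
            break
    return fmt, bytes(pcm)


def synthesize(text, path, host="piper", port=10200):
    """Request speech for text and write the returned PCM to a WAV file."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(encode_event("synthesize", {"text": text}))
        with sock.makefile("rb") as stream:
            fmt, pcm = collect_audio(stream)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(fmt["channels"])
        wav.setsampwidth(fmt["width"])
        wav.setframerate(fmt["rate"])
        wav.writeframes(pcm)
```

Keeping the framing functions separate from the socket code means they can be exercised against an in-memory byte stream before pointing the client at a running container.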

Integration with Voice Platform

Piper is called by the Voice Agent for:

  1. Phone Responses - Synthesizes Norns responses for callers
  2. WebRTC Audio - Generates audio for browser playback
  3. Notifications - Voice alerts and confirmations

Flow

Norns response (text)
        ↓
Piper (TTS)
        ↓
Audio (WAV/PCM)
        ↓
LiveKit/Telephony → User hears response

Performance

  • Latency: ~100-300ms for typical responses
  • Memory: ~500MB RAM
  • Quality: Neural network synthesis (natural sounding)
  • CPU: Optimized for ARM64 (M4)
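
The latency figure can be checked against a running deployment with a small timing harness. This is a sketch only: `measure_latency_ms` is a hypothetical helper, and the `synthesize` argument stands for whatever client call is used to reach Piper.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(synthesize: Callable[[str], object], text: str, runs: int = 5) -> dict:
    """Time repeated synthesis calls and summarize the results in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        synthesize(text)  # blocks until the audio has been returned
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

The first call often includes model warm-up, so the median is usually a fairer number to compare with the range quoted above than the maximum.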

Tuning Parameters

Speech Rate

PIPER_LENGTH=0.8   # Faster speech
PIPER_LENGTH=1.0   # Normal (default)
PIPER_LENGTH=1.2   # Slower speech

Voice Variation

# More robotic/consistent
PIPER_NOISE=0.3
PIPER_NOISEW=0.1

# More natural/variable (default)
PIPER_NOISE=0.667
PIPER_NOISEW=0.333

# Very expressive
PIPER_NOISE=0.9
PIPER_NOISEW=0.5
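
To keep these combinations consistent across environments, the values can be generated from named presets. The sketch below uses hypothetical names (`PRESETS`, `env_lines`) and only the numbers listed in this section; it prints lines suitable for an environment block.

```python
# (PIPER_NOISE, PIPER_NOISEW) pairs copied from the examples above.
PRESETS = {
    "consistent": (0.3, 0.1),
    "natural": (0.667, 0.333),  # default
    "expressive": (0.9, 0.5),
}


def env_lines(preset: str = "natural", length: float = 1.0) -> list[str]:
    """Return PIPER_* assignments for a named variation preset and speech rate."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    if length <= 0:
        raise ValueError("length must be positive")
    noise, noise_w = PRESETS[preset]
    return [
        f"PIPER_LENGTH={length}",
        f"PIPER_NOISE={noise}",
        f"PIPER_NOISEW={noise_w}",
    ]


print("\n".join(env_lines("expressive", length=1.2)))
```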

Quick Commands

# Check service health
curl -s http://localhost:10200/api/voices | head

# View logs
docker logs -f piper

# Test synthesis
curl -X POST http://piper:10200/api/tts \
-H "Content-Type: application/json" \
-d '{"text": "Test message"}' \
--output /tmp/test.wav

See Also