# Ollama

Local LLM inference server.

## Overview

Ollama provides local language model inference for embeddings and text generation.
| Property | Value |
|---|---|
| Image | ollama/ollama:latest |
| Container | ollama |
| Port | 11434 (internal) |
| Data | ~/ravenhelm/data/ollama/ |
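
The port is not published on the host, so the service is only reachable from inside the Docker network. A minimal reachability check from another container on that network might look like the sketch below; it assumes the hostname `ollama` resolves there and that the Python `requests` library is available.

```python
# Minimal connectivity check against Ollama from another container
# on the same Docker network (hostname "ollama" assumed to resolve).
import requests

resp = requests.get("http://ollama:11434/api/tags", timeout=5)
resp.raise_for_status()

# /api/tags lists locally installed models.
models = [m["name"] for m in resp.json()["models"]]
print("Ollama is up; installed models:", models)
```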
## Installed Models

| Model | Purpose | Size |
|---|---|---|
| nomic-embed-text | Embeddings | ~300MB |
| llama3.2:3b | Fast inference | ~2GB |
| mistral | General tasks | ~4GB |
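
On a fresh data volume these models are not present yet. One way to fetch them all is through the API's `/api/pull` endpoint; the sketch below assumes the model names from the table above and uses a non-streaming pull so each request blocks until the download finishes.

```python
# Pull each model listed above via Ollama's /api/pull endpoint.
import requests

MODELS = ["nomic-embed-text", "llama3.2:3b", "mistral"]

for model in MODELS:
    # "stream": False makes the server return a single JSON object
    # once the pull completes instead of streaming progress updates.
    resp = requests.post(
        "http://ollama:11434/api/pull",
        json={"model": model, "stream": False},
        timeout=None,  # larger models (~4GB) can take a while to download
    )
    resp.raise_for_status()
    print(model, resp.json().get("status", "done"))
```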
## Quick Commands

```bash
# List models
docker exec ollama ollama list

# Pull new model
docker exec ollama ollama pull llama3.2:3b

# Run model interactively
docker exec -it ollama ollama run llama3.2:3b

# View logs
docker logs -f ollama

# Restart
docker restart ollama
```
## API Usage

```bash
# Generate text
curl http://ollama:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Hello, how are you?"
}'

# Generate embeddings
curl http://ollama:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "Hello, world!"
}'
```
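
The same endpoints can be called from application code. The sketch below uses Python with the `requests` library (an assumption; any HTTP client works) and sets `"stream": false` so each call returns a single JSON object instead of a stream of chunks.

```python
# Calling the generate and embeddings endpoints from Python.
import requests

OLLAMA_URL = "http://ollama:11434"

# Non-streaming generation: the reply text is in the "response" field.
gen = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={"model": "llama3.2:3b", "prompt": "Hello, how are you?", "stream": False},
    timeout=120,
)
gen.raise_for_status()
print(gen.json()["response"])

# Embeddings: the vector is returned under the "embedding" key.
emb = requests.post(
    f"{OLLAMA_URL}/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "Hello, world!"},
    timeout=60,
)
emb.raise_for_status()
vector = emb.json()["embedding"]
print(len(vector))  # nomic-embed-text vectors are 768-dimensional
```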
## Integration with Norns

Norns uses Ollama for:

- Text embeddings (nomic-embed-text)
- Fast local inference for simple tasks
- Fallback when the primary API is unavailable (sketched below)

Relevant configuration:

```
OLLAMA_URL = "http://ollama:11434"
OLLAMA_EMBED_MODEL = "nomic-embed-text"
```
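
The fallback behaviour could be implemented roughly as follows. This is a hypothetical sketch rather than Norns' actual code; `call_remote_api` is a stand-in for whatever primary backend Norns talks to.

```python
# Hypothetical fallback pattern: try the primary API first,
# fall back to local Ollama generation if it is unreachable.
import requests

OLLAMA_URL = "http://ollama:11434"


def call_remote_api(prompt: str) -> str:
    # Placeholder for the primary backend; raises here to simulate an outage.
    raise ConnectionError("primary API unavailable")


def generate(prompt: str) -> str:
    try:
        return call_remote_api(prompt)
    except Exception:
        resp = requests.post(
            f"{OLLAMA_URL}/api/generate",
            json={"model": "llama3.2:3b", "prompt": prompt, "stream": False},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["response"]
```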
## Troubleshooting

### Issue: Model Not Found

Symptoms: 404 error when calling the model

Solutions:

```bash
# Pull the missing model
docker exec ollama ollama pull <model-name>
```

### Issue: Slow Inference

Symptoms: Responses take too long

Solutions:

- Use a smaller model (e.g. llama3.2:3b instead of mistral)
- Check container CPU and memory limits
- Consider GPU acceleration