Ollama

Local LLM inference server.


Overview

Ollama provides local language model inference for embeddings and generation.

Property     Value
Image        ollama/ollama:latest
Container    ollama
Port         11434 (internal)
Data         ~/ravenhelm/data/ollama/
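
For orientation, a minimal standalone docker run matching these properties might look like the sketch below. The real stack presumably starts the container from its own compose file, and the ravenhelm network name is an assumption; /root/.ollama is where the official image keeps its models.

# Sketch only: the stack's compose file is the source of truth
docker run -d \
  --name ollama \
  --network ravenhelm \
  -v ~/ravenhelm/data/ollama:/root/.ollama \
  ollama/ollama:latest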

Installed Models

Model              Purpose          Size
nomic-embed-text   Embeddings       ~300MB
llama3.2:3b        Fast inference   ~2GB
mistral            General tasks    ~4GB
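
To bring a fresh container in line with this table, the three models can be pulled in one pass (names exactly as listed; safe to re-run, since already-downloaded layers are skipped):

# Pull every model listed above
for model in nomic-embed-text llama3.2:3b mistral; do
  docker exec ollama ollama pull "$model"
done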

Quick Commands

# List models
docker exec ollama ollama list

# Pull new model
docker exec ollama ollama pull llama3.2:3b

# Run model interactively
docker exec -it ollama ollama run llama3.2:3b

# View logs
docker logs -f ollama

# Restart
docker restart ollama

API Usage

# Generate text
curl http://ollama:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Hello, how are you?"
}'

# Generate embeddings
curl http://ollama:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "Hello, world!"
}'
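
The generate endpoint streams its reply as a sequence of JSON objects by default. Passing "stream": false returns a single object whose response field holds the full text, which is often easier to handle from scripts:

# Generate text without streaming (one JSON object in the reply)
curl http://ollama:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Hello, how are you?",
  "stream": false
}'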

Integration with Norns

Norns uses Ollama for:

  • Text embeddings (nomic-embed-text)
  • Fast local inference for simple tasks
  • Fallback when the API is unavailable (see the health-check sketch below)

OLLAMA_URL = "http://ollama:11434"
OLLAMA_EMBED_MODEL = "nomic-embed-text"
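
A rough sketch of the fallback check: /api/tags lists the installed models and answers quickly, so it doubles as a readiness probe. The surrounding logic is illustrative, not Norns' actual code.

# Illustrative only: probe Ollama before routing work to it
OLLAMA_URL="http://ollama:11434"
if curl -sf "$OLLAMA_URL/api/tags" > /dev/null; then
  echo "Ollama is up - local inference available"
else
  echo "Ollama is down - skip the local fallback"
fi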

Troubleshooting

Issue: Model Not Found

Symptoms: 404 error when calling a model

Solutions:

# Pull the model
docker exec ollama ollama pull <model-name>

Issue: Slow Inference

Symptoms: Responses take too long

Solutions:

  1. Use a smaller model (e.g. llama3.2:3b instead of mistral)
  2. Check the container's CPU and memory usage
  3. Consider GPU acceleration (see the sketch below)
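
For steps 2 and 3, docker stats gives a quick live view of container resource usage, and GPU access depends on the host; the recreate command below assumes an NVIDIA GPU with the NVIDIA Container Toolkit installed, so adjust for other hardware.

# Live CPU / memory usage for the container
docker stats --no-stream ollama

# Example only: recreate the container with NVIDIA GPU access
docker run -d --gpus=all \
  --name ollama \
  -v ~/ravenhelm/data/ollama:/root/.ollama \
  ollama/ollama:latest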