Runbook: Norns Agent Troubleshooting
Overview
- What: Troubleshoot Norns LangGraph agent issues
- When: Norns services report errors, agents stop responding, or memory or execution problems occur
- Duration: 10-45 minutes
- Services: norns-api, norns-executor, norns-admin, norns-agent, norns-ui, norns-pm
Prerequisites
- SSH access to odin as ravenhelm
- Access to Grafana/Langfuse for tracing
- GitLab repo: https://gitlab.ravenhelm.dev/agents/norns
Quick Health Check
ssh ravenhelm@100.115.101.81
# Check all Norns containers
docker ps | grep norns
# Quick API health
curl -s https://norns.ravenhelm.dev/health | jq
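The grep above shows whatever is running, but it does not say which services are absent. A minimal sketch of that check follows, assuming each container is named exactly like its service in the Overview list. The helper name missing_services is hypothetical and not part of the Norns repo.

```shell
# Sketch: report which expected Norns containers are not running.
# Assumption: container names equal the service names from the Overview.
EXPECTED_SERVICES="norns-api norns-executor norns-admin norns-agent norns-ui norns-pm"

# Reads running container names from stdin (one per line) and prints
# every expected service that is not among them.
missing_services() {
  running="$(cat)"
  for svc in $EXPECTED_SERVICES; do
    # -x matches the whole line, -F treats the name as a literal string
    if ! printf '%s\n' "$running" | grep -qxF -- "$svc"; then
      printf '%s\n' "$svc"
    fi
  done
}

# Usage on odin:
#   docker ps --format '{{.Names}}' | missing_services
```

Empty output means all six containers are up. Any name printed is a candidate for docker logs or a restart.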
Common Issues
Issue 1: Agent Not Responding
Symptoms: API returns 502/504, agent timeouts
# Check executor status
docker logs --tail 100 norns-executor
# Check for stuck tasks
docker exec -i postgres psql -U ravenhelm -d ravenmaskos -c \
"SELECT task_id, status, created_at FROM turn_tasks WHERE status = 'in_progress' ORDER BY created_at DESC LIMIT 10"
# Restart executor if stuck
cd ~/ravenhelm/docs/AI-ML-Platform/norns-agent
docker compose restart norns-executor
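The query above lists the newest in_progress tasks, which are the least likely to be stuck. The sketch below is one way to narrow it to tasks older than a threshold, oldest first. The 15-minute default and the helper name stuck_tasks_sql are assumptions. The table and column names are taken from the query above.

```shell
# Sketch: build a query for tasks that have been in_progress longer than
# N minutes. The default of 15 minutes is an assumed threshold; tune it
# to the longest turn you expect to be legitimate.
stuck_tasks_sql() {
  minutes="${1:-15}"
  # Accept digits only, so the value is safe to splice into the SQL text.
  case "$minutes" in
    ''|*[!0-9]*) echo "minutes must be a whole number" >&2; return 1 ;;
  esac
  printf "SELECT task_id, status, created_at FROM turn_tasks WHERE status = 'in_progress' AND created_at < now() - interval '%s minutes' ORDER BY created_at ASC LIMIT 10" "$minutes"
}

# Usage on odin:
#   docker exec -i postgres psql -U ravenhelm -d ravenmaskos -c "$(stuck_tasks_sql 30)"
```

If this returns rows, the executor restart is more likely to be warranted than if only recent tasks are in progress.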
Issue 2: Memory System Errors
Symptoms: Agent forgetting context, memory retrieval failures
# Check memory tables
docker exec -i postgres psql -U ravenhelm -d ravenmaskos -c \
"SELECT count(*) FROM episodic_memories WHERE created_at > now() - interval '1 hour'"
# Check for embedding errors
docker logs norns-api 2>&1 | grep -i 'embedding\|memory' | tail -20
# Verify Ollama is running (for local embeddings)
docker ps | grep ollama
curl -s http://localhost:11434/api/tags | jq '.models[].name'
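Listing the Ollama models confirms the server is up, but not that the embedding model is actually pulled. A small sketch of that check follows. The helper name has_model is hypothetical, and nomic-embed-text in the usage line is only a placeholder, since this runbook does not name the embedding model Norns uses.

```shell
# Sketch: succeed if a given model appears in a list of Ollama model names.
# Reads names from stdin (one per line); matches the bare name or the name
# followed by a tag such as ":latest".
has_model() {
  want="$1"
  while IFS= read -r name; do
    case "$name" in
      "$want"|"$want":*) return 0 ;;
    esac
  done
  return 1
}

# Usage on odin (model name is a placeholder, substitute the configured one):
#   curl -s http://localhost:11434/api/tags | jq -r '.models[].name' \
#     | has_model nomic-embed-text || echo "embedding model not pulled"
```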
Issue 3: Tool Execution Failures
Symptoms: Tools not executing, bifrost connection errors
# Check bifrost connectivity
curl -s https://bifrost.ravenhelm.dev/health | jq
# Check tool execution logs
docker logs norns-executor 2>&1 | grep -i 'tool\|bifrost\|error' | tail -30
# Verify MCP connections
docker exec -i postgres psql -U ravenhelm -d ravenmaskos -c \
"SELECT name, status FROM mcp_server_connections"
Issue 4: LLM API Errors
Symptoms: 429 rate limits, API key errors, model not found
# Check for API errors
docker logs norns-executor 2>&1 | grep -iE 'openai|anthropic|429|401|rate' | tail -20
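# Verify API keys are set (prints variable names only, so key values stay out of the terminal and scrollback)
docker exec norns-executor env | grep -iE 'OPENAI|ANTHROPIC' | cut -d= -f1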
# Check Langfuse for traces
# Visit: https://langfuse.ravenhelm.dev
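To tell a rate-limit problem from a credentials problem at a glance, the log lines can be bucketed and counted. This is a rough sketch: the match patterns are heuristics, not the executor's actual log format, and a bare 429 or 401 can also match unrelated numbers. The helper name summarize_llm_errors is hypothetical.

```shell
# Sketch: count rate-limit and auth errors in log text read from stdin.
# Patterns are assumed heuristics; adjust them to the real log format.
summarize_llm_errors() {
  awk '
    /429/ || tolower($0) ~ /rate limit/                   { rate++ }
    /401/ || tolower($0) ~ /invalid api key|unauthorized/ { auth++ }
    END { printf "rate_limit=%d auth=%d\n", rate, auth }
  '
}

# Usage on odin:
#   docker logs --since 1h norns-executor 2>&1 | summarize_llm_errors
```

A high rate_limit count points at provider quotas or retry storms. A nonzero auth count points at the API key check above.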
Issue 5: Database Connection Issues
Symptoms: Connection refused, too many connections
# Check postgres connections
docker exec -i postgres psql -U ravenhelm -d ravenmaskos -c \
"SELECT count(*) FROM pg_stat_activity WHERE datname = 'ravenmaskos'"
# Check for connection errors
docker logs norns-api 2>&1 | grep -i 'connection\|postgres\|database' | tail -20
# Restart all Norns services if needed
cd ~/ravenhelm/docs/AI-ML-Platform/norns-agent
docker compose restart
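A raw connection count is easier to judge against the server limit. The sketch below computes a usage percentage. It assumes the count is taken server-wide, because max_connections applies to the whole Postgres instance and not to one database, and the result is approximate since pg_stat_activity also lists background processes. The helper names conn_usage_pct and q are hypothetical.

```shell
# Sketch: integer percentage of the Postgres connection limit in use.
conn_usage_pct() {
  current="$1"; max="$2"
  # Both arguments must be whole numbers, and max must be non-zero.
  case "$current$max" in
    ''|*[!0-9]*) echo "arguments must be whole numbers" >&2; return 1 ;;
  esac
  if [ -z "$current" ] || [ -z "$max" ] || [ "$max" -eq 0 ]; then
    echo "usage: conn_usage_pct CURRENT MAX (MAX > 0)" >&2
    return 1
  fi
  echo $(( current * 100 / max ))
}

# Usage on odin (-A unaligned, -t tuples only, so psql prints just the value):
#   q() { docker exec -i postgres psql -U ravenhelm -d ravenmaskos -At -c "$1"; }
#   conn_usage_pct "$(q 'SELECT count(*) FROM pg_stat_activity')" "$(q 'SHOW max_connections')"
```

A value close to 100 suggests pool exhaustion rather than a network or credentials problem.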
Full Service Restart
cd ~/ravenhelm/docs/AI-ML-Platform/norns-agent
# Graceful restart
docker compose restart
# Full recreate (if issues persist)
docker compose down
docker compose up -d
# Watch logs
docker compose logs -f --tail 50
Verification
# Test API endpoint
curl -s https://norns.ravenhelm.dev/health | jq
# Test agent interaction (via UI)
# Visit: https://norns.ravenhelm.dev
# Check Langfuse for new traces
# Visit: https://langfuse.ravenhelm.dev
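Right after a restart the health endpoint may fail for a short time while containers come up, so a single curl can give a false alarm. A minimal retry sketch follows. The helper name retry_until_ok is hypothetical, and the attempt count and delay in the usage line are assumed values, not documented startup times.

```shell
# Sketch: run a command until it succeeds or the attempts run out.
# Arguments: ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until_ok() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt is coming.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage on odin (-f makes curl return non-zero on HTTP errors such as 502):
#   retry_until_ok 12 5 curl -fsS --max-time 5 https://norns.ravenhelm.dev/health \
#     && echo "norns healthy" || echo "norns still unhealthy after 12 attempts"
```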
Escalation
If issues persist:
- Check Langfuse traces for detailed error context
- Review recent changes in GitLab: https://gitlab.ravenhelm.dev/agents/norns
- Check if external APIs (OpenAI, Anthropic) have outages