Runbook: Norns Agent Troubleshooting

Overview

  • What: Troubleshoot Norns LangGraph agent issues
  • When: Norns services showing errors, agents not responding, or memory/execution issues
  • Duration: 10-45 minutes
  • Services: norns-api, norns-executor, norns-admin, norns-agent, norns-ui, norns-pm

Prerequisites

Quick Health Check

ssh ravenhelm@100.115.101.81

# Check all Norns containers (-a also lists stopped or exited ones)
docker ps -a | grep norns

# Quick API health
curl -s https://norns.ravenhelm.dev/health | jq
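
The container check above can be turned into a comparison against the Services list from the Overview. This is a minimal sketch, assuming each service runs in a container of the same name; only norns-api and norns-executor are confirmed as container names elsewhere in this runbook, and the function name is illustrative.

```shell
# Expected names come from the Services list in the Overview.
# Assumption: every service runs in a container with the same name.
EXPECTED_SERVICES="norns-api norns-executor norns-admin norns-agent norns-ui norns-pm"

# Reads running container names on stdin (one per line) and prints
# every expected service that is not among them.
missing_norns_services() {
  local running svc
  running="$(cat)"
  for svc in $EXPECTED_SERVICES; do
    # -x matches the whole line, -F treats the name as a literal string
    if ! printf '%s\n' "$running" | grep -qxF -- "$svc"; then
      printf '%s\n' "$svc"
    fi
  done
}

# Usage on the host (prints nothing when all six are running):
#   docker ps --format '{{.Names}}' | missing_norns_services
```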

Common Issues

Issue 1: Agent Not Responding

Symptoms: API returns 502/504, agent timeouts

# Check executor status
docker logs --tail 100 norns-executor

# Check for stuck tasks (oldest first, since long-running entries are the likely culprits)
docker exec -i postgres psql -U ravenhelm -d ravenmaskos -c \
"SELECT task_id, status, created_at FROM turn_tasks WHERE status = 'in_progress' ORDER BY created_at ASC LIMIT 10"

# Restart executor if stuck
cd ~/ravenhelm/docs/AI-ML-Platform/norns-agent
docker compose restart norns-executor
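
To tell a genuinely stuck task from one that has only just started, the stuck-task query can be narrowed by age. The sketch below builds such a query. The table and column names are the ones used above; the 15-minute default and the function name are assumptions, not values taken from the Norns codebase.

```shell
# Prints a query for in_progress tasks older than N minutes, oldest first.
# Assumption: 15 minutes is only an illustrative default threshold.
stuck_tasks_sql() {
  minutes="${1:-15}"
  # Accept digits only, so the value is safe to place inside the SQL text
  case "$minutes" in
    ''|*[!0-9]*) echo "minutes must be a non-negative integer" >&2; return 1 ;;
  esac
  printf "SELECT task_id, status, created_at FROM turn_tasks WHERE status = 'in_progress' AND created_at < now() - interval '%s minutes' ORDER BY created_at ASC LIMIT 10" "$minutes"
}

# Usage on the host:
#   docker exec -i postgres psql -U ravenhelm -d ravenmaskos -c "$(stuck_tasks_sql 30)"
```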

Issue 2: Memory System Errors

Symptoms: Agent forgetting context, memory retrieval failures

# Check memory tables
docker exec -i postgres psql -U ravenhelm -d ravenmaskos -c \
"SELECT count(*) FROM episodic_memories WHERE created_at > now() - interval '1 hour'"

# Check for embedding errors
docker logs norns-api 2>&1 | grep -i 'embedding\|memory' | tail -20

# Verify Ollama is running (for local embeddings)
docker ps | grep ollama
curl -s http://localhost:11434/api/tags | jq '.models[].name'
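
Ollama may be running without the embedding model actually loaded. The sketch below checks the model list for one specific name. This runbook does not say which embedding model Norns uses, so nomic-embed-text is a placeholder and the function name is illustrative.

```shell
# Reads model names on stdin (one per line) and succeeds if the wanted
# model is present, with or without a ":tag" suffix such as ":latest".
ollama_has_model() {
  wanted="$1"
  while IFS= read -r name; do
    case "$name" in
      "$wanted"|"$wanted":*) return 0 ;;
    esac
  done
  return 1
}

# Usage on the host (jq -r prints the names without JSON quotes;
# "nomic-embed-text" is a placeholder for the configured model):
#   curl -s http://localhost:11434/api/tags | jq -r '.models[].name' \
#     | ollama_has_model nomic-embed-text && echo present || echo missing
```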

Issue 3: Tool Execution Failures

Symptoms: Tools not executing, bifrost connection errors

# Check bifrost connectivity
curl -s https://bifrost.ravenhelm.dev/health | jq

# Check tool execution logs
docker logs norns-executor 2>&1 | grep -i 'tool\|bifrost\|error' | tail -30

# Verify MCP connections
docker exec -i postgres psql -U ravenhelm -d ravenmaskos -c \
"SELECT name, status FROM mcp_server_connections"

Issue 4: LLM API Errors

Symptoms: 429 rate limits, API key errors, model not found

# Check for API errors
docker logs norns-executor 2>&1 | grep -iE 'openai|anthropic|429|401|rate' | tail -20

# Verify API keys are set (prints variable names only, so secrets stay out of the terminal and scrollback)
docker exec norns-executor env | grep -iE 'OPENAI|ANTHROPIC' | cut -d= -f1

# Check Langfuse for traces
# Visit: https://langfuse.ravenhelm.dev
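
When the executor log is noisy, a per-symptom count shows which of the three failure modes dominates. This is a rough sketch: the patterns are heuristics chosen for illustration, not the exact messages Norns emits, and a bare 429 or 401 can also match unrelated text such as timestamps or IDs.

```shell
# Reads log lines on stdin and prints one summary line with counts for
# rate limits, authentication failures, and missing models.
# Each line is counted once, in the first category it matches.
summarize_llm_errors() {
  awk '
    /429/ || tolower($0) ~ /rate.?limit/              { rate++;  next }
    /401/ || tolower($0) ~ /api.?key|unauthorized/    { auth++;  next }
    tolower($0) ~ /model.*not found|model_not_found/  { model++; next }
    END { printf "rate_limit=%d auth=%d model_not_found=%d\n", rate, auth, model }
  '
}

# Usage on the host:
#   docker logs --since 1h norns-executor 2>&1 | summarize_llm_errors
```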

Issue 5: Database Connection Issues

Symptoms: Connection refused, too many connections

# Check postgres connections
docker exec -i postgres psql -U ravenhelm -d ravenmaskos -c \
"SELECT count(*) FROM pg_stat_activity WHERE datname = 'ravenmaskos'"

# Check for connection errors
docker logs norns-api 2>&1 | grep -i 'connection\|postgres\|database' | tail -20

# Restart every service in the norns-agent compose project if needed
cd ~/ravenhelm/docs/AI-ML-Platform/norns-agent
docker compose restart
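
A raw connection count is easier to judge next to the server limit. The sketch below converts the two numbers into a percentage. SHOW max_connections is standard PostgreSQL; the helper names are illustrative, and the count in the usage lines covers all databases because the limit is server-wide, so treat the result as a rough estimate.

```shell
# Prints CURRENT as an integer percentage of MAX (rounded down).
conn_usage_pct() {
  current="$1"
  max="$2"
  # Reject anything that is not a plain non-negative integer
  for n in "$current" "$max"; do
    case "$n" in
      ''|*[!0-9]*) echo "usage: conn_usage_pct CURRENT MAX" >&2; return 1 ;;
    esac
  done
  if [ "$max" -eq 0 ]; then
    echo "MAX must be greater than zero" >&2
    return 1
  fi
  echo $(( current * 100 / max ))
}

# Usage on the host (-t and -A make psql print the bare value):
#   q() { docker exec -i postgres psql -U ravenhelm -d ravenmaskos -t -A -c "$1"; }
#   conn_usage_pct "$(q 'SELECT count(*) FROM pg_stat_activity')" "$(q 'SHOW max_connections')"
```

A result close to 100 is consistent with the "too many connections" symptom.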

Full Service Restart

cd ~/ravenhelm/docs/AI-ML-Platform/norns-agent

# Graceful restart
docker compose restart

# Full recreate (if issues persist)
docker compose down
docker compose up -d

# Watch logs
docker compose logs -f --tail 50

Verification

# Test API endpoint
curl -s https://norns.ravenhelm.dev/health | jq

# Test agent interaction (via UI)
# Visit: https://norns.ravenhelm.dev

# Check Langfuse for new traces
# Visit: https://langfuse.ravenhelm.dev
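
After a restart the API may need some time before it answers, so a single health request can fail even though the restart worked. A minimal retry sketch follows. The function name, retry count, and delay are assumptions, not values taken from the Norns deployment.

```shell
# Runs a command up to TRIES times, sleeping DELAY seconds between
# attempts. Succeeds as soon as the command succeeds.
wait_until() {
  tries="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    if "$@"; then
      return 0
    fi
    # No sleep after the final failed attempt
    if [ "$attempt" -lt "$tries" ]; then
      sleep "$delay"
    fi
    attempt=$(( attempt + 1 ))
  done
  return 1
}

# Usage on the host (curl -f exits non-zero on HTTP 4xx/5xx responses):
#   wait_until 10 3 curl -fsS -o /dev/null https://norns.ravenhelm.dev/health \
#     && echo "API healthy" || echo "API still unhealthy"
```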

Escalation

If issues persist:

  1. Check Langfuse traces for detailed error context
  2. Review recent changes in GitLab: https://gitlab.ravenhelm.dev/agents/norns
  3. Check if external APIs (OpenAI, Anthropic) have outages