
Troubleshooting

Common issues and solutions across RavenmaskOS.


Quick Diagnostics

# Check all containers
docker ps --format "table {{.Names}}\t{{.Status}}" | sort

# Check disk space
df -h ~/ravenhelm/data

# Check memory
docker stats --no-stream

# Check logs
docker logs --tail 50 <container>
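When many containers are running, the status check can be scripted. The following is a minimal sketch, assuming the text passed in comes from `docker ps -a --format "{{.Names}}\t{{.Status}}"` (no `table` prefix, so there is no header row). The helper name `find_unhealthy` is hypothetical and not part of RavenmaskOS.

```python
def find_unhealthy(ps_output: str) -> list[str]:
    """Return names of containers that are not running cleanly."""
    flagged = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        # Docker status strings look like "Up 3 hours",
        # "Up 2 minutes (unhealthy)", "Restarting (1) 5 seconds ago",
        # or "Exited (137) 2 hours ago".
        if not status.startswith("Up") or "(unhealthy)" in status:
            flagged.append(name)
    return sorted(flagged)
```

Any name it returns is a candidate for `docker logs --tail 50 <container>`.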

Common Issues

Service Unavailable (502/504)

Symptoms: Browser shows 502 Bad Gateway or 504 Gateway Timeout

Diagnosis:

# Check Traefik (--tail also covers stderr, which a pipe to tail would miss)
docker logs --tail 20 traefik

# Check backend
docker ps | grep <service>
docker logs --tail 20 <service>

Solutions:

  1. Restart the backend service: docker restart <service>
  2. Verify the Traefik labels on the service are correct
  3. Check that the service is attached to the ravenhelm_net network
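The network check in step 3 can be automated. This sketch assumes the JSON printed by `docker network inspect ravenhelm_net` is passed in as a string; `containers_on_network` is a hypothetical helper name.

```python
import json


def containers_on_network(inspect_json: str) -> list[str]:
    """List container names from `docker network inspect <network>` output."""
    networks = json.loads(inspect_json)  # Docker prints a JSON array
    names = []
    for network in networks:
        # "Containers" maps container IDs to endpoint details.
        for endpoint in (network.get("Containers") or {}).values():
            names.append(endpoint["Name"])
    return sorted(names)
```

If the backend's name is missing from the result, Traefik cannot reach it over that network, which is consistent with a 502.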

Container Crash Loop

Symptoms: Container keeps restarting

Diagnosis:

docker ps -a | grep <service>
docker logs <service>

Solutions:

  1. Check configuration errors
  2. Verify environment variables
  3. Check dependencies are running
  4. Review resource limits
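To decide which of these solutions applies, it can help to look at why the container last exited. The sketch below assumes the JSON printed by `docker inspect <service>` is passed in as a string; `summarize_restarts` is a hypothetical helper name.

```python
import json


def summarize_restarts(inspect_json: str) -> dict:
    """Pull restart-related fields out of `docker inspect <container>` output."""
    container = json.loads(inspect_json)[0]  # inspect prints a JSON array
    state = container["State"]
    return {
        "restart_count": container.get("RestartCount", 0),
        "exit_code": state.get("ExitCode"),
        # True when the kernel killed the process for exceeding its memory limit.
        "oom_killed": state.get("OOMKilled", False),
        "error": state.get("Error", ""),
    }
```

An `oom_killed` value of `True` points at resource limits (step 4); a non-zero exit code without it more often points at configuration or environment variables.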

Database Connection Failed

Symptoms: Services can't connect to PostgreSQL

Diagnosis:

docker exec postgres pg_isready
docker logs postgres

Solutions:

  1. Restart PostgreSQL: docker restart postgres
  2. Check credentials in .env
  3. Verify network connectivity
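Network connectivity (step 3) can be tested separately from credentials with a plain TCP connection. This is a minimal sketch, assuming it is run from an environment that shares the service's network; the host name `postgres` and port 5432 (the PostgreSQL default) are assumptions about this deployment.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `can_connect("postgres", 5432)` returning `False` suggests a network or DNS problem, while `True` combined with a failing login points back at the credentials in .env.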

SSO Login Fails

Symptoms: Redirect loop or error after Zitadel login

Diagnosis:

# 2>&1 is needed because docker logs replays the container's stderr on stderr
docker logs zitadel 2>&1 | grep -i error
docker logs <service> 2>&1 | grep -i oauth

Solutions:

  1. Verify Client ID/Secret
  2. Check redirect URI matches exactly
  3. Clear browser cookies
  4. Verify Zitadel is healthy
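Redirect URI mismatches (step 2) are often a single character, such as a trailing slash or `http` instead of `https`. The sketch below compares two URIs component by component; `redirect_uri_mismatches` is a hypothetical helper, and the URIs in the usage note are placeholders.

```python
from urllib.parse import urlsplit


def redirect_uri_mismatches(registered: str, requested: str) -> list[str]:
    """Describe every URL component that differs between the two URIs."""
    a, b = urlsplit(registered), urlsplit(requested)
    problems = []
    for field in ("scheme", "netloc", "path", "query"):
        left, right = getattr(a, field), getattr(b, field)
        if left != right:  # exact, case-sensitive comparison
            problems.append(f"{field}: {left!r} != {right!r}")
    return problems
```

Comparing `https://app.example.com/callback` with `http://app.example.com/callback/` reports both the scheme and the path difference; an empty list means the two URIs are identical in these components.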

Disk Space Full

Symptoms: Services fail, write errors in logs

Diagnosis:

df -h
du -sh ~/ravenhelm/data/*
docker system df

Solutions:

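  1. Clean Docker: docker system prune -a (removes stopped containers, unused networks, and every image not used by a container)
  2. Truncate a container's log file: sudo truncate -s 0 "$(docker inspect --format '{{.LogPath}}' <container>)" (applies to the default json-file logging driver; docker logs --tail 0 only limits what is displayed and frees no space)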
  3. Rotate large data directories
  4. Review retention policies
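Catching a filling disk before services fail is easier than recovering afterwards. This is a minimal sketch of a usage check; the 85 percent threshold and both function names are assumptions, not RavenmaskOS settings.

```python
import shutil


def disk_usage_percent(path: str) -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def needs_cleanup(path: str, threshold: float = 85.0) -> bool:
    """True when usage is at or above the (assumed) alert threshold."""
    return disk_usage_percent(path) >= threshold
```

The percentage can differ slightly from `df -h`, which accounts for reserved blocks differently.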

Certificate Errors

Symptoms: Browser shows certificate warning

Diagnosis:

docker logs traefik 2>&1 | grep -i acme
echo | openssl s_client -servername <domain> -connect <domain>:443 2>/dev/null | openssl x509 -noout -dates

Solutions:

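  1. Verify the AWS credentials Traefik uses when requesting certificates
  2. Check that DNS resolves <domain> to this host
  3. Force renewal: back up acme.json, delete it, and restart Traefik. This discards every stored certificate, so Traefik requests all of them again; repeated resets can run into the ACME provider's rate limits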
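The `openssl x509 -noout -dates` output from the diagnosis step can be turned into a number of days remaining. This sketch assumes that output is passed in as text; `days_until_expiry` is a hypothetical helper name.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(dates_output: str, now: Optional[float] = None) -> float:
    """Days left before the notAfter date printed by `openssl x509 -dates`."""
    for line in dates_output.splitlines():
        if line.startswith("notAfter="):
            not_after = line.split("=", 1)[1].strip()
            # Parses the "Jan  5 09:34:43 2018 GMT" format used in certificates.
            expires = ssl.cert_time_to_seconds(not_after)
            current = time.time() if now is None else now
            return (expires - current) / 86400
    raise ValueError("no notAfter line found")
```

A negative result means the certificate has already expired, which matches a browser warning; a healthy positive value suggests the warning has another cause, such as a host name mismatch.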

Service-Specific Issues

See the individual service pages for detailed troubleshooting. Known service-specific issues are listed below.


Memory/Embedding Type Errors

Symptoms: Norns agent fails with TypeError on memory operations

TypeError: Cannot convert Python list to PostgreSQL type

Diagnosis:

# Check if pgvector is registered
ssh ravenhelm@100.115.101.81 "docker logs norns-agent 2>&1 | grep -i pgvector"

# Test embedding query
ssh ravenhelm@100.115.101.81 "docker exec -i postgres psql -U ravenhelm -d ravenmaskos -c 'SELECT COUNT(*) FROM episodic_memories;'"

Solution: Ensure register_vector() from pgvector.asyncpg is awaited for each database connection during main.py startup, and pass embeddings as Python lists (not strings):

# CORRECT
embedding = [0.123, 0.456, ...] # Python list
await db.execute("INSERT ... VALUES (:embedding)", {"embedding": embedding})

# INCORRECT
embedding_str = '[' + ','.join(map(str, embedding)) + ']' # Don't do this
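A guard in front of the database call can reject string embeddings before they reach PostgreSQL. This is a sketch only; `validate_embedding` and its `expected_dim` parameter are hypothetical and not part of the Norns codebase.

```python
from numbers import Real


def validate_embedding(embedding, expected_dim=None):
    """Return the embedding as a list of floats, or raise if it is malformed."""
    if isinstance(embedding, (str, bytes)):
        raise TypeError("embedding must be a list of numbers, not a string")
    values = list(embedding)
    for value in values:
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"non-numeric embedding value: {value!r}")
    if expected_dim is not None and len(values) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(values)}")
    return [float(v) for v in values]
```

Calling it just before the INSERT moves the failure to the place where the bad value was produced.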

See Norns Memory System for details.


OpenFGA Permission Denied

Symptoms: User gets 403 Forbidden on resources they should access

Diagnosis:

# Check authorization tuples
ssh ravenhelm@100.115.101.81 "docker exec -i postgres psql -U ravenhelm -d openfga -c \"
SELECT COUNT(*) FROM tuple WHERE user_object_id = 'user:USERID';
\""

# Check OpenFGA logs
ssh ravenhelm@100.115.101.81 "docker logs openfga 2>&1 | grep -i error"

Solutions:

  1. Verify authorization tuple exists for user-resource relationship
  2. Check that the user_id in the tuple matches the user's auth_provider_id from Zitadel (not the internal UUID)
  3. Ensure OpenFGA model ID is correct: 01KE1W3RJH1E13G84N3ERN5XDN
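Beyond counting rows, a single permission can be tested by sending a check request to OpenFGA's HTTP API (`POST /stores/<store_id>/check`). The sketch below only builds the request body; the relation `viewer`, the object `document:example`, and the helper name `build_check_request` are placeholders, and the store ID must come from your own deployment.

```python
import json


def build_check_request(user: str, relation: str, obj: str, model_id: str) -> str:
    """Build the JSON body for an OpenFGA check call."""
    body = {
        "authorization_model_id": model_id,
        "tuple_key": {"user": user, "relation": relation, "object": obj},
    }
    return json.dumps(body)


# Placeholder values; substitute a real user, relation, and object.
payload = build_check_request(
    "user:USERID", "viewer", "document:example", "01KE1W3RJH1E13G84N3ERN5XDN"
)
```

The response contains an `allowed` field; `false` for a user who should have access points to a missing tuple or a user ID mismatch.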

See OpenFGA for details.