Runbook: Vidar Discovery Duplicate Key Constraint
Purpose
Resolve duplicate key constraint violations in the Vidar/Bifrost discovery scheduler when syncing entities from multiple sources.
Symptoms
- Discovery scheduler logs show: duplicate key value violates unique constraint "entities_org_id_type_id_external_id_key"
- Docker, Prometheus, or Traefik discovery fails repeatedly
- Entity counts not updating despite running discovery
Prerequisites
- SSH access to odin as ravenhelm
- Access to vidar-api container
- Access to postgres database
Diagnosis
Step 1: Identify the Constraint Error
# Check vidar-api logs for constraint violations
ssh ravenhelm@100.115.101.81 "docker logs vidar-api 2>&1 | grep -i 'duplicate key\|constraint' | tail -10"
Example output:
bifrost.services.discovery_scheduler - ERROR - docker discovery failed: duplicate key value violates unique constraint "entities_org_id_type_id_external_id_key"
DETAIL: Key (org_id, type_id, external_id)=(00000000-0000-0000-0000-000000000001, f16bbc82-8e33-465c-beca-5d58a903f7bb, docker:6fb7b91d5c11) already exists.
Step 2: Understand the Root Cause
The issue occurs when:
- Entity is discovered by Source A and inserted
- Source B discovers the same entity (same external_id)
- Source B's code checks existence filtered by discovered_by = Source B
- Entity not found (it was created by Source A), so an INSERT is attempted
- The INSERT fails because the unique constraint is on (org_id, type_id, external_id), not discovered_by
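The race above can be reproduced in miniature with Python's bundled sqlite3 (a simplified, hypothetical schema; the real service talks to PostgreSQL via asyncpg): the existence check filters on discovered_by, so the second source never sees the first source's row and its INSERT hits the unique constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE entities (
        org_id TEXT, type_id TEXT, external_id TEXT, discovered_by TEXT,
        UNIQUE (org_id, type_id, external_id)
    )
    """
)

def discover(source: str, external_id: str) -> None:
    # Flawed existence check: filtered by discovered_by, so rows
    # created by OTHER sources are invisible here.
    exists = conn.execute(
        "SELECT 1 FROM entities WHERE external_id = ? AND discovered_by = ?",
        (external_id, source),
    ).fetchone()
    if exists is None:
        conn.execute(
            "INSERT INTO entities (org_id, type_id, external_id, discovered_by)"
            " VALUES ('org1', 'container', ?, ?)",
            (external_id, source),
        )

discover("docker", "docker:6fb7b91d5c11")  # Source A: row inserted
try:
    discover("prometheus", "docker:6fb7b91d5c11")  # Source B: same entity
except sqlite3.IntegrityError as exc:
    print("duplicate:", exc)  # unique constraint violated, as in the logs
```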
Step 3: Verify Current Code Logic
# Check the entity insertion logic
ssh ravenhelm@100.115.101.81 "grep -A20 'Create new' /Users/ravenhelm/ravenhelm/services/vidar/api/services/discovery/base_discovery.py"
Procedure
Step 1: Update INSERT to Use UPSERT
Modify base_discovery.py to use ON CONFLICT DO UPDATE:
Before:
await conn.execute(
"""
INSERT INTO entities (...)
VALUES ($1, $2, ...)
""",
...
)
stats["created"] += 1
After:
was_inserted = await conn.fetchval(
"""
INSERT INTO entities (
org_id, type_id, name, external_id,
attributes, labels, annotations,
status, health_status, discovered_by
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
ON CONFLICT (org_id, type_id, external_id) DO UPDATE SET
name = EXCLUDED.name,
attributes = EXCLUDED.attributes,
labels = EXCLUDED.labels,
annotations = EXCLUDED.annotations,
status = EXCLUDED.status,
health_status = EXCLUDED.health_status,
last_seen_at = NOW(),
last_discovery_at = NOW(),
updated_at = NOW()
RETURNING (xmax = 0) as inserted
""",
...
)
if was_inserted:
stats["created"] += 1
else:
stats["updated"] += 1
Step 2: Rebuild and Restart
ssh ravenhelm@100.115.101.81 "cd /Users/ravenhelm/ravenhelm/services/vidar && docker compose build vidar-api && docker compose up -d vidar-api"
Step 3: Verification
# Wait for next discovery run and check logs
sleep 30
ssh ravenhelm@100.115.101.81 "docker logs vidar-api --tail 30 2>&1 | grep -E 'discovery:|constraint'"
Success criteria:
- Discovery runs show: discovered=X, created=Y, updated=Z
- No duplicate key constraint errors
- All discovery sources complete successfully
Technical Details
The fix uses PostgreSQL's ON CONFLICT DO UPDATE (UPSERT) with RETURNING (xmax = 0). xmax is a PostgreSQL system column holding the ID of the transaction that deleted or locked that row version; it is 0 for a freshly inserted row and non-zero when ON CONFLICT took the UPDATE path:
- xmax = 0 returns TRUE: the row was inserted (new)
- xmax != 0 returns FALSE: the row was updated (existing)
This allows accurate created/updated statistics while preventing constraint violations.
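Since xmax is PostgreSQL-specific, the created/updated accounting can be sanity-checked portably. A minimal sketch using Python's bundled sqlite3, which supports the same ON CONFLICT ... DO UPDATE syntax (since SQLite 3.24) but has no xmax, so the sketch infers insert-vs-update from the row count (safe here only because the test database is single-writer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE entities (
        org_id TEXT, type_id TEXT, external_id TEXT,
        name TEXT, discovered_by TEXT,
        UNIQUE (org_id, type_id, external_id)
    )
    """
)

stats = {"created": 0, "updated": 0}

def upsert(org_id: str, type_id: str, external_id: str,
           name: str, source: str) -> None:
    # No xmax in SQLite: compare the row count before and after the
    # upsert to decide whether the statement inserted or updated.
    before = conn.execute("SELECT COUNT(*) FROM entities").fetchone()[0]
    conn.execute(
        """
        INSERT INTO entities (org_id, type_id, external_id, name, discovered_by)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT (org_id, type_id, external_id) DO UPDATE SET
            name = excluded.name
        """,
        (org_id, type_id, external_id, name, source),
    )
    after = conn.execute("SELECT COUNT(*) FROM entities").fetchone()[0]
    stats["created" if after > before else "updated"] += 1

upsert("org1", "container", "docker:6fb7b91d5c11", "web", "docker")
upsert("org1", "container", "docker:6fb7b91d5c11", "web", "prometheus")
print(stats)  # {'created': 1, 'updated': 1}
```

A second source rediscovering the same entity now increments "updated" instead of raising a constraint error, which mirrors the statistics the scheduler logs after the fix.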
Rollback
# Revert code changes
ssh ravenhelm@100.115.101.81 "cd /Users/ravenhelm/ravenhelm/services/vidar && git checkout api/services/discovery/base_discovery.py"
# Rebuild
ssh ravenhelm@100.115.101.81 "cd /Users/ravenhelm/ravenhelm/services/vidar && docker compose build vidar-api && docker compose up -d vidar-api"
Prevention
- Use UPSERT patterns when multiple sources may discover the same entity
- Ensure pre-insert existence checks filter on the same columns as the unique constraint
- Prefer database-level conflict handling over application-level check-then-insert logic, which is inherently racy
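The second point can be enforced mechanically. A hypothetical helper (UNIQUE_KEY and existence_query are illustrative names, not Vidar code) that builds the existence check from the constraint's own column list, so the check cannot drift out of sync with the constraint:

```python
# Columns of the unique constraint on entities; the existence check
# must filter on exactly these, never on discovered_by or other
# non-key columns (assumption: mirrors entities_org_id_type_id_external_id_key).
UNIQUE_KEY = ("org_id", "type_id", "external_id")

def existence_query(key: dict) -> tuple:
    # Refuse to build a check that uses any other column set.
    assert set(key) == set(UNIQUE_KEY), "check must match the unique constraint"
    where = " AND ".join(f"{col} = ?" for col in UNIQUE_KEY)
    params = tuple(key[col] for col in UNIQUE_KEY)
    return f"SELECT 1 FROM entities WHERE {where}", params

sql, params = existence_query({
    "org_id": "org1",
    "type_id": "container",
    "external_id": "docker:6fb7b91d5c11",
})
print(sql)  # SELECT 1 FROM entities WHERE org_id = ? AND type_id = ? AND external_id = ?
```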
Escalation
If rollback fails or issues persist:
- Check for database corruption or orphaned records
- Review discovery source configurations
- Contact: Platform team