Skip to main content

Runbook: Vidar Discovery Duplicate Key Constraint

Purpose

Resolve duplicate key constraint violations in the Vidar/Bifrost discovery scheduler when syncing entities from multiple sources.

Symptoms

  • Discovery scheduler logs show: duplicate key value violates unique constraint "entities_org_id_type_id_external_id_key"
  • Docker, Prometheus, or Traefik discovery fails repeatedly
  • Entity counts not updating despite running discovery

Prerequisites

  • SSH access to odin as ravenhelm
  • Access to vidar-api container
  • Access to postgres database

Diagnosis

Step 1: Identify the Constraint Error

# Check vidar-api logs for constraint violations
ssh ravenhelm@100.115.101.81 "docker logs vidar-api 2>&1 | grep -i 'duplicate key\|constraint' | tail -10"

Example output:

bifrost.services.discovery_scheduler - ERROR - docker discovery failed: duplicate key value violates unique constraint "entities_org_id_type_id_external_id_key"
DETAIL: Key (org_id, type_id, external_id)=(00000000-0000-0000-0000-000000000001, f16bbc82-8e33-465c-beca-5d58a903f7bb, docker:6fb7b91d5c11) already exists.

Step 2: Understand the Root Cause

The issue occurs when:

  1. Entity is discovered by Source A and inserted
  2. Source B discovers the same entity (same external_id)
  3. Source B's code checks existence filtered by discovered_by = Source B
  4. Entity not found (was created by Source A), so INSERT attempted
  5. INSERT fails because unique constraint is on (org_id, type_id, external_id) - not discovered_by

Step 3: Verify Current Code Logic

# Check the entity insertion logic
ssh ravenhelm@100.115.101.81 "grep -A20 'Create new' /Users/ravenhelm/ravenhelm/services/vidar/api/services/discovery/base_discovery.py"

Procedure

Step 1: Update INSERT to Use UPSERT

Modify base_discovery.py to use ON CONFLICT DO UPDATE:

Before:

await conn.execute(
"""
INSERT INTO entities (...)
VALUES ($1, $2, ...)
""",
...
)
stats["created"] += 1

After:

was_inserted = await conn.fetchval(
"""
INSERT INTO entities (
org_id, type_id, name, external_id,
attributes, labels, annotations,
status, health_status, discovered_by
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
ON CONFLICT (org_id, type_id, external_id) DO UPDATE SET
name = EXCLUDED.name,
attributes = EXCLUDED.attributes,
labels = EXCLUDED.labels,
annotations = EXCLUDED.annotations,
status = EXCLUDED.status,
health_status = EXCLUDED.health_status,
last_seen_at = NOW(),
last_discovery_at = NOW(),
updated_at = NOW()
RETURNING (xmax = 0) as inserted
""",
...
)
if was_inserted:
stats["created"] += 1
else:
stats["updated"] += 1

Step 2: Rebuild and Restart

ssh ravenhelm@100.115.101.81 "cd /Users/ravenhelm/ravenhelm/services/vidar && docker compose build vidar-api && docker compose up -d vidar-api"

Step 3: Verification

# Wait for next discovery run and check logs
sleep 30
ssh ravenhelm@100.115.101.81 "docker logs vidar-api --tail 30 2>&1 | grep -E 'discovery:|constraint'"

Success criteria:

  • Discovery runs show: discovered=X, created=Y, updated=Z
  • No duplicate key constraint errors
  • All discovery sources complete successfully

Technical Details

The fix uses PostgreSQL's ON CONFLICT DO UPDATE (UPSERT) with RETURNING (xmax = 0):

  • xmax = 0 returns TRUE if row was inserted (new)
  • xmax != 0 returns FALSE if row was updated (existing)

This allows accurate statistics tracking while preventing constraint violations.

Rollback

# Revert code changes
ssh ravenhelm@100.115.101.81 "cd /Users/ravenhelm/ravenhelm/services/vidar && git checkout api/services/discovery/base_discovery.py"

# Rebuild
ssh ravenhelm@100.115.101.81 "cd /Users/ravenhelm/ravenhelm/services/vidar && docker compose build vidar-api && docker compose up -d vidar-api"

Prevention

  • Use UPSERT patterns when multiple sources may discover the same entity
  • Ensure unique constraints align with existence checks
  • Consider using database-level conflict handling over application logic

Escalation

If rollback fails or issues persist:

  1. Check for database corruption or orphaned records
  2. Review discovery source configurations
  3. Contact: Platform team