Runbook: Vidar Database Column Mismatch
Purpose
Resolve 500 Internal Server Error on Vidar API endpoints caused by database column name mismatches between code and schema.
Symptoms
- API endpoints return 500 Internal Server Error
- Logs show:
- Specific endpoints fail while others work normally
Prerequisites
- SSH access to odin as ravenhelm
- Access to vidar-api container
- Access to postgres database
Diagnosis
Step 1: Identify the Missing Column
Example output:
Step 2: Check Database Schema
Step 3: Find Code Reference
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/streams.py:106: i.severity, i.priority, i.started_at, i.resolved_at,
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/streams.py:133: "started_at": row["started_at"].isoformat() if row["started_at"] else None,
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/streams.py:166: r.final_decision, r.started_at, r.completed_at, r.duration_ms,
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/streams.py:170: ORDER BY r.started_at DESC
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/streams.py:186: WHERE started_at >= NOW() - INTERVAL '24 hours'
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/streams.py:194: r.run_id, r.started_at,
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/streams.py:196: EXTRACT(EPOCH FROM (NOW() - r.started_at)) as running_seconds
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/streams.py:200: ORDER BY r.started_at ASC
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/streams.py:217: "started_at": row["started_at"].isoformat() if row["started_at"] else None,
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/streams.py:237: "started_at": row["started_at"].isoformat() if row["started_at"] else None,
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/cmdb.py:98: started_at: str
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/cmdb.py:466: SELECT r.run_id, s.name as source_name, r.trigger_type, r.started_at,
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/cmdb.py:481: query += f" ORDER BY r.started_at DESC LIMIT ${param_idx}"
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/cmdb.py:492: started_at=row["started_at"].isoformat() if row["started_at"] else None,
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/cmdb.py:689: (SELECT r.success FROM discovery_runs r WHERE r.source_id = s.source_id ORDER BY r.started_at DESC LIMIT 1) as last_run_success,
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/cmdb.py:690: (SELECT r.errors FROM discovery_runs r WHERE r.source_id = s.source_id ORDER BY r.started_at DESC LIMIT 1) as last_error
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/cmdb.py:700: SELECT r.run_id, s.name as source_name, r.trigger_type, r.started_at,
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/cmdb.py:706: ORDER BY r.started_at DESC
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/cmdb.py:744: "started_at": r["started_at"].isoformat() if r["started_at"] else None,
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/agent.py:36: started_at: datetime
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/agent.py:47: started_at: datetime
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/agent.py:117: r.final_decision, r.started_at, r.completed_at, r.duration_ms,
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/agent.py:122: ORDER BY r.started_at DESC
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/agent.py:138: started_at=row['started_at'],
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/agent.py:153: started_at, completed_at, duration_ms,
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/agent.py:176: started_at=row['started_at'],
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/agent.py:208: WHERE started_at >= NOW() - ($1 || ' hours')::INTERVAL
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/agent.py:235: r.run_id, r.alert_id, r.started_at,
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/agent.py:237: EXTRACT(EPOCH FROM (NOW() - r.started_at)) as running_seconds
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/agent.py:241: ORDER BY r.started_at ASC
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/agent.py:251: "started_at": row['started_at'].isoformat(),
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/agent.py:345: AND r.started_at > NOW() - INTERVAL '1 hour'
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/incidents.py:51: started_at: datetime
/Users/ravenhelm/ravenhelm/services/vidar/api/routes/incidents.py:266: a.starts_at as started_at,
/Users/ravenhelm/ravenhelm/services/vidar/api/services/agent_scheduler.py:106: AND r.started_at > NOW() - INTERVAL '{self.cooldown_minutes} minutes'
/Users/ravenhelm/ravenhelm/services/vidar/api/services/discovery/base_discovery.py:88: (run_id, source_id, org_id, trigger_type, started_at)
/Users/ravenhelm/ravenhelm/services/vidar/api/services/aiops/workflow_orchestrator.py:74: started_at = datetime.utcnow()
/Users/ravenhelm/ravenhelm/services/vidar/api/services/aiops/workflow_orchestrator.py:82: n8n_connection_id, n8n_workflow_id, input_data, status, started_at
/Users/ravenhelm/ravenhelm/services/vidar/api/services/aiops/workflow_orchestrator.py:92: started_at,
/Users/ravenhelm/ravenhelm/services/vidar/api/services/aiops/workflow_orchestrator.py:108: duration_ms = EXTRACT(EPOCH FROM (NOW() - started_at)) * 1000
/Users/ravenhelm/ravenhelm/services/vidar/api/services/aiops/workflow_orchestrator.py:127: duration_ms = EXTRACT(EPOCH FROM (NOW() - started_at)) * 1000
/Users/ravenhelm/ravenhelm/services/vidar/api/services/aiops/workflow_orchestrator.py:241: started_at, completed_at, duration_ms
/Users/ravenhelm/ravenhelm/services/vidar/api/services/aiops/workflow_orchestrator.py:258: "started_at": row["started_at"].isoformat() if row["started_at"] else None,
/Users/ravenhelm/ravenhelm/services/vidar/api/services/aiops/workflow_orchestrator.py:273: started_at, completed_at, duration_ms
/Users/ravenhelm/ravenhelm/services/vidar/api/services/aiops/workflow_orchestrator.py:290: query += f" ORDER BY started_at DESC LIMIT ${param_idx}"
/Users/ravenhelm/ravenhelm/services/vidar/api/services/aiops/workflow_orchestrator.py:304: "started_at": row["started_at"].isoformat() if row["started_at"] else None,
/Users/ravenhelm/ravenhelm/services/vidar/api/services/aiops/sre_orchestrator.py:151: duration_ms = EXTRACT(EPOCH FROM (NOW() - started_at)) * 1000,
/Users/ravenhelm/ravenhelm/services/vidar/api/services/aiops/sre_orchestrator.py:185: duration_ms = EXTRACT(EPOCH FROM (NOW() - started_at)) * 1000,
/Users/ravenhelm/ravenhelm/services/vidar/api/services/aiops/sre_orchestrator.py:205: run_id, status, started_at, completed_at,
/Users/ravenhelm/ravenhelm/services/vidar/api/services/aiops/sre_orchestrator.py:220: "started_at": row['started_at'].isoformat() if row['started_at'] else None,
/Users/ravenhelm/ravenhelm/services/vidar/api/services/aiops/sre_orchestrator.py:241: r.run_id, r.status, r.started_at, r.completed_at,
/Users/ravenhelm/ravenhelm/services/vidar/api/services/aiops/sre_orchestrator.py:247: ORDER BY r.started_at DESC
/Users/ravenhelm/ravenhelm/services/vidar/api/services/aiops/sre_orchestrator.py:257: r.run_id, r.status, r.started_at, r.completed_at,
/Users/ravenhelm/ravenhelm/services/vidar/api/services/aiops/sre_orchestrator.py:262: ORDER BY r.started_at DESC
/Users/ravenhelm/ravenhelm/services/vidar/api/services/aiops/sre_orchestrator.py:273: "started_at": row['started_at'].isoformat() if row['started_at'] else None,
Procedure
Step 1: Identify the Correct Column Name
Compare the error message column with actual database schema:
- Error shows:
- Database has:
Step 2: Fix the Code
Step 3: Rebuild and Restart
Sending build context to Docker daemon 98.64kB
Step 1/8 : FROM python:3.11-slim
---> 1dd3dca85e22
Step 2/8 : WORKDIR /app
---> Using cache
---> eee451907a0d
Step 3/8 : RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*
---> Using cache
---> 8fa0da8edb07
Step 4/8 : COPY requirements.txt .
---> Using cache
---> e93da2963f1d
Step 5/8 : RUN pip install --no-cache-dir -r requirements.txt
---> Using cache
---> 9fbe8855f2bb
Step 6/8 : COPY . .
---> Using cache
---> be7d6ef140a3
Step 7/8 : CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
---> Using cache
---> d35190413b81
Step 8/8 : LABEL com.docker.compose.image.builder=classic
---> Using cache
---> af805a5de6c6
Successfully built af805a5de6c6
Successfully tagged vidar-vidar-api:latest
Step 4: Verification
null
Success criteria:
- Endpoint returns 200 OK with data
- No UndefinedColumnError in logs
Common Column Mismatches
| Table | Code Uses | Database Has |
|---|---|---|
| aiops_alert_instances | started_at | starts_at |
| aiops_alert_instances | ended_at | ends_at |
Rollback
If the fix causes issues:
Sending build context to Docker daemon 97.29kB
Step 1/8 : FROM python:3.11-slim
---> 1dd3dca85e22
Step 2/8 : WORKDIR /app
---> Using cache
---> eee451907a0d
Step 3/8 : RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*
---> Using cache
---> 8fa0da8edb07
Step 4/8 : COPY requirements.txt .
---> Using cache
---> e93da2963f1d
Step 5/8 : RUN pip install --no-cache-dir -r requirements.txt
---> Using cache
---> 9fbe8855f2bb
Step 6/8 : COPY . .
---> 828a033a962e
Step 7/8 : CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
---> Running in 288f310fdcab
---> Removed intermediate container 288f310fdcab
---> 2e31e8857ca8
Step 8/8 : LABEL com.docker.compose.image.builder=classic
---> Running in 08b47c136ba6
---> Removed intermediate container 08b47c136ba6
---> 6d8f3a1e3430
Successfully built 6d8f3a1e3430
Successfully tagged vidar-vidar-api:latest
Prevention
- Always verify column names against database schema before writing queries
- Use database migrations to ensure code and schema stay in sync
- Add integration tests that exercise all query paths
Escalation
If rollback fails or issues persist:
- Check if database migration is needed
- Review recent schema changes in git history
- Contact: Platform team