
Human-in-the-Loop Approval

Approval workflow for high-risk automated operations in RavenmaskOS.

Overview

High-risk operations (destructive infrastructure changes, data deletion, etc.) require human approval before they run whenever an automated system triggers them. This keeps a person in control of irreversible actions without giving up the benefits of automation elsewhere.

When HITL is Required

| Trigger Type | HITL Required |
|---|---|
| Human CLI (Claude Code) | No - Trusted operator |
| SRE Agent (Vidar) | Yes - Automated |
| Runbook Automation | Yes - Automated |
| n8n Workflow | Yes - Automated |
| Norns Agent | Yes - Automated |
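The table reduces to a single rule: only a human at the CLI is trusted. Below is a minimal sketch of that rule. It assumes callers identify themselves with a trigger type. `TriggerType`, its string values, and `requires_hitl` are hypothetical names, not the actual Bifrost code.

```python
from enum import Enum


class TriggerType(Enum):
    """Hypothetical caller identities, one per row of the table above."""
    HUMAN_CLI = "human_cli"            # Claude Code driven by an operator
    SRE_AGENT = "sre_agent"            # Vidar
    RUNBOOK = "runbook_automation"
    N8N_WORKFLOW = "n8n_workflow"
    NORNS_AGENT = "norns_agent"


# Allow-list of trusted triggers; everything else needs a human decision.
TRUSTED_TRIGGERS = frozenset({TriggerType.HUMAN_CLI})


def requires_hitl(trigger: TriggerType) -> bool:
    """Return True when an operation from this trigger must wait for approval."""
    return trigger not in TRUSTED_TRIGGERS


for t in TriggerType:
    print(f"{t.value:20} HITL required: {requires_hitl(t)}")
```

The sketch lists the trusted triggers rather than the automated ones. As a result, any trigger type added later requires approval by default.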

Risk Levels

Tools are classified by risk level in the tool_definitions table:

| Risk Level | Criteria | Approval |
|---|---|---|
| Critical | Data loss, production impact | 2 approvers, 15 min SLA |
| High | Service disruption, config changes | 1 approver, 30 min SLA |
| Medium | Non-critical changes | 1 approver, 4 hour SLA |
| Low | Read-only, reversible | Auto-approve with audit |
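The following sketch encodes the Risk Levels table as data. It assumes, without confirmation from this page, that the SLA determines the `expires_at` value of a request in `vidar_approval_requests`. `ApprovalPolicy`, `POLICIES`, and `expires_at` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class ApprovalPolicy:
    approvers_required: int        # 0 means auto-approve (audit only)
    sla: Optional[timedelta]       # None when no human decision is needed


# Values copied from the Risk Levels table.
POLICIES = {
    "critical": ApprovalPolicy(2, timedelta(minutes=15)),
    "high": ApprovalPolicy(1, timedelta(minutes=30)),
    "medium": ApprovalPolicy(1, timedelta(hours=4)),
    "low": ApprovalPolicy(0, None),
}


def expires_at(risk_level: str, requested_at: datetime) -> Optional[datetime]:
    """Decision deadline (assumed = request time + SLA); None for low risk."""
    policy = POLICIES[risk_level.lower()]
    return None if policy.sla is None else requested_at + policy.sla


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(expires_at("CRITICAL", now))  # 2024-01-01 12:15:00+00:00
```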

High-Risk Infrastructure Tools

The following tools are flagged with both is_destructive: true and requires_confirmation: true.

  • infra_docker_remove_container - Remove Docker container
  • infra_docker_remove_volume - Remove Docker volume (data loss!)
  • infra_remove_data_directory - Remove service data directory
  • infra_zitadel_delete_app - Delete OIDC application

Approval Flow

┌──────────────────┐
│ Agent/Automation │
│ requests tool    │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐     ┌──────────────┐
│ Check requires_  │────▶│ Execute      │ (if false)
│ confirmation     │     │ immediately  │
└────────┬─────────┘     └──────────────┘
         │ (if true)
         ▼
┌──────────────────┐
│ Check caller     │
│ type             │
└────────┬─────────┘
         │
     ┌───┴───────┐
     │           │
     ▼           ▼
┌────────┐ ┌────────────┐
│ Human  │ │ Automated  │
│ CLI    │ │ (agent)    │
└────┬───┘ └─────┬──────┘
     │           │
     ▼           ▼
┌────────┐ ┌────────────┐
│Execute │ │Queue for   │
│directly│ │approval    │
└────────┘ └─────┬──────┘
                 │
                 ▼
           ┌────────────┐
           │Notify via  │
           │Slack       │
           └─────┬──────┘
                 │
           ┌─────┴──────┐
           │            │
           ▼            ▼
      ┌──────────┐ ┌──────────┐
      │ Approved │ │ Rejected │
      └────┬─────┘ └────┬─────┘
           │            │
           ▼            ▼
      ┌──────────┐ ┌──────────┐
      │ Execute  │ │ Log and  │
      │ tool     │ │ notify   │
      └──────────┘ └──────────┘
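The flow has two decision points: the tool's requires_confirmation flag, then the caller type. Both are captured in the sketch below. `ToolDefinition` mirrors the two flags named in this document. `Outcome`, `route`, and the `caller_is_human_cli` boolean are hypothetical stand-ins for however InfrastructureExecutor actually identifies callers.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    EXECUTE = "execute"                    # run the tool now
    PENDING_APPROVAL = "pending_approval"  # queue it and notify Slack


@dataclass(frozen=True)
class ToolDefinition:
    name: str
    is_destructive: bool
    requires_confirmation: bool


def route(tool: ToolDefinition, caller_is_human_cli: bool) -> Outcome:
    """Mirror the two decision points in the flow diagram."""
    if not tool.requires_confirmation:
        return Outcome.EXECUTE             # "Execute immediately"
    if caller_is_human_cli:
        return Outcome.EXECUTE             # "Execute directly" (trusted operator)
    return Outcome.PENDING_APPROVAL        # "Queue for approval"


remove_volume = ToolDefinition("infra_docker_remove_volume", True, True)
print(route(remove_volume, caller_is_human_cli=False))  # Outcome.PENDING_APPROVAL
```

The Slack notification and the approve/reject branches happen after `PENDING_APPROVAL` is returned. They are handled by Vidar and n8n, not by this routing step.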

Notification Channels

  1. Slack #ops-approvals - Primary channel for approval requests
  2. On-call DM - Escalation if no response arrives within half the SLA (SLA/2)
  3. Email - Backup notification (planned)
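The SLA/2 escalation threshold can be expressed as a simple time check. In the sketch below, the helper name `escalation_due` is hypothetical, and so is the assumption that a scheduler calls it periodically to decide when to DM on-call. The document specifies only the threshold itself.

```python
from datetime import datetime, timedelta, timezone


def escalation_due(requested_at: datetime, sla: timedelta,
                   now: datetime, decided: bool) -> bool:
    """True once half the SLA has passed without an approve/reject decision."""
    return not decided and now >= requested_at + sla / 2


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sla = timedelta(minutes=15)  # Critical tier: escalate to on-call after 7.5 minutes
print(escalation_due(t0, sla, t0 + timedelta(minutes=7), decided=False))  # False
print(escalation_due(t0, sla, t0 + timedelta(minutes=8), decided=False))  # True
```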

Approval Request Format

🔒 APPROVAL REQUIRED: High-Risk Operation

Tool: infra_docker_remove_volume
Risk: HIGH
Requested by: sre-agent (Vidar)
Incident: INC-1234

Arguments:
  volume_name: twenty_data
  force: false

Affected Entity:
  Type: Docker Volume
  Name: twenty_data
  Service: Twenty CRM (decommissioned)

[✅ Approve] [❌ Reject] [📋 Details]
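The sketch below renders the message body above from a request record. In the actual workflow, n8n builds a Slack interactive message, so this plain-text renderer only illustrates the layout. The request field names (`tool_name`, `risk_level`, `entity`, and so on) and the function names are hypothetical.

```python
def _fmt(value) -> str:
    # Render booleans the way the sample message does ("false", not "False").
    return str(value).lower() if isinstance(value, bool) else str(value)


def format_approval_message(req: dict) -> str:
    """Build the plain-text body of an approval request."""
    lines = [
        "🔒 APPROVAL REQUIRED: High-Risk Operation",
        "",
        f"Tool: {req['tool_name']}",
        f"Risk: {req['risk_level'].upper()}",
        f"Requested by: {req['requested_by']}",
        f"Incident: {req['incident']}",
        "",
        "Arguments:",
        *(f"  {k}: {_fmt(v)}" for k, v in req["arguments"].items()),
        "",
        "Affected Entity:",
        *(f"  {k}: {_fmt(v)}" for k, v in req["entity"].items()),
        "",
        "[✅ Approve] [❌ Reject] [📋 Details]",
    ]
    return "\n".join(lines)


example = {
    "tool_name": "infra_docker_remove_volume",
    "risk_level": "high",
    "requested_by": "sre-agent (Vidar)",
    "incident": "INC-1234",
    "arguments": {"volume_name": "twenty_data", "force": False},
    "entity": {"Type": "Docker Volume", "Name": "twenty_data",
               "Service": "Twenty CRM (decommissioned)"},
}
print(format_approval_message(example))
```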

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /api/v1/approvals | POST | Create approval request |
| /api/v1/approvals | GET | List pending approvals |
| /api/v1/approvals/{id} | GET | Get approval details |
| /api/v1/approvals/{id}/approve | POST | Approve request |
| /api/v1/approvals/{id}/reject | POST | Reject request |
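The endpoints above can be called with the standard library alone. The sketch only builds the requests. It makes several assumptions: the base URL is a placeholder, the `{"reason": ...}` body for reject is a guess (this page does not document request bodies or authentication), and the helper names are hypothetical. Create follows the same pattern, but its body is not documented here.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://vidar.example.internal"  # hypothetical host


def _request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request against the approvals API."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def list_pending() -> urllib.request.Request:
    return _request("GET", "/api/v1/approvals")


def get_approval(approval_id: str) -> urllib.request.Request:
    return _request("GET", f"/api/v1/approvals/{approval_id}")


def approve(approval_id: str) -> urllib.request.Request:
    return _request("POST", f"/api/v1/approvals/{approval_id}/approve")


def reject(approval_id: str, reason: str) -> urllib.request.Request:
    # Assumed body shape; presumably stored in the rejection_reason column.
    return _request("POST", f"/api/v1/approvals/{approval_id}/reject", {"reason": reason})


req = reject("3f2b9c1e", "Volume still referenced by a running service")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```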

Database Schema

CREATE TABLE vidar_approval_requests (
    approval_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    operation_type VARCHAR(100) NOT NULL,
    tool_name VARCHAR(100),
    arguments JSONB,
    risk_level VARCHAR(20) NOT NULL,
    status VARCHAR(20) DEFAULT 'pending',
    requested_by UUID,
    requested_at TIMESTAMPTZ DEFAULT NOW(),
    approvers_required INT DEFAULT 1,
    approved_by UUID[],
    approved_at TIMESTAMPTZ,
    rejection_reason TEXT,
    expires_at TIMESTAMPTZ,
    execution_result JSONB,
    incident_id UUID REFERENCES aiops_incidents(incident_id),
    CONSTRAINT valid_status CHECK (status IN ('pending', 'approved', 'rejected', 'expired', 'executed'))
);

CREATE INDEX idx_approval_status ON vidar_approval_requests(status);
CREATE INDEX idx_approval_expires ON vidar_approval_requests(expires_at) WHERE status = 'pending';
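The status values come from the valid_status constraint. The lifecycle between them, sketched below, is an assumption: the CHECK constraint limits which values are allowed but not the order of transitions. The sketch also shows one way `approvers_required` and `approved_by` could combine to enforce the two-approver rule for Critical tools. The rule that one approver cannot count twice is likewise an assumption. `ALLOWED`, `transition`, and `record_approval` are hypothetical names.

```python
# Assumed lifecycle; the CHECK constraint only restricts the set of values.
ALLOWED = {
    "pending": {"approved", "rejected", "expired"},
    "approved": {"executed"},
    "rejected": set(),
    "expired": set(),
    "executed": set(),
}


def transition(row: dict, new_status: str) -> dict:
    """Return a copy of row with the new status, refusing illegal moves."""
    if new_status not in ALLOWED[row["status"]]:
        raise ValueError(f"illegal transition {row['status']} -> {new_status}")
    return {**row, "status": new_status}


def record_approval(row: dict, approver: str) -> dict:
    """Add one approver; move to 'approved' once approvers_required is reached."""
    if row["status"] != "pending":
        raise ValueError(f"cannot approve a request in status {row['status']!r}")
    if approver in row["approved_by"]:
        raise ValueError("the same approver cannot count twice")
    row = {**row, "approved_by": row["approved_by"] + [approver]}
    if len(row["approved_by"]) >= row["approvers_required"]:
        row = transition(row, "approved")
    return row


req = {"status": "pending", "approvers_required": 2, "approved_by": []}
req = record_approval(req, "alice")   # still pending: 1 of 2
req = record_approval(req, "bob")     # quorum reached
print(req["status"])                  # approved
```

The partial index on expires_at suggests, though this page does not say so, a periodic sweep of roughly this form: UPDATE vidar_approval_requests SET status = 'expired' WHERE status = 'pending' AND expires_at < NOW();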

Integration Points

Bifrost

  • InfrastructureExecutor checks requires_confirmation and caller_type
  • Returns a pending_approval status with the approval ID when HITL is required
  • Polls for the decision, or receives a webhook when approval is granted
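For the polling option, a caller holding a pending_approval response would check the approval's status until it changes. In the sketch below, `fetch_status` is a hypothetical callable, for example a wrapper around GET /api/v1/approvals/{id}. The default 30-minute budget is an arbitrary choice that happens to match the High-risk SLA. The `sleep` parameter exists so that tests can skip real waiting.

```python
import time
from typing import Callable


def wait_for_decision(approval_id: str,
                      fetch_status: Callable[[str], str],
                      interval_s: float = 5.0,
                      timeout_s: float = 1800.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the approval leaves 'pending' or the time budget runs out.

    The budget is approximate: it counts sleep time only, not request latency.
    Returns the last status seen ('pending' if the budget was exhausted).
    """
    waited = 0.0
    while True:
        status = fetch_status(approval_id)
        if status != "pending":
            return status
        if waited >= timeout_s:
            return status  # still pending; expiring the request is Vidar's job
        sleep(interval_s)
        waited += interval_s


# Usage with a stub that reports approval on the third poll:
replies = iter(["pending", "pending", "approved"])
print(wait_for_decision("3f2b9c1e", lambda _id: next(replies), sleep=lambda _s: None))
```

A webhook avoids repeated GET calls. Polling needs no inbound endpoint on the Bifrost side.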

Vidar

  • Manages approval queue and lifecycle
  • Sends notifications via an n8n webhook
  • Executes the tool once the request is approved

n8n

  • Workflow: "HITL Approval Notification"
  • Sends Slack interactive message
  • Handles button callbacks for approve/reject
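Slack delivers button clicks as a form-encoded POST whose `payload` field holds JSON. For Block Kit buttons, the payload `type` is `block_actions`, and it carries `user.id` and a list of `actions`, each with an `action_id` and `value`. The sketch below shows the parsing logic that the n8n workflow performs. The action IDs `approve_request` and `reject_request`, and the approval ID carried in `value`, are hypothetical choices. A real handler should first verify Slack's request signature (the `X-Slack-Signature` header) before trusting the body.

```python
import json
from urllib.parse import parse_qs, urlencode

# Hypothetical action_id values assigned to the Approve/Reject buttons.
DECISIONS = {"approve_request": "approve", "reject_request": "reject"}


def parse_slack_action(body: str) -> tuple[str, str, str]:
    """Return (decision, approval_id, slack_user_id) from a block_actions POST body."""
    payload = json.loads(parse_qs(body)["payload"][0])
    if payload.get("type") != "block_actions":
        raise ValueError("not a block_actions payload")
    action = payload["actions"][0]
    decision = DECISIONS.get(action["action_id"])
    if decision is None:
        raise ValueError(f"unexpected action_id {action['action_id']!r}")
    return decision, action["value"], payload["user"]["id"]


sample = urlencode({"payload": json.dumps({
    "type": "block_actions",
    "user": {"id": "U123"},
    "actions": [{"action_id": "approve_request", "value": "42"}],
})})
print(parse_slack_action(sample))  # ('approve', '42', 'U123')
```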

Bypassing HITL

HITL can be bypassed in these scenarios:

  1. Human operator - Direct CLI invocation is trusted
  2. Emergency override - On-call can bypass approval; the bypass is recorded in the audit trail
  3. Pre-approved change - The operation is linked to an approved change ticket

To bypass (emergency only):

{"tool": "infra_remove_data_directory", "arguments": {...}, "bypass_hitl": true, "reason": "Emergency: ..."}
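A server-side guard for the bypass request above might look like the sketch below. Two conditions are inferred from this section rather than stated as rules: the caller must be on-call, and `reason` must be a non-empty string starting with "Emergency:". `validate_bypass` and `caller_is_on_call` are hypothetical names.

```python
EMERGENCY_PREFIX = "Emergency:"  # inferred from the example request


def validate_bypass(request: dict, caller_is_on_call: bool) -> bool:
    """Return True for a valid bypass, False if no bypass was requested.

    Raises if bypass_hitl is set without an on-call caller or without a
    reason of the (assumed) form "Emergency: <details>".
    """
    if not request.get("bypass_hitl", False):
        return False
    if not caller_is_on_call:
        raise PermissionError("only the on-call engineer may bypass HITL")
    reason = request.get("reason", "").strip()
    if not reason.startswith(EMERGENCY_PREFIX) or not reason[len(EMERGENCY_PREFIX):].strip():
        raise ValueError('bypass needs a reason like "Emergency: <what and why>"')
    return True


ok = {"tool": "infra_remove_data_directory", "arguments": {},
      "bypass_hitl": True, "reason": "Emergency: disk full on db host"}
print(validate_bypass(ok, caller_is_on_call=True))  # True
```

Because a bypass skips the approval queue, the accepted request would still need its own audit record.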

Audit Trail

All approval requests are logged with:

  • Request timestamp and requester
  • Approval/rejection decision and rationale
  • Approver identity and timestamp
  • Execution result
  • Linked incident/change ticket
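The fields listed above map naturally onto one structured record per approval request. The sketch below serializes such a record to JSON for logging. The class name and field names are illustrative and may not match Vidar's actual audit format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class AuditRecord:
    """One audit entry per approval request (field names are illustrative)."""
    approval_id: str
    requested_by: str
    requested_at: str                               # ISO-8601 timestamp
    decision: Optional[str] = None                  # "approved" or "rejected"
    rationale: Optional[str] = None
    approvers: list = field(default_factory=list)   # [{"id": ..., "at": ...}]
    execution_result: Optional[dict] = None
    linked_ticket: Optional[str] = None             # incident or change ticket ID

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


rec = AuditRecord(
    approval_id="3f2b9c1e",
    requested_by="sre-agent (Vidar)",
    requested_at="2024-01-01T12:00:00+00:00",
    decision="approved",
    rationale="Volume belongs to decommissioned Twenty CRM",
    approvers=[{"id": "alice", "at": "2024-01-01T12:06:00+00:00"}],
    linked_ticket="INC-1234",
)
print(rec.to_json())
```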

View in Vidar Admin: https://vidar.ravenhelm.dev/approvals