
Service Inventory & Log Taxonomy

[[TOC]]

Overview

- Total Services: 31
- Categories: 7
- Current Log State: Mostly unstructured text
- Target Log State: JSON-formatted with standard fields


Infrastructure (5 services)

| Service | Purpose | Log Types | Key Metrics |
| --- | --- | --- | --- |
| Traefik | Edge proxy, TLS termination | HTTP access logs, routing decisions, backend health | Request rate, latency p50/p95/p99, error rate |
| PostgreSQL | Primary database | Query logs, connections, replication, autovacuum | Active connections, query duration, deadlocks, replication lag |
| Redis | Cache & sessions | Command logs, evictions, persistence | Ops/sec, memory usage, eviction rate, hit ratio |
| RavenmaskOS DB | Custom OS database | Schema operations, queries | Query count, schema version |
| Redpanda | Event streaming | Broker logs, partition rebalance, consumer lag | Throughput MB/s, consumer lag, partition count |

Identity/Security (5 services)

| Service | Purpose | Log Types | Key Metrics |
| --- | --- | --- | --- |
| Zitadel | Identity provider | Auth events, token issuance, user operations | Auth success/fail rate, token issuance rate, active sessions |
| SPIRE Server | Workload identity | SVID issuance, attestation | Workload count, SVID TTL distribution, attestation failures |
| SPIRE Agent | Workload attestation | Local attestation, SVID delivery | Agent health, workload registrations |
| OpenFGA | Authorization | Relation checks, tuple writes | Check latency, tuple write rate, cache hit ratio |
| OAuth2 Proxy | SSO middleware | Session creation, upstream auth | Session count, auth failures, token refresh rate |

AI/Automation (5 services)

| Service | Purpose | Log Types | Key Metrics |
| --- | --- | --- | --- |
| n8n | Workflow automation | Execution logs, node failures, webhooks | Workflow success rate, execution duration, active workflows |
| Ollama | Local LLM inference | Model loads, inference requests, errors | Inference latency, queue depth, model memory usage |
| LangGraph | Agent framework | Execution logs, state transitions, tool calls | Execution count, avg duration, error rate, tool usage |
| Langfuse | LLM observability | Trace spans, token usage, costs | Token count, cost per request, trace duration |
| Norns | Multi-agent orchestration | Agent coordination, task distribution | Task success rate, agent invocations, response latency |

Voice (3 services)

| Service | Purpose | Log Types | Key Metrics |
| --- | --- | --- | --- |
| LiveKit | WebRTC infrastructure | Room events, track publications, TURN usage | Active rooms, participant count, bitrate, packet loss |
| Voice Gateway | Web voice via LiveKit + OpenAI Realtime | Session events, Norns delegations | Active sessions, response latency, error rate |
| Telephony | Phone voice via Twilio + Pipecat | Call events, Twilio webhooks, STT/TTS processing | Call count, caller identification rate, call duration |

DevOps (3 services)

| Service | Purpose | Log Types | Key Metrics |
| --- | --- | --- | --- |
| GitLab | Source control, CI/CD | Pipeline runs, merge events, webhooks | Pipeline success rate, job duration, runner utilization |
| Grafana | Observability UI | Dashboard loads, alert evaluations, queries | Active users, dashboard load time, alert eval latency |
| MCP GitLab | GitLab MCP server | Tool invocations, API requests | Request rate, error rate, latency |

Monitoring (7 services)

| Service | Purpose | Log Types | Key Metrics |
| --- | --- | --- | --- |
| Prometheus | Metrics collection | Scrape health, rule evaluation | Scrape duration, cardinality, rule eval time |
| Loki | Log aggregation | Ingestion, queries, compaction | Ingestion rate, query latency, storage size |
| Tempo | Distributed tracing | Trace ingestion, sampling | Trace ingestion rate, sampling ratio, query latency |
| Alloy | Telemetry pipeline | Receiver/processor/exporter logs | Pipeline lag, error rate, throughput |
| Uptime Kuma | Uptime monitoring | Probe results, notifications | Uptime %, probe latency, notification delivery |
| cAdvisor | Container metrics | Resource usage per container | CPU/memory per container |
| Node Exporter | Host metrics | System-level metrics | CPU, memory, disk, network |

Home Automation (3 services)

| Service | Purpose | Log Types | Key Metrics |
| --- | --- | --- | --- |
| Home Assistant | Home automation hub | Entity state changes, automation triggers | Active automations, entity count, event rate |
| Homebridge | HomeKit bridge | Accessory updates, HomeKit events | Accessory count, event rate |
| Grocy | Household inventory | Inventory changes, shopping list | Item count, expiring items |

Log Structure Requirements

Standard Fields (all services)

    {
      "timestamp": "2026-01-02T23:38:53.524Z",
      "level": "info|warn|error|debug",
      "service": "traefik",
      "trace_id": "e81cfe01-66cf-4334-83b6-b5b7b27643d7",
      "message": "Backend health check passed",
      "context": {
        // Service-specific fields
      }
    }
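
One way a Python service might emit these fields is a custom `logging.Formatter`. The sketch below is illustrative only: the class name `StandardJsonFormatter`, the convention of passing `trace_id` and `context` through `extra`, and the mapping of Python's `CRITICAL` level to `error` are assumptions, not part of this specification.

```python
import json
import logging
from datetime import datetime, timezone

# Assumption: Python's five levels collapse onto the four levels in the schema.
LEVEL_MAP = {
    "DEBUG": "debug",
    "INFO": "info",
    "WARNING": "warn",
    "ERROR": "error",
    "CRITICAL": "error",
}


class StandardJsonFormatter(logging.Formatter):
    """Hypothetical formatter that renders records with the standard fields."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        # UTC timestamp with millisecond precision and a trailing "Z".
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "timestamp": ts.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": LEVEL_MAP.get(record.levelname, "info"),
            "service": self.service,
            # Assumption: callers supply these via `extra`; absent values
            # become null and an empty object respectively.
            "trace_id": getattr(record, "trace_id", None),
            "message": record.getMessage(),
            "context": getattr(record, "context", {}),
        }
        return json.dumps(payload)
```

With this formatter attached to a handler, a call such as `logger.info("Backend health check passed", extra={"context": {"backend": "dashboard"}})` would produce one JSON object per line.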

Service-Specific Context Examples

Traefik:

    "context": {
      "method": "GET",
      "path": "/api/widgets/resources",
      "status": 304,
      "duration_ms": 32,
      "backend": "dashboard",
      "client_ip": "172.20.0.1"
    }

PostgreSQL:

    "context": {
      "query": "SELECT * FROM users WHERE id = $1",
      "duration_ms": 12.4,
      "rows": 1,
      "connection_id": "5f3a2b1c"
    }

LangGraph:

    "context": {
      "agent_id": "sre-agent-v2.1.4",
      "execution_id": "exec_abc123",
      "state": "investigating",
      "tool_called": "query_loki",
      "duration_ms": 847
    }

Telephony:

    "context": {
      "call_sid": "CA1234567890abcdef",
      "caller_phone": "+15127812507",
      "user_id": "701973d2-57e4-4c84-a2ec-ded996dcf676",
      "display_name": "Nathan Walker",
      "event": "call_started"
    }
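
A small check like the one below could help confirm that a service's output conforms to the standard fields before its logs are onboarded. It is a sketch under stated assumptions: the function name `validate_record` is hypothetical, all six standard fields are treated as mandatory, and `trace_id` is allowed to be null for events logged outside a traced request. The service-specific `context` is only checked for being an object.

```python
from datetime import datetime

# Assumption: every standard field is mandatory; trace_id may be null.
FIELD_TYPES = {
    "timestamp": str,
    "level": str,
    "service": str,
    "trace_id": (str, type(None)),
    "message": str,
    "context": dict,
}
ALLOWED_LEVELS = {"info", "warn", "error", "debug"}


def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, expected in FIELD_TYPES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"wrong type for field: {field}")

    level = record.get("level")
    if isinstance(level, str) and level not in ALLOWED_LEVELS:
        problems.append(f"unknown level: {level}")

    timestamp = record.get("timestamp")
    if isinstance(timestamp, str):
        try:
            # Older Python versions reject a trailing "Z" in fromisoformat().
            datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"timestamp is not ISO 8601: {timestamp}")
    return problems
```

Running it over a sample of parsed log lines per service gives a quick measure of how far that service is from the target log state.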

Next: [[AIOps-Observability-Stack]] - Alloy, Loki, Prometheus configuration