View LLM Traces

Debug AI interactions using Langfuse tracing.


Overview

Langfuse captures every LLM interaction, including:

  • Prompts and completions
  • Token usage and costs
  • Latency metrics
  • Tool calls and responses
  • Conversation context

Access at: https://langfuse.ravenhelm.dev


When to Use

  • AI gave an unexpected response
  • Tool execution failed
  • Performance seems slow
  • Debugging prompt engineering
  • Analyzing token costs

Finding a Trace

By Time

  1. Go to langfuse.ravenhelm.dev
  2. Click Traces in sidebar
  3. Filter by time range (last hour, today, etc.)
  4. Click on a trace to expand

By User

  1. Go to Traces
  2. Filter by User ID
  3. Find your user email or ID

By Session

Each conversation has a session ID:

  1. Filter by Session ID
  2. See all messages in that conversation
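
Filtering by session only works if the session ID was attached when the trace was created. Norns does this automatically; for custom integrations, a minimal sketch using the Python SDK (the ID values are placeholders):

from langfuse import Langfuse

langfuse = Langfuse(host="https://langfuse.ravenhelm.dev")  # keys read from environment

# Attach user and session IDs so the trace shows up under these filters
trace = langfuse.trace(
    name="chat-turn",
    user_id="user@example.com",      # placeholder - enables the By User filter
    session_id="conversation-1234",  # placeholder - groups the whole conversation
)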

Understanding a Trace

Trace View

Trace: "What is on my todo list?"
├── Generation: GPT-4 (1.2s, 450 tokens)
│ ├── Input: System prompt + user message
│ ├── Output: Tool call - query_tasks
│ └── Metadata: temperature=0.7, model=gpt-4

├── Span: Tool Execution (0.8s)
│ ├── Tool: query_tasks
│ ├── Arguments: {"status": "open", "limit": 10}
│ └── Result: [3 tasks returned]

└── Generation: GPT-4 (0.6s, 200 tokens)
├── Input: Tool result + conversation
└── Output: "You have 3 tasks..."

Key Metrics

Metric     What It Means
---------  ------------------------------
Latency    Total time for the interaction
Tokens     Input + output token count
Cost       Estimated API cost
Model      Which LLM was used
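
Cost is derived from the token counts and the model's per-token price. A rough back-of-the-envelope check (the prices below are placeholders, not real model pricing; Langfuse computes this automatically when model prices are configured):

# Rough cost estimate from a trace's token counts.
# The per-1K prices are placeholders, not actual rates.
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1k: float = 0.03,
                  output_price_per_1k: float = 0.06) -> float:
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

# e.g. roughly 450 input and 200 output tokens, as in the trace above
print(f"~${estimate_cost(450, 200):.4f}")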

Common Debugging Scenarios

Wrong Tool Selected

Symptom: Norns called the wrong tool

Debug Steps:

  1. Find the trace in Langfuse
  2. Look at the Generation step
  3. Check the system prompt - does it describe tools correctly?
  4. Check the user input - was it ambiguous?
  5. Look at tool descriptions in Bifrost

Example Finding:

User: "Add eggs"
Tool called: update_task (wrong)
Expected: add_shopping_item

Issue: "Add" is ambiguous - could be task or shopping
Solution: Improve tool descriptions or add disambiguation
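
One way to disambiguate is to make each tool description state explicitly what it is and is not for. A sketch in OpenAI function-calling style (the names and wording are illustrative; the real definitions live in Bifrost):

# Illustrative tool definition - explicit about shopping vs. tasks
add_shopping_item = {
    "type": "function",
    "function": {
        "name": "add_shopping_item",
        "description": (
            "Add an item to the shopping list. Use for groceries and purchases "
            "(e.g. 'add eggs', 'buy milk'). Do NOT use for to-do items; "
            "use a task tool for those."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "item": {"type": "string", "description": "Item to add, e.g. 'eggs'"}
            },
            "required": ["item"],
        },
    },
}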

Tool Execution Failed

Symptom: Tool returned an error

Debug Steps:

  1. Find the trace
  2. Expand the Span for tool execution
  3. Check the error message
  4. Look at input arguments - were they valid?

Example Finding:

Tool: create_task
Arguments: {"title": null} <-- Problem!
Error: "title is required"

Issue: LLM sent null for required field
Solution: Improve prompt to emphasize required fields
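
The schema itself can reinforce this: marking the field as required and repeating it in the description gives the model two chances to get it right. A hedged sketch (parameter names are illustrative):

# Illustrative schema - 'title' is required and the description repeats it
create_task = {
    "type": "function",
    "function": {
        "name": "create_task",
        "description": "Create a new task. 'title' is required and must be a non-empty string.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string", "description": "Short task title (required, never null)"},
                "due_date": {"type": "string", "description": "Optional ISO 8601 due date"},
            },
            "required": ["title"],
        },
    },
}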

Slow Response

Symptom: The assistant took too long to respond

Debug Steps:

  1. Find the trace
  2. Look at latency breakdown:
    • Generation time (LLM)
    • Tool execution time
    • Network overhead

Example Finding:

Total: 8.5s
├── Generation 1: 1.2s (normal)
├── Tool (query_tasks): 6.0s <-- Problem!
└── Generation 2: 0.8s (normal)

Issue: Database query too slow
Solution: Add index or optimize query
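
The latency breakdown is only as useful as the spans that feed it. If a step is missing from the trace, it can be wrapped in its own span so its duration shows up next time. A minimal sketch with the Python SDK (query_tasks here is a placeholder for the real tool call):

from langfuse import Langfuse

langfuse = Langfuse()  # keys and host read from environment

trace = langfuse.trace(name="todo-question")

# Wrap the tool call in a span so its duration appears in the breakdown
span = trace.span(name="tool:query_tasks", input={"status": "open", "limit": 10})
result = query_tasks(status="open", limit=10)  # placeholder for the real tool
span.end(output=result)  # end() records the elapsed time on the span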

Context Too Long

Symptom: Conversation quality degrades, or later requests fail with errors

Debug Steps:

  1. Check token count in trace
  2. Look at input to later generations
  3. Check if context window exceeded

Example Finding:

Generation 15:
Input tokens: 12,000
Model limit: 8,000

Issue: Exceeded context window
Solution: Implement conversation summarization
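
Until summarization is in place, a crude guard is to trim the oldest turns before the prompt exceeds the window. A sketch assuming tiktoken and an 8,000-token limit (both assumptions; a real fix would summarize rather than drop context):

import tiktoken

MAX_INPUT_TOKENS = 8_000  # assumed model limit, taken from the example above
enc = tiktoken.encoding_for_model("gpt-4")

def count_tokens(messages: list[dict]) -> int:
    return sum(len(enc.encode(m["content"])) for m in messages)

def trim_history(messages: list[dict]) -> list[dict]:
    # Keep the system prompt; drop the oldest user/assistant turns first
    system, rest = messages[:1], messages[1:]
    while rest and count_tokens(system + rest) > MAX_INPUT_TOKENS:
        rest = rest[1:]
    return system + rest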

Useful Filters

In Langfuse UI

Filter              Use Case
------------------  ---------------------------
Score < 0.5         Find low-quality responses
Latency > 5s        Find slow interactions
Status = Error      Find failures
Model = gpt-4       Filter by model
Tags = production   Production traces only

Via API

# Get recent traces (the public API uses Basic auth: public key as user, secret key as password)
curl -s "https://langfuse.ravenhelm.dev/api/public/traces?limit=10" \
  -u "$LANGFUSE_PUBLIC_KEY:$LANGFUSE_SECRET_KEY"
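
The same query works from Python via the SDK, which handles auth and pagination (a sketch assuming a v2-style SDK that exposes fetch_traces):

from langfuse import Langfuse

langfuse = Langfuse(host="https://langfuse.ravenhelm.dev")  # keys from environment

traces = langfuse.fetch_traces(limit=10)
for t in traces.data:
    print(t.id, t.name)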

Setting Up Tracing

Tracing is automatic for Norns. For custom integrations:

from langfuse import Langfuse

langfuse = Langfuse(
    public_key="pk-...",
    secret_key="sk-...",
    host="https://langfuse.ravenhelm.dev"
)

# Create a trace
trace = langfuse.trace(
    name="my-interaction",
    user_id="user@example.com"
)

# Log a generation
trace.generation(
    name="llm-call",
    model="gpt-4",
    input=messages,
    output=response
)
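
The SDK batches events and sends them in the background, so short-lived scripts should flush before exiting or the trace may never reach Langfuse:

# Send any buffered events before the process exits
langfuse.flush()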

See Also

  • [[Debug-Failing-Service]] - Container debugging
  • [[Query-Logs]] - Log analysis
  • [[../AI-ML-Platform/Langfuse]] - Langfuse setup
  • [[../Observability/Grafana]] - Metrics dashboards