30 changes: 30 additions & 0 deletions CHANGELOG.md
@@ -148,6 +148,35 @@ response = MyAgent.embed(inputs: ["Text 1", "Text 2"]).embed_now
vectors = response.data.map { |d| d[:embedding] }
```

**Normalized Usage Statistics**
```ruby
response = MyAgent.prompt("Hello").generate_now

# Works across all providers
response.usage.input_tokens
response.usage.output_tokens
response.usage.total_tokens

# Provider-specific fields when available
response.usage.cached_tokens # OpenAI, Anthropic
response.usage.reasoning_tokens # OpenAI o1 models
response.usage.service_tier # Anthropic
```

**Enhanced Instrumentation for APM Integration**
- Unified event structure: `prompt.active_agent` and `embed.active_agent` (top-level) plus `prompt.provider.active_agent` and `embed.provider.active_agent` (per-API-call)
- Event payloads include comprehensive data for monitoring tools (New Relic, Datadog, etc.):
- Request parameters: `model`, `temperature`, `max_tokens`, `top_p`, `stream`, `message_count`, `has_tools`
- Usage data: `input_tokens`, `output_tokens`, `total_tokens`, `cached_tokens`, `reasoning_tokens`, `audio_tokens`, `cache_creation_tokens` (critical for cost tracking)
- Response metadata: `finish_reason`, `response_model`, `response_id`, `embedding_count`
- Top-level events report cumulative usage across all API calls in multi-turn conversations
- Provider-level events report per-call usage for granular tracking (see the subscriber sketch below)
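
A minimal subscriber sketch (the event names are from this release; whether the usage fields sit at the top level of the payload is an assumption, and the logging sink is illustrative):

```ruby
require "active_support/notifications"

# Sketch: forward top-level generation events to a logging/APM sink.
# Payload keys follow the list above; their exact nesting is assumed here.
ActiveSupport::Notifications.subscribe("prompt.active_agent") do |event|
  payload = event.payload

  Rails.logger.info(
    event: event.name,
    model: payload[:model],
    input_tokens: payload[:input_tokens],
    output_tokens: payload[:output_tokens],
    total_tokens: payload[:total_tokens],
    duration_ms: event.duration # milliseconds, from ActiveSupport
  )
end
```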

**Multi-Turn Usage Tracking**
- `response.usage` now returns cumulative token counts across all API calls during tool calling
- New `response.usages` array contains individual usage objects from each API call
- `Usage` objects support addition: `usage1 + usage2` for combining statistics, as shown in the sketch below
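
Taken together (the agent and prompt are hypothetical; the accessors are those listed above):

```ruby
# Hypothetical multi-turn, tool-calling request
response = MyAgent.prompt("What's the weather in Paris?").generate_now

# Cumulative totals across every API call in the conversation
response.usage.total_tokens

# Per-call breakdown
response.usages.each_with_index do |usage, i|
  puts "call #{i + 1}: #{usage.input_tokens} in / #{usage.output_tokens} out"
end

# Usage objects support +, so the cumulative value can be rebuilt
response.usages.reduce(:+).total_tokens == response.usage.total_tokens #=> true
```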

**Provider Enhancements**
- OpenAI Responses API: select with `api: :responses` or `api: :chat` (configuration sketched below)
- Anthropic JSON object mode with automatic extraction
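
A configuration sketch (assuming the standard `generate_with` setup; the agent and model names are illustrative):

```ruby
# Sketch: opting into OpenAI's Responses API instead of Chat Completions
class ResponsesAgent < ApplicationAgent
  generate_with :openai, model: "gpt-4o", api: :responses # or api: :chat
end
```
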
@@ -195,6 +224,7 @@ vectors = response.data.map { |d| d[:embedding] }
- Template rendering without blocks
- Schema generator key symbolization
- Rails 8.0 and 8.1 compatibility
- Usage extraction across OpenAI/Anthropic response formats

### Removed

1 change: 1 addition & 0 deletions docs/.vitepress/config.mts
@@ -100,6 +100,7 @@ export default defineConfig({
{ text: 'Embeddings', link: '/actions/embeddings' },
{ text: 'Tools', link: '/actions/tools' },
{ text: 'Structured Output', link: '/actions/structured_output' },
{ text: 'Usage', link: '/actions/usage' },
]
},
{
9 changes: 9 additions & 0 deletions docs/actions.md
@@ -43,6 +43,15 @@ Generate vectors for semantic search:

<<< @/../test/docs/actions_examples_test.rb#embeddings_vectorize{ruby:line-numbers}

### [Usage Statistics](/actions/usage)

Track token consumption and costs:

```ruby
response = agent.summarize.generate_now
response.usage.total_tokens #=> 125
```

## Common Patterns

### Multi-Capability Actions
71 changes: 71 additions & 0 deletions docs/actions/usage.md
@@ -0,0 +1,71 @@
---
title: Usage Statistics
description: Track token usage and performance metrics across all AI providers with normalized usage objects.
---
# {{ $frontmatter.title }}

Track token consumption and performance metrics from AI provider responses. All providers return normalized usage statistics for consistent cost tracking and monitoring.

::: tip Monitor Usage in Production
See [Instrumentation](/framework/instrumentation) to monitor usage statistics in real time using ActiveSupport::Notifications.
:::

## Accessing Usage

Get usage statistics from any response:

<<< @/../test/docs/actions/usage_examples_test.rb#accessing_usage{ruby:line-numbers}

## Common Fields

These fields work across all providers:

<<< @/../test/docs/actions/usage_examples_test.rb#common_fields{ruby:line-numbers}

## Provider-Specific Fields

Access advanced metrics when available:

::: code-group
<<< @/../test/docs/actions/usage_examples_test.rb#provider_specific_openai{ruby:line-numbers} [OpenAI]
<<< @/../test/docs/actions/usage_examples_test.rb#provider_specific_anthropic{ruby:line-numbers} [Anthropic]
<<< @/../test/docs/actions/usage_examples_test.rb#provider_specific_ollama{ruby:line-numbers} [Ollama]
:::

## Provider Details

Raw provider data is preserved in `provider_details`:

::: code-group
<<< @/../test/docs/actions/usage_examples_test.rb#provider_details_openai{ruby:line-numbers} [OpenAI]
<<< @/../test/docs/actions/usage_examples_test.rb#provider_details_ollama{ruby:line-numbers} [Ollama]
:::

## Cost Tracking

Calculate costs using token counts:

<<< @/../test/docs/actions/usage_examples_test.rb#cost_tracking{ruby:line-numbers}

**Monitor costs in production:** Use [Instrumentation](/framework/instrumentation#cost-tracking) to automatically track costs across all requests.

## Embeddings Usage

Embedding responses have zero output tokens:

<<< @/../test/docs/actions/usage_examples_test.rb#embeddings_usage{ruby:line-numbers}

## Field Mapping

How provider fields map to normalized names:

| Provider | input_tokens | output_tokens | total_tokens |
|----------|--------------|---------------|--------------|
| OpenAI Chat | prompt_tokens | completion_tokens | total_tokens |
| OpenAI Embed | prompt_tokens | 0 | total_tokens |
| OpenAI Responses | input_tokens | output_tokens | total_tokens |
| Anthropic | input_tokens | output_tokens | calculated |
| Ollama | prompt_eval_count | eval_count | calculated |
| OpenRouter | prompt_tokens | completion_tokens | total_tokens |

**Note:** `total_tokens` is automatically calculated as `input_tokens + output_tokens` when not provided by the provider.
15 changes: 9 additions & 6 deletions docs/agents/generation.md
@@ -93,10 +93,11 @@ response.raw_request # The most recent request in provider format
response.raw_response # The most recent response in provider format
response.context # The original context that was sent

# Usage statistics (when available from provider)
response.prompt_tokens # Input tokens used
response.completion_tokens # Output tokens used
response.total_tokens # Total tokens used
# Usage statistics (see /actions/usage for details)
response.usage # Normalized usage object across all providers
response.usage.input_tokens
response.usage.output_tokens
response.usage.total_tokens
```

For embeddings:
@@ -110,14 +111,16 @@ response.raw_request # The most recent request in provider format
response.raw_response # The most recent response in provider format
response.context # The original context that was sent

# Usage statistics (when available from provider)
response.prompt_tokens
# Usage statistics
response.usage # Normalized usage object
response.usage.input_tokens
```

## Next Steps

- [Agents](/agents) - Understanding the full agent lifecycle
- [Actions](/actions) - Define what your agents can do
- [Usage Statistics](/actions/usage) - Track token consumption and costs
- [Messages](/actions/messages) - Work with multimodal content
- [Tools](/actions/tools) - Enable function calling capabilities
- [Streaming](/agents/streaming) - Stream responses in real-time
2 changes: 1 addition & 1 deletion docs/framework.md
@@ -60,7 +60,7 @@ When you define an agent, you create a specialized participant that interacts wi

- **Agent** (Controller) - Manages lifecycle, defines actions, configures providers
- **Generation** (Request Proxy) - Coordinates execution, holds configuration, provides synchronous/async methods. Created by invocation, it's lazy—execution doesn't start until you call `.prompt_now`, `.embed_now`, or `.prompt_later`, as sketched after this list.
- **Response** (Result) - Contains messages, metadata, token usage, and parsed output. Returned after Generation executes.
- **Response** (Result) - Contains messages, metadata, and normalized usage statistics (see **[Usage Statistics](/actions/usage)**). Returned after Generation executes.
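
A minimal sketch of that lazy flow (the agent and prompt are hypothetical; method names follow the list above):

```ruby
# Invoking the agent only builds a Generation; nothing is sent yet
generation = MyAgent.prompt("Hello")

# Execution happens here and returns a Response
response = generation.prompt_now

# The Response carries normalized usage statistics
response.usage.total_tokens
```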

**Request-Response Lifecycle:**
