30 changes: 30 additions & 0 deletions CHANGELOG.md
@@ -148,6 +148,35 @@ response = MyAgent.embed(inputs: ["Text 1", "Text 2"]).embed_now
vectors = response.data.map { |d| d[:embedding] }
```

**Normalized Usage Statistics**
```ruby
response = MyAgent.prompt("Hello").generate_now

# Works across all providers
response.usage.input_tokens
response.usage.output_tokens
response.usage.total_tokens

# Provider-specific fields when available
response.usage.cached_tokens # OpenAI, Anthropic
response.usage.reasoning_tokens # OpenAI o1 models
response.usage.service_tier # Anthropic
```

**Enhanced Instrumentation for APM Integration**
- Unified event structure: `prompt.active_agent` and `embed.active_agent` (top-level) plus `prompt.provider.active_agent` and `embed.provider.active_agent` (per-API-call)
- Event payloads include comprehensive data for monitoring tools (New Relic, Datadog, etc.):
- Request parameters: `model`, `temperature`, `max_tokens`, `top_p`, `stream`, `message_count`, `has_tools`
- Usage data: `input_tokens`, `output_tokens`, `total_tokens`, `cached_tokens`, `reasoning_tokens`, `audio_tokens`, `cache_creation_tokens` (critical for cost tracking)
- Response metadata: `finish_reason`, `response_model`, `response_id`, `embedding_count`
- Top-level events report cumulative usage across all API calls in multi-turn conversations
- Provider-level events report per-call usage for granular tracking (see the subscriber sketch below)
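
A minimal subscriber sketch (the event names are from this release; whether the usage fields sit at the top level of the payload is an assumption, and the logging sink is illustrative):

```ruby
require "active_support/notifications"

# Sketch: forward top-level generation events to a logging/APM sink.
# Payload keys follow the list above; their exact nesting is assumed here.
ActiveSupport::Notifications.subscribe("prompt.active_agent") do |event|
  payload = event.payload

  Rails.logger.info(
    event: event.name,
    model: payload[:model],
    input_tokens: payload[:input_tokens],
    output_tokens: payload[:output_tokens],
    total_tokens: payload[:total_tokens],
    duration_ms: event.duration # milliseconds, from ActiveSupport
  )
end
```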

**Multi-Turn Usage Tracking**
- `response.usage` now returns cumulative token counts across all API calls during tool calling
- New `response.usages` array contains individual usage objects from each API call
- `Usage` objects support addition: `usage1 + usage2` for combining statistics, as shown in the sketch below
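
Taken together (the agent and prompt are hypothetical; the accessors are those listed above):

```ruby
# Hypothetical multi-turn, tool-calling request
response = MyAgent.prompt("What's the weather in Paris?").generate_now

# Cumulative totals across every API call in the conversation
response.usage.total_tokens

# Per-call breakdown
response.usages.each_with_index do |usage, i|
  puts "call #{i + 1}: #{usage.input_tokens} in / #{usage.output_tokens} out"
end

# Usage objects support +, so the cumulative value can be rebuilt
response.usages.reduce(:+).total_tokens == response.usage.total_tokens #=> true
```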

**Provider Enhancements**
- OpenAI Responses API: select with `api: :responses` or `api: :chat` (configuration sketched below)
- Anthropic JSON object mode with automatic extraction
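
A configuration sketch (assuming the standard `generate_with` setup; the agent and model names are illustrative):

```ruby
# Sketch: opting into OpenAI's Responses API instead of Chat Completions
class ResponsesAgent < ApplicationAgent
  generate_with :openai, model: "gpt-4o", api: :responses # or api: :chat
end
```
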
@@ -195,6 +224,7 @@ vectors = response.data.map { |d| d[:embedding] }
- Template rendering without blocks
- Schema generator key symbolization
- Rails 8.0 and 8.1 compatibility
- Usage extraction across OpenAI/Anthropic response formats

### Removed

1 change: 1 addition & 0 deletions docs/.vitepress/config.mts
@@ -100,6 +100,7 @@ export default defineConfig({
{ text: 'Embeddings', link: '/actions/embeddings' },
{ text: 'Tools', link: '/actions/tools' },
{ text: 'Structured Output', link: '/actions/structured_output' },
{ text: 'Usage', link: '/actions/usage' },
]
},
{
9 changes: 9 additions & 0 deletions docs/actions.md
@@ -43,6 +43,15 @@ Generate vectors for semantic search:

<<< @/../test/docs/actions_examples_test.rb#embeddings_vectorize{ruby:line-numbers}

### [Usage Statistics](/actions/usage)

Track token consumption and costs:

```ruby
response = agent.summarize.generate_now
response.usage.total_tokens #=> 125
```

## Common Patterns

### Multi-Capability Actions
71 changes: 71 additions & 0 deletions docs/actions/usage.md
@@ -0,0 +1,71 @@
---
title: Usage Statistics
description: Track token usage and performance metrics across all AI providers with normalized usage objects.
---
# {{ $frontmatter.title }}

Track token consumption and performance metrics from AI provider responses. All providers return normalized usage statistics for consistent cost tracking and monitoring.

::: tip Monitor Usage in Production
See [Instrumentation](/framework/instrumentation) to monitor usage statistics in real time using ActiveSupport::Notifications.
:::

## Accessing Usage

Get usage statistics from any response:

<<< @/../test/docs/actions/usage_examples_test.rb#accessing_usage{ruby:line-numbers}

## Common Fields

These fields work across all providers:

<<< @/../test/docs/actions/usage_examples_test.rb#common_fields{ruby:line-numbers}

## Provider-Specific Fields

Access advanced metrics when available:

::: code-group
<<< @/../test/docs/actions/usage_examples_test.rb#provider_specific_openai{ruby:line-numbers} [OpenAI]
<<< @/../test/docs/actions/usage_examples_test.rb#provider_specific_anthropic{ruby:line-numbers} [Anthropic]
<<< @/../test/docs/actions/usage_examples_test.rb#provider_specific_ollama{ruby:line-numbers} [Ollama]
:::

## Provider Details

Raw provider data is preserved in `provider_details`:

::: code-group
<<< @/../test/docs/actions/usage_examples_test.rb#provider_details_openai{ruby:line-numbers} [OpenAI]
<<< @/../test/docs/actions/usage_examples_test.rb#provider_details_ollama{ruby:line-numbers} [Ollama]
:::

## Cost Tracking

Calculate costs using token counts:

<<< @/../test/docs/actions/usage_examples_test.rb#cost_tracking{ruby:line-numbers}

**Monitor costs in production:** Use [Instrumentation](/framework/instrumentation#cost-tracking) to automatically track costs across all requests.

## Embeddings Usage

Embedding responses have zero output tokens:

<<< @/../test/docs/actions/usage_examples_test.rb#embeddings_usage{ruby:line-numbers}

## Field Mapping

How provider fields map to normalized names:

| Provider | input_tokens | output_tokens | total_tokens |
|----------|--------------|---------------|--------------|
| OpenAI Chat | prompt_tokens | completion_tokens | total_tokens |
| OpenAI Embed | prompt_tokens | 0 | total_tokens |
| OpenAI Responses | input_tokens | output_tokens | total_tokens |
| Anthropic | input_tokens | output_tokens | calculated |
| Ollama | prompt_eval_count | eval_count | calculated |
| OpenRouter | prompt_tokens | completion_tokens | total_tokens |

**Note:** `total_tokens` is automatically calculated as `input_tokens + output_tokens` when not provided by the provider.
15 changes: 9 additions & 6 deletions docs/agents/generation.md
@@ -93,10 +93,11 @@ response.raw_request # The most recent request in provider format
response.raw_response # The most recent response in provider format
response.context # The original context that was sent

# Usage statistics (when available from provider)
response.prompt_tokens # Input tokens used
response.completion_tokens # Output tokens used
response.total_tokens # Total tokens used
# Usage statistics (see /actions/usage for details)
response.usage # Normalized usage object across all providers
response.usage.input_tokens
response.usage.output_tokens
response.usage.total_tokens
```

For embeddings:
@@ -110,14 +111,16 @@ response.raw_request # The most recent request in provider format
response.raw_response # The most recent response in provider format
response.context # The original context that was sent

# Usage statistics (when available from provider)
response.prompt_tokens
# Usage statistics
response.usage # Normalized usage object
response.usage.input_tokens
```

## Next Steps

- [Agents](/agents) - Understanding the full agent lifecycle
- [Actions](/actions) - Define what your agents can do
- [Usage Statistics](/actions/usage) - Track token consumption and costs
- [Messages](/actions/messages) - Work with multimodal content
- [Tools](/actions/tools) - Enable function calling capabilities
- [Streaming](/agents/streaming) - Stream responses in real-time
2 changes: 1 addition & 1 deletion docs/framework.md
@@ -60,7 +60,7 @@ When you define an agent, you create a specialized participant that interacts wi

- **Agent** (Controller) - Manages lifecycle, defines actions, configures providers
- **Generation** (Request Proxy) - Coordinates execution, holds configuration, provides synchronous/async methods. Created by invocation, it's lazy—execution doesn't start until you call `.prompt_now`, `.embed_now`, or `.prompt_later`, as sketched after this list.
- **Response** (Result) - Contains messages, metadata, token usage, and parsed output. Returned after Generation executes.
- **Response** (Result) - Contains messages, metadata, and normalized usage statistics (see **[Usage Statistics](/actions/usage)**). Returned after Generation executes.
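
A minimal sketch of that lazy flow (the agent and prompt are hypothetical; method names follow the list above):

```ruby
# Invoking the agent only builds a Generation; nothing is sent yet
generation = MyAgent.prompt("Hello")

# Execution happens here and returns a Response
response = generation.prompt_now

# The Response carries normalized usage statistics
response.usage.total_tokens
```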

**Request-Response Lifecycle:**
