🤖 Production-ready samples for building multi-modal AI agents that understand images, documents, videos, and text using Amazon Bedrock and Strands Agents. Features Claude integration, MCP tools, streaming responses, and enterprise-grade architecture.


Strands Agent Samples

A comprehensive repository for implementing and demonstrating multi-modal understanding capabilities using the Strands Agent framework. This project enables processing and analysis of various types of content including documents, images, and videos with advanced features like persistent memory, observability, and inter-agent communication.

Overview

This repository contains tools and examples for building AI agents capable of understanding and processing multiple types of media:

  • Images (PNG, JPEG/JPG, GIF, WebP)
  • Documents (PDF, CSV, DOCX, XLS, XLSX)
  • Videos (MP4, MOV, AVI, MKV, WebM)
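Under the hood, multimodal input to Amazon Bedrock's Converse API is passed as typed content blocks. A minimal sketch of how a file might be mapped to the block shape Converse expects (the helper name `to_content_block` is illustrative, not part of this repo):

```python
from pathlib import Path

# Map file extensions to Converse modalities. Block shapes follow the
# Bedrock Converse API; treat this helper as a sketch, not the repo's code.
_IMAGE = {"png", "jpeg", "jpg", "gif", "webp"}
_DOC = {"pdf", "csv", "docx", "xls", "xlsx"}
_VIDEO = {"mp4", "mov", "avi", "mkv", "webm"}

def to_content_block(path: str, data: bytes) -> dict:
    """Build a Converse-style content block for an image, document, or video."""
    ext = Path(path).suffix.lstrip(".").lower()
    fmt = "jpeg" if ext == "jpg" else ext
    if ext in _IMAGE:
        return {"image": {"format": fmt, "source": {"bytes": data}}}
    if ext in _DOC:
        return {"document": {"format": fmt, "name": Path(path).stem,
                             "source": {"bytes": data}}}
    if ext in _VIDEO:
        return {"video": {"format": fmt, "source": {"bytes": data}}}
    raise ValueError(f"Unsupported file type: {ext}")
```

A list of such blocks (plus a `{"text": ...}` block carrying the question) forms one user message to the model.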

📚 Notebooks and Examples

| Use Case | Overview | Features | Language |
|---|---|---|---|
| 🎯 Multi-Agent Multimodal Analysis | Basic demonstration notebook showing how to process different types of media (images, documents, videos), analyze their content, and generate human-readable responses using Strands Agent | Multi-modal processing, AWS Bedrock integration, Custom tools | Python |
| 🎯 Multi-Agent Multimodal Analysis with FAISS Memory | Advanced notebook showcasing multi-modal (images, documents, videos) analysis with FAISS memory for storing and retrieving information across sessions | FAISS memory, Persistent storage, Cross-session continuity, User-specific memory | Python |
| ☁️ Multi-Agent Multimodal Analysis with S3 Vectors Memory | Production-ready notebook demonstrating multi-modal content processing with Amazon S3 Vectors as the memory backend, providing AWS-native, scalable memory storage | S3 Vectors memory, AWS-native storage, Auto-scaling, Enterprise-grade memory, AWS integration | Python |
| 🔍 Observability with LangFuse and Evaluation with RAGAS | Comprehensive notebook demonstrating observability and evaluation for Strands agents, using LangFuse for tracing and RAGAS for evaluation metrics with a restaurant-recommendation use case | LangFuse tracing, RAGAS evaluation, Performance monitoring, Quality assessment | Python |
| 🔧 Model Context Protocol (MCP) Tools | Tutorial notebook showing how to create and integrate MCP servers with Strands agents, including custom calculator tools and weather services | MCP server creation, Custom tools, Protocol integration, Calculator and weather examples | Python |
| 🤝 Agent-to-Agent (A2A) Protocol | Advanced notebook demonstrating inter-agent communication using the A2A protocol, showcasing how multiple agents can collaborate and share information | Inter-agent communication, A2A protocol, Collaborative workflows, Multi-agent systems | Python |
| 📹 S3 Video Memory Demo | Specialized notebook for processing videos stored in S3 with memory capabilities, combining cloud storage with intelligent video analysis | S3 integration, Cloud video processing, Memory storage, Scalable pipelines | Python |

🛠️ Supporting Tools and Files

| File | Description | Purpose |
|---|---|---|
| Video Reader Custom Tool | Custom tool that extracts frames from videos at specified intervals, converts them to base64-encoded images, and passes them to the agent for analysis | Video frame extraction and analysis |
| S3 Memory Tool | AWS-native memory management tool using Amazon S3 Vectors as the backend, providing scalable, persistent memory for Strands agents | S3 Vectors integration and memory management |
| MCP Calculator Example | MCP server implementation for calculator functionality | MCP server example |
| Requirements | Required Python packages for running all notebooks | Dependency management |
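The core of a video-reader tool like the one above is interval-based frame sampling plus base64 encoding. A rough sketch (not the repo's `video_reader.py`; it assumes OpenCV is available and imports it lazily):

```python
import base64

def frame_indices(total_frames: int, fps: float, interval_s: float) -> list:
    """Indices of frames sampled every `interval_s` seconds."""
    step = max(1, int(fps * interval_s))
    return list(range(0, total_frames, step))

def extract_frames_b64(video_path: str, interval_s: float = 1.0) -> list:
    """Extract sampled frames as base64-encoded JPEG strings (requires OpenCV)."""
    import cv2  # lazy import: only needed when actually reading a video
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in frame_indices(total, fps, interval_s):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            break
        ok, buf = cv2.imencode(".jpg", frame)
        if ok:
            frames.append(base64.b64encode(buf.tobytes()).decode("ascii"))
    cap.release()
    return frames
```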

🚀 Key Features

🎯 Multi-Modal Processing

  • Image Analysis: Process and understand visual content
  • Document Processing: Extract and summarize text from various formats
  • Video Analysis: Frame extraction and temporal understanding
  • Cross-Modal Correlation: Connect insights across different media types

🧠 Memory & Persistence

  • FAISS-Powered Search: Efficient similarity search for relevant information
  • S3 Vectors Memory: AWS-native scalable memory with Amazon S3 Vectors
  • Cross-Session Memory: Information persists between application restarts
  • User-Specific Storage: Personalized memory with unique user IDs
  • Contextual Retrieval: Smart retrieval based on query context
  • Enterprise-Grade Storage: Production-ready memory with automatic scaling
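Whatever the backend (FAISS locally, S3 Vectors in AWS), the retrieval step is the same idea: embed the query, rank stored embeddings by similarity, return the top matches. A toy in-memory stand-in, just to illustrate the mechanism:

```python
import math

def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class TinyVectorMemory:
    """Minimal stand-in for a FAISS index or an S3 Vectors bucket."""
    def __init__(self):
        self._items = []  # list of (embedding, text) pairs

    def add(self, embedding, text: str) -> None:
        self._items.append((embedding, text))

    def search(self, query, k: int = 3):
        ranked = sorted(self._items, key=lambda it: cosine(query, it[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]
```

In the notebooks, real embeddings come from a Bedrock embedding model and the index persists across sessions; this sketch only shows the ranking logic.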

🔍 Observability & Evaluation

  • LangFuse Integration: Comprehensive tracing and monitoring
  • RAGAS Metrics: Automated evaluation of agent performance
  • Performance Monitoring: Real-time insights into agent behavior
  • Quality Assessment: Continuous improvement through evaluation

🔧 Protocol Integration

  • Model Context Protocol (MCP): Standardized tool integration
  • Agent-to-Agent (A2A): Inter-agent communication and collaboration
  • Custom Tool Development: Build specialized tools for specific needs
  • Serverless Deployment: AWS-native implementations
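For flavor, here is what a minimal MCP calculator server can look like with the MCP Python SDK's `FastMCP` class. This is a sketch, not necessarily identical to the repo's `mcp_calulator.py`; the `mcp` package import is deferred so the plain functions stand alone:

```python
def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

def build_server():
    """Register the functions above as MCP tools (requires the `mcp` package)."""
    from mcp.server.fastmcp import FastMCP  # lazy import
    server = FastMCP("calculator")
    server.tool()(add)
    server.tool()(multiply)
    return server

if __name__ == "__main__":
    build_server().run()  # serves over stdio by default
```

A Strands agent can then connect to this server as an MCP client and call `add`/`multiply` like any other tool.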

☁️ Cloud Integration

  • AWS Bedrock: Access to state-of-the-art foundation models
  • S3 Storage: Scalable storage for media files
  • Lambda Functions: Serverless agent deployment
  • CDK Infrastructure: Infrastructure as code
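A serverless deployment boils down to wrapping the agent in a Lambda handler. A hedged sketch assuming an API Gateway proxy event with a JSON `{"prompt": ...}` body (the event shape and `Agent()` defaults are assumptions, not taken from the repo's Lambda code):

```python
import json

def parse_prompt(event: dict) -> str:
    """Pull the user prompt out of an API Gateway proxy event body."""
    body = event.get("body") or "{}"
    if isinstance(body, str):
        body = json.loads(body)
    return body.get("prompt", "")

def handler(event: dict, context=None) -> dict:
    """Lambda entry point: run the prompt through a Strands agent."""
    prompt = parse_prompt(event)
    if not prompt:
        return {"statusCode": 400,
                "body": json.dumps({"error": "missing prompt"})}
    from strands import Agent  # lazy import; provided via the Lambda layer
    agent = Agent()  # defaults to a Bedrock model; IAM grants model access
    result = agent(prompt)
    return {"statusCode": 200,
            "body": json.dumps({"response": str(result)})}
```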

📁 Repository Structure

strands-agent-multi-understanding/
├── notebook/                           # Jupyter notebooks and examples
│   ├── multi-understanding.ipynb      # Basic multi-modal processing
│   ├── multi-understanding-with-memory.ipynb  # Advanced with FAISS memory
│   ├── multi-understanding-with-s3-memory.ipynb # Production-ready with S3 Vectors memory
│   ├── Strands_Observability_with_LangFuse_and_Evaluation_with_RAGAS.ipynb
│   ├── Strands_MCP_AND_Tools.ipynb    # MCP integration examples
│   ├── Strands_A2A_Tools.ipynb        # Agent-to-Agent communication
│   ├── s3_video_memory_demo.ipynb     # S3 video processing
│   ├── video_reader.py                # Custom video processing tool
│   ├── s3_video_memory.py             # S3 video memory tool
│   ├── mcp_calulator.py               # MCP calculator server
│   ├── requirements.txt               # Python dependencies
│   └── data-sample/                   # Sample files for testing
└── my_agent_cdk/                      # AWS CDK application
    ├── lambdas/code/lambda-s-agent    # Weather forecasting Lambda
    └── lambdas/code/lambda-s-multimodal # Multi-modal processing Lambda

🏁 Getting Started

Prerequisites

  • Python 3.8+
  • AWS account with Bedrock access
  • AWS CLI configured
  • Node.js (for CDK deployment)

Quick Start

  1. Clone the repository:

    git clone https://github.com/elizabethfuentes12/strands-agent-samples.git
  2. Set up the notebook environment:

    cd notebook
    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    pip install -r requirements.txt
  3. Configure AWS credentials for Bedrock access:

    aws configure
  4. Start exploring:

    jupyter notebook

Recommended Learning Path

  1. Start with basics: multi-understanding.ipynb
  2. Add memory: multi-understanding-with-memory.ipynb
  3. Production memory: multi-understanding-with-s3-memory.ipynb
  4. Learn observability: Strands_Observability_with_LangFuse_and_Evaluation_with_RAGAS.ipynb
  5. Explore protocols: Strands_MCP_AND_Tools.ipynb
  6. Advanced collaboration: Strands_A2A_Tools.ipynb

🏗️ CDK Application

The /my_agent_cdk/ directory contains an AWS CDK application for deploying serverless Lambda functions:

Available Lambda Functions

| Function | Description | Use Case |
|---|---|---|
| Weather Forecasting Agent | Lambda function using Strands Agent for weather forecasting | API-based weather services |
| Multi-modal Processing Agent | Lambda function for processing images, documents, and videos | Serverless content analysis |

Deployment Instructions

  1. Navigate to CDK directory:

    cd my_agent_cdk
  2. Install dependencies:

    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
  3. Package Lambda layers:

    pip install -r layers/lambda_requirements.txt --python-version 3.12 --platform manylinux2014_aarch64 --target layers/strands/_dependencies --only-binary=:all:
    python layers/package_for_lambda.py
  4. Deploy:

    cdk bootstrap  # First time only
    cdk deploy

For detailed instructions, see the CDK application README.

💡 Use Cases

  • Content Analysis: Automated processing of mixed media content
  • Knowledge Management: Building searchable knowledge bases from various media types
  • Educational Tools: Multi-modal learning assistants with memory
  • Business Intelligence: Extracting insights from documents, images, and videos
  • Quality Assurance: Automated evaluation and monitoring of AI agents
  • Collaborative AI: Multi-agent systems for complex workflows
  • Customer Support: Intelligent assistants with observability and evaluation
  • Research & Development: Advanced AI experimentation with proper tooling

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

🔒 Security

See CONTRIBUTING for security issue notifications.

📄 License

This library is licensed under the MIT-0 License. See the LICENSE file.


🇻🇪🇨🇱 ¡Gracias!

Eli | Dev.to | GitHub | Twitter | YouTube


Ready to build intelligent multi-modal AI agents? Start with the notebooks and explore the endless possibilities!
