Zotero Code Execution

Efficient multi-strategy Zotero search using the code execution pattern

License: MIT

A Python library for Zotero MCP that implements Anthropic's code execution pattern to enable safe, comprehensive searches without context overflow or crashes.

Quick Start

import sys
sys.path.append('/path/to/zotero-code-execution')
import setup_paths
from zotero_lib import SearchOrchestrator, format_results

# Single comprehensive search - fetches 100+ items, returns top 20
orchestrator = SearchOrchestrator()
results = orchestrator.comprehensive_search("embodied cognition", max_results=20)
print(format_results(results))

That's it! This automatically:

  • ✅ Performs semantic + keyword + tag searches
  • ✅ Deduplicates results
  • ✅ Ranks by relevance
  • ✅ Keeps large datasets in code (no crashes)

Multi-Term Searches

For OR-style searches (e.g., multiple spellings or languages), search each term separately and merge:

# Search for "Atayal" OR "泰雅族"
all_results = {}

for term in ['Atayal', '泰雅族']:
    results = orchestrator.comprehensive_search(term, max_results=50)
    for item in results:
        all_results[item.key] = item  # Deduplicate by key

# Re-rank combined results (_rank_items is underscore-prefixed, i.e. private by convention)
ranked = orchestrator._rank_items(list(all_results.values()), 'Atayal 泰雅族')
print(format_results(ranked[:25]))

Why? Zotero treats multi-word queries as AND conditions. Searching "Atayal 泰雅族" finds items matching BOTH terms, not either term.
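
The merge-by-key step above can be wrapped in a small reusable helper. The following is a minimal sketch, not part of the library: `Item`, `search_any`, and `FAKE_INDEX` are hypothetical names, and the fake backend stands in for a real call such as `orchestrator.comprehensive_search(term, max_results=50)` so the snippet runs without a Zotero library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a search result; only `key` matters here."""
    key: str
    title: str


def search_any(terms: Iterable[str],
               search_fn: Callable[[str], List[Item]]) -> List[Item]:
    """Run one search per term and merge the results, deduplicating by key."""
    merged: Dict[str, Item] = {}
    for term in terms:
        for item in search_fn(term):
            merged.setdefault(item.key, item)  # keep the first occurrence
    return list(merged.values())


# Fake backend for demonstration only (assumption, not real library data).
FAKE_INDEX = {
    "Atayal": [Item("A1", "Atayal weaving"),
               Item("B2", "Atayal / 泰雅族 oral history")],
    "泰雅族": [Item("B2", "Atayal / 泰雅族 oral history"),
               Item("C3", "泰雅族語言")],
}

merged = search_any(["Atayal", "泰雅族"], lambda term: FAKE_INDEX.get(term, []))
print([item.key for item in merged])
```

Item "B2" matches both terms but appears once in the merged list, which is the OR behaviour a single multi-word query does not give.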

Why This Exists

The Problem

Direct MCP tool calls have limitations:

  • 🚫 Crash risk with large result sets (>15-20 items)
  • 🚫 Token bloat - all results load into LLM context
  • 🚫 Manual orchestration - multiple searches, manual deduplication
  • 🚫 No ranking - results not sorted by relevance

The Solution

Code execution keeps large datasets in the execution environment:

  • No crashes - only filtered results return to context
  • Token efficient - process 100+ items, return top 20
  • Auto-orchestration - multi-strategy search in one call
  • Auto-ranking - results sorted by relevance

Features

Multi-Strategy Search

One function call performs:

  • Semantic search (multiple variations)
  • Keyword search (multiple modes)
  • Tag-based search
  • Automatic deduplication
  • Relevance ranking
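
To make the deduplication and ranking steps concrete, here is a rough sketch of how results from several strategies could be merged. The scoring rule (title term matches plus the number of strategies that found the item) is an assumption chosen for illustration; the library's actual ranking may differ, and `strategy_results` and `merge_and_rank` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical per-strategy output: lists of (item_key, title) pairs.
strategy_results: Dict[str, List[Tuple[str, str]]] = {
    "semantic": [("K1", "Embodied cognition and language"),
                 ("K2", "Motor theories of perception")],
    "keyword": [("K1", "Embodied cognition and language"),
                ("K3", "Cognition in the wild")],
    "tag": [("K3", "Cognition in the wild")],
}


def merge_and_rank(results: Dict[str, List[Tuple[str, str]]],
                   query: str,
                   max_results: int = 20) -> List[Tuple[str, str]]:
    """Deduplicate by key, then sort by a naive relevance score."""
    titles: Dict[str, str] = {}
    hits: Dict[str, int] = defaultdict(int)
    for items in results.values():
        for key, title in items:
            titles[key] = title
            hits[key] += 1  # found by one more strategy

    terms = query.lower().split()

    def score(key: str) -> int:
        title = titles[key].lower()
        term_matches = sum(term in title for term in terms)
        return 2 * term_matches + hits[key]

    ranked = sorted(titles, key=score, reverse=True)
    return [(key, titles[key]) for key in ranked[:max_results]]


top = merge_and_rank(strategy_results, "embodied cognition", max_results=2)
print(top)
```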

Safe Large Searches

# ❌ Old way: Crash risk
results1 = zotero_semantic_search("query", limit=10)  # Limited to 10
results2 = zotero_search_items("query", limit=10)     # Another 10
# Manual deduplication, manual ranking...

# ✅ New way: Safe and comprehensive
orchestrator = SearchOrchestrator()
results = orchestrator.comprehensive_search("query", max_results=20)
# Fetches 100+, processes in code, returns top 20

Advanced Filtering

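# Fetch broadly, filter in code
# (assumes ZoteroLibrary is importable from zotero_lib alongside SearchOrchestrator)
from zotero_lib import SearchOrchestrator, ZoteroLibrary

library = ZoteroLibrary()
orchestrator = SearchOrchestrator(library)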
items = library.search_items("machine learning", limit=100)  # Safe!

# Filter to recent journal articles
filtered = orchestrator.filter_by_criteria(
    items,
    item_types=["journalArticle"],
    date_range=(2020, 2025)
)
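
For readers curious what this kind of filtering involves, the sketch below shows an equivalent in plain Python. It is not the library's implementation: the `Item` class, the `item_type` field name, and the inclusive treatment of `date_range` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Item:
    """Hypothetical record; `item_type` is an assumed field name."""
    key: str
    item_type: str
    date: Optional[str]


def year_of(date: Optional[str]) -> Optional[int]:
    """Return the leading four-digit year of a date string, or None."""
    if date and date[:4].isdigit():
        return int(date[:4])
    return None


def filter_items(items: Sequence[Item],
                 item_types: Optional[Sequence[str]] = None,
                 date_range: Optional[Tuple[int, int]] = None) -> List[Item]:
    """Keep items matching the given types and an inclusive year range."""
    kept = []
    for item in items:
        if item_types is not None and item.item_type not in item_types:
            continue
        if date_range is not None:
            year = year_of(item.date)
            if year is None or not (date_range[0] <= year <= date_range[1]):
                continue  # undated or out-of-range items are dropped
        kept.append(item)
    return kept


sample = [
    Item("A", "journalArticle", "2021-05-01"),
    Item("B", "book", "2022"),
    Item("C", "journalArticle", "2015-03"),
    Item("D", "journalArticle", None),
]
recent_articles = filter_items(sample, ["journalArticle"], (2020, 2025))
print([item.key for item in recent_articles])
```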

Installation

Requirements

  • Python 3.8+
  • Zotero MCP installed via pipx
  • Claude Code or similar code execution environment

Setup

  1. Clone this repository:
git clone https://github.com/kerim/zotero-code-execution.git
cd zotero-code-execution
  2. Install dependencies (optional - usually already installed with Zotero MCP):
pip install -r requirements.txt
  3. Use in your code:
import sys
sys.path.append('/path/to/zotero-code-execution')
import setup_paths  # Adds zotero_mcp to path
from zotero_lib import SearchOrchestrator, format_results

Usage Examples

Basic Search

orchestrator = SearchOrchestrator()
results = orchestrator.comprehensive_search("neural networks", max_results=20)
print(format_results(results))

Filter by Author

library = ZoteroLibrary()
results = library.search_items("Kahneman", qmode="titleCreatorYear", limit=50)
# Newest first; "or ''" keeps items without a date from breaking the sort
sorted_results = sorted(results, key=lambda x: x.date or "", reverse=True)
print(format_results(sorted_results))

Tag-Based Search

library = ZoteroLibrary()
results = library.search_by_tag(["learning", "cognition"], limit=50)
print(format_results(results[:20]))

Recent Papers

library = ZoteroLibrary()
results = library.get_recent(limit=20)
print(format_results(results))

Custom Filtering

library = ZoteroLibrary()
orchestrator = SearchOrchestrator(library)

items = library.search_items("AI", limit=100)

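# Only papers from 2020 onward that have a DOI
# (the isdigit() check skips dates that do not start with a four-digit year)
recent_with_doi = [
    item for item in items
    if item.doi and item.date
    and item.date[:4].isdigit() and int(item.date[:4]) >= 2020
]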
print(format_results(recent_with_doi))

See examples.py for 8 complete working examples.

Claude Code Skill

This repository includes a Claude Code skill for easy integration.

Installation

Copy the skill to your Claude skills directory:

cp -r claude-skill ~/.claude/skills/zotero-mcp-code

Usage

In Claude Code, searches will automatically use the code execution pattern:

"Find papers about embodied cognition"

Claude will write code using this library instead of direct MCP calls.

See claude-skill/SKILL.md for complete skill documentation.

API Reference

SearchOrchestrator

Main class for automated multi-strategy searching.

comprehensive_search(query, max_results=20, use_semantic=True, use_keyword=True, use_tags=True, search_limit_per_strategy=50)

Performs comprehensive search with automatic deduplication and ranking.

Returns: List of ZoteroItem objects

filter_by_criteria(items, item_types=None, date_range=None, required_tags=None, excluded_tags=None)

Filter items by various criteria.

Returns: Filtered list of ZoteroItem objects

ZoteroLibrary

Low-level interface to Zotero.

  • search_items(query, ...) - Keyword search
  • semantic_search(query, ...) - Semantic/vector search
  • search_by_tag(tags, ...) - Tag-based search
  • get_recent(limit) - Recently added items
  • get_tags() - All library tags

Helper Functions

  • format_results(items, include_abstracts=True, max_abstract_length=300) - Format as markdown

See README_LIBRARY.md for complete API documentation.

Architecture

Based on Anthropic's code execution with MCP:

  1. Claude writes Python code (not direct MCP calls)
  2. Code fetches large datasets (100+ items) from Zotero
  3. Code processes in execution environment (dedup, rank, filter)
  4. Only filtered results return to LLM context (20 items)

Result: Large datasets stay out of context, preventing crashes and saving tokens.
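
The four steps above can be sketched end to end as follows. This is an illustration of the pattern only: `fetch_all` is a hypothetical stand-in that fabricates placeholder records instead of calling Zotero, and the `score` field is an assumed relevance value.

```python
from typing import Dict, List, Tuple


def fetch_all(query: str) -> List[Dict]:
    """Stand-in for the Zotero calls: 150 raw records with 100 distinct keys."""
    return [
        {"key": f"K{i % 100}",
         "title": f"Paper {i % 100} on {query}",
         "score": (i * 37) % 101}
        for i in range(150)
    ]


def run_search(query: str, max_results: int = 20) -> Tuple[str, int, int]:
    raw = fetch_all(query)                       # step 2: large fetch stays in code
    unique = {rec["key"]: rec for rec in raw}    # step 3: deduplicate by key
    ranked = sorted(unique.values(),
                    key=lambda rec: rec["score"], reverse=True)
    top = ranked[:max_results]                   # step 4: keep only the best few
    summary = "\n".join(f"- {rec['title']} ({rec['key']})" for rec in top)
    return summary, len(raw), len(unique)


summary, n_raw, n_unique = run_search("embodied cognition")
print(f"fetched {n_raw}, unique {n_unique}, returned {len(summary.splitlines())}")
```

Only `summary` would be printed back to the model; the raw and deduplicated lists never leave the execution environment.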

Performance

Expected Benefits

Based on Anthropic's pattern and implementation design:

  • Token reduction: 50-90% (exact amount depends on search size)
  • Function calls: 5-10x → 1x (confirmed by design)
  • Search limits: 10-15 → 100+ items (safe in code)
  • Crash prevention: Likely effective (untested)

Status

⚠️ Proof of concept - Performance claims are theoretical projections, not measured results.

See HONEST_STATUS.md for detailed status and validation needs.

Contributing

Contributions welcome! Areas for improvement:

  1. Performance validation - Measure actual token savings
  2. Better ranking - Incorporate semantic similarity scores
  3. Caching - Cache search results with invalidation
  4. Parallel processing - Execute search strategies concurrently
  5. Export functions - Batch BibTeX generation, CSV export
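
As a starting point for item 4, one possible approach is to run the strategies in a thread pool from the standard library. This is a sketch under assumptions, not existing library code: the three strategy functions and `run_strategies` are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


# Hypothetical strategy functions; real ones would call into Zotero.
def semantic_strategy(query: str) -> List[str]:
    return [f"semantic:{query}"]


def keyword_strategy(query: str) -> List[str]:
    return [f"keyword:{query}"]


def tag_strategy(query: str) -> List[str]:
    return [f"tag:{query}"]


def run_strategies(query: str,
                   strategies: Dict[str, Callable[[str], List[str]]]
                   ) -> Dict[str, List[str]]:
    """Run every strategy in its own thread and collect results by name."""
    with ThreadPoolExecutor(max_workers=len(strategies)) as pool:
        futures = {name: pool.submit(fn, query)
                   for name, fn in strategies.items()}
        # result() blocks until that strategy finishes and re-raises its errors
        return {name: future.result() for name, future in futures.items()}


results = run_strategies("cognition", {
    "semantic": semantic_strategy,
    "keyword": keyword_strategy,
    "tag": tag_strategy,
})
print(results)
```

Threads suit this case because the strategies would spend most of their time waiting on I/O rather than computing.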

License

MIT License - see LICENSE file for details.

Citation

If you use this in research, please cite:

@software{zotero_code_execution,
  title = {Zotero Code Execution: Efficient Multi-Strategy Search},
  year = {2025},
  url = {https://github.com/kerim/zotero-code-execution}
}
