Open
Labels
enhancement (New feature or request), future planning (Ideas or features proposed for future development.)
Description
It would be useful to expose Pydoll as an HTTP service, so external systems can trigger crawls without writing Python code. The idea is to add a CLI command:
    pydoll serve --port 8000
This spins up a lightweight server (likely as a plugin, to avoid bloating the core) and exposes a simple API.
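To make the CLI shape concrete, here is a minimal sketch of how a `serve` subcommand could be parsed with the standard library's argparse. The `build_parser` helper and the default port of 8000 are assumptions for illustration; Pydoll does not ship this command today, and a real plugin might register it differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the proposed `pydoll serve` command."""
    parser = argparse.ArgumentParser(prog="pydoll")
    subparsers = parser.add_subparsers(dest="command", required=True)

    serve = subparsers.add_parser("serve", help="start the HTTP crawl service")
    # Assumption: 8000 as the default simply mirrors the example in this issue.
    serve.add_argument("--port", type=int, default=8000, help="port to listen on")
    return parser


args = build_parser().parse_args(["serve", "--port", "8000"])
print(args.command, args.port)
```

Keeping the parser in its own function would let a separate package such as pydoll-serve own the server code while the core only wires up the subcommand.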
Proposed API
Initial endpoint:
- POST /crawl → body contains { "url": "https://example.com", "format": "html" | "markdown" }
- The response returns the page content as HTML or Markdown (Markdown output depends on the Markdown exporter feature).
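The request handling for POST /crawl could look roughly like the sketch below. It is framework-agnostic and takes the page fetcher as a parameter, so it does not assume any particular Pydoll call. Everything named here is hypothetical: `handle_crawl`, `fetch_html`, `to_markdown`, the JSON response shape, the status codes, and defaulting `format` to "html" are illustrative choices, not part of the proposal.

```python
import asyncio
import json
from typing import Awaitable, Callable, Optional, Tuple

# Hypothetical hooks: a real service would back fetch_html with a Pydoll-driven
# browser and to_markdown with the Markdown exporter once that feature exists.
FetchHtml = Callable[[str], Awaitable[str]]
ToMarkdown = Callable[[str], str]


async def handle_crawl(
    raw_body: bytes,
    fetch_html: FetchHtml,
    to_markdown: Optional[ToMarkdown] = None,
) -> Tuple[int, dict]:
    """Validate a POST /crawl body and return (status_code, json_payload)."""
    try:
        body = json.loads(raw_body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, {"error": "body must be valid JSON"}
    if not isinstance(body, dict):
        return 400, {"error": "body must be a JSON object"}

    url = body.get("url")
    fmt = body.get("format", "html")  # assumption: default to HTML
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        return 400, {"error": "url must be an http(s) URL"}
    if fmt not in ("html", "markdown"):
        return 400, {"error": "format must be 'html' or 'markdown'"}
    if fmt == "markdown" and to_markdown is None:
        # Markdown depends on an exporter that may not be installed.
        return 501, {"error": "markdown export is not available"}

    html = await fetch_html(url)
    content = html if fmt == "html" else to_markdown(html)
    return 200, {"url": url, "format": fmt, "content": content}


async def fake_fetch(url: str) -> str:
    """Stand-in fetcher so the sketch runs without a browser."""
    return f"<html><body>{url}</body></html>"


status, payload = asyncio.run(
    handle_crawl(b'{"url": "https://example.com", "format": "html"}', fake_fetch)
)
print(status, payload["format"])
```

Separating validation from fetching keeps the handler easy to test and lets the server layer (whatever framework the plugin picks) stay thin.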
This endpoint becomes a foundation for LLM integrations, where the returned HTML or Markdown can be fed into models for structured data extraction. By exposing crawling as a simple web API, Pydoll can be plugged directly into AI pipelines, data labeling flows, or automated extraction systems without extra glue code.
This could start as a separate repository (pydoll-serve) and evolve independently, but integrating a CLI hook into Pydoll keeps the DX simple.