BookWyrm is a Python SDK & API that provides a data pipeline for reliable AI workflows. Use the AI data preparation endpoints to automate complex document processing tasks like text extraction, semantic chunking, and citation grounding for agents, leaving you free to focus on building agents and AI task automation that deliver real business value.
Extract PDF text with Python - no pre-processing needed
from bookwyrm import BookWyrmClient
client = BookWyrmClient()

# Extract PDF text - no pre-processing needed
with open("document.pdf", "rb") as f:
    pdf_bytes = f.read()

for response in client.stream_extract_pdf(pdf_bytes=pdf_bytes):
    if hasattr(response, 'page_data'):
        # Use extracted text directly in your AI workflow
        process_with_ai(response.page_data)
We provide fully managed, production-ready API endpoints that solve specific, complex developer problems for building AI workflows right out of the box.
Start by extracting and processing text from documents, then use the endpoints to generate structured output or produce answers grounded in your data.
Low-quality text extraction from native/image PDFs that breaks context and requires AI data cleaning.
Core Value
High-fidelity text output: extract clean, accurate text from any PDF with Python, native or image-based, with no pre-cleaning required. Includes crucial position information (e.g., page and coordinates) for advanced retrieval and UI highlighting.
Generic, token-based splitting that breaks semantic meaning, leading to poor retrieval quality.
Core Value
Context-aware chunking: semantic chunking for RAG splits documents into meaningful, context-aware chunks and phrases instead of arbitrary token windows, with configurable sizing to fit your retrieval needs.
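A minimal sketch using the stream_process_text endpoint shown in the walkthrough further down; the client, the document_text variable, and the TextSpanResult response type are assumed to be in scope:

# Split extracted text into semantic chunks instead of fixed token windows
chunks = []
for response in client.stream_process_text(
    text=document_text,   # text produced by the extraction step (assumed variable)
    chunk_size=1000,      # configurable: tune the maximum chunk size to your retrieval needs
    offsets=True          # include character offsets for each span
):
    if isinstance(response, TextSpanResult):  # response type provided by the SDK
        chunks.append(response)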
Obtaining reliable, accurate structured output for AI workflows and AI task automation takes time and resources.
Core Value
Generate user-specified JSON output from any document: whether you need to process invoices, receipts, or vacation requests, define a Pydantic model class and use it with the Summarize endpoint to get structured JSON from any document in a few lines of code.
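As a brief sketch of the pattern (the Receipt model and receipt_text variable are purely illustrative; the invoice walkthrough below shows the same flow end to end):

from pydantic import BaseModel, Field
import json

class Receipt(BaseModel):  # illustrative model - use whatever fields your workflow needs
    merchant: str = Field(description="Merchant name")
    total: float = Field(description="Total amount paid")

schema = json.dumps(Receipt.model_json_schema())
for response in client.stream_summarize(
    content=receipt_text,          # extracted document text (assumed variable)
    model_name="Receipt",
    model_schema_json=schema
):
    if hasattr(response, 'summary'):
        receipt = Receipt.model_validate(json.loads(response.summary))
        break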
Citation (Deep Reader)
# CLI
bookwyrm cite "What is the main theme?" --url https://example.com/chunks.jsonl
The Problem
AI hallucination, lack of trust, and an inability to trace the source of an answer.
Core Value
Source-grounded AI & traceability: ask a question against your chunks and get back citations with the original reasoning context. Ground your RAG answers in the source material for total control and trust.
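The same call from Python, as a minimal sketch assuming chunks holds the output of the chunking step:

# Ask a question against your chunks and collect source-grounded citations
for response in client.stream_citations(chunks=chunks, question="What is the main theme?"):
    if hasattr(response, 'citation'):
        citation = response.citation
        # Each citation carries the source text, a quality score, and the model's reasoning
        print(citation.quality, citation.text, citation.reasoning)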
Feeding large, noisy blocks of text into agents or models, consuming unnecessary tokens and budget.
Core Value
Concise, embeddable text: Collapse long or noisy text into concise summaries that are easier to embed, search, or feed into agents for efficient context injection.
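A rough sketch of collapsing a noisy chunk before embedding it; note that calling the Summarize endpoint without a model schema is an assumption here (the structured-output variant appears in the invoice example below), and long_noisy_text is a placeholder variable:

summary_text = None
for response in client.stream_summarize(content=long_noisy_text):  # schema-free call is assumed
    if hasattr(response, 'summary'):
        summary_text = response.summary  # concise text, cheaper to embed or inject into an agent
        break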
Dealing with long processing times in live applications, which degrades user experience.
Core Value
Real-time progress: streaming progress updates for all major operations, ideal for live applications where you need to surface results quickly.
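As a sketch, the streaming responses let you surface progress while extraction runs; the message attribute and the two helper functions below are hypothetical placeholders, not part of the documented API:

for response in client.stream_extract_pdf(pdf_bytes=pdf_bytes):
    if hasattr(response, 'page_data'):
        handle_page(response.page_data)         # extracted content, ready for downstream steps
    elif hasattr(response, 'message'):          # hypothetical progress-update field
        update_progress_bar(response.message)   # e.g. surface "processing page 3 of 20" in the UI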
Each endpoint is standalone, so you can slot BookWyrm into an existing AI data pipeline or stitch multiple pieces together for your AI workflows.
These capabilities work together to provide a complete AI data pipeline for document ingestion, AI data processing, and retrieval - the foundation of any RAG system and AI workflows.
Want to see BookWyrm in action?
The easiest way is to join our Discord server and ask for a demo. One of the team can then join you in a voice channel, show you BookWyrm's endpoints in action, and answer any questions you may have.
Transform unstructured data into actionable intelligence with BookWyrm's composable API pipeline for back-office automation.
Start by Processing your Documents
1. Extract PDF Text with Python
Start with unstructured data extraction from PDFs, documents, or web pages:
from bookwyrm import BookWyrmClient
client = BookWyrmClient()

# Extract PDF text with Python - unstructured data extraction
with open("invoice.pdf", "rb") as f:
    pdf_bytes = f.read()

pages = []
for response in client.stream_extract_pdf(pdf_bytes=pdf_bytes):
    if hasattr(response, 'page_data'):
        pages.append(response.page_data)

# Send extracted pages directly to phrasal analysis
process_with_phrasal_analysis(pages)
2. Semantic Chunking for RAG
Process the extracted text into semantically meaningful, context-aware chunks for AI consumption:
# Context-aware semantic chunking for RAG
chunks = []
for response in client.stream_process_text(
    text=full_text,
    chunk_size=2000,  # Optimal for most LLMs
    offsets=True
):
    if isinstance(response, TextSpanResult):
        chunks.append(response)

# Result: List of TextSpanResult with semantic boundaries
# Each chunk contains complete phrases/sentences up to size limit
3. Then Build Intelligence on Top
Invoice Processing & Workflow Automation
Extract structured data from invoices for automated processing:
from pydantic import BaseModel, Field
import json
class Invoice(BaseModel):
    invoice_number: str = Field(description="Unique invoice identifier")
    vendor_name: str = Field(description="Vendor or supplier name")
    total_amount: float = Field(description="Total invoice amount")
    due_date: str = Field(description="Payment due date")
    line_items: list[str] = Field(description="List of items or services")
# Convert invoice text to structured data
model_schema = json.dumps(Invoice.model_json_schema())

for response in client.stream_summarize(
    content=full_text,
    model_name="Invoice",
    model_schema_json=model_schema
):
    if hasattr(response, 'summary'):
        invoice_data = Invoice.model_validate(json.loads(response.summary))
        # → Route to accounting system, trigger approval workflow
        break
Factual, Citable Knowledge Agent
Build source-grounded AI agents that cite their sources for every claim:
# User asks a question
user_question = "What were the Q3 revenue figures?"

# Find relevant citations from your document chunks
citations = []
for response in client.stream_citations(
    chunks=chunks,
    question=user_question
):
    if hasattr(response, 'citation'):
        citations.append(response.citation)

# Citations include:
# - Exact text span from source documents
# - Quality score (0-4) for relevance
# - Reasoning for why it's relevant
# - Chunk indices for traceability

# Build source-grounded AI response with grounded facts
for citation in sorted(citations, key=lambda c: c.quality, reverse=True)[:3]:
    print(f"Source: {citation.text}")
    print(f"Relevance: {citation.reasoning}")
Data Enrichment & Enhancement
Define a Pydantic model to create structured JSON output from your documents:
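A sketch of the enrichment pattern, reusing the Summarize endpoint from the invoice example; the CompanyProfile model and its fields are illustrative, not a prescribed schema:

from pydantic import BaseModel, Field
import json

class CompanyProfile(BaseModel):  # illustrative enrichment model
    company_name: str = Field(description="Registered company name")
    industry: str = Field(description="Primary industry or sector")
    key_contacts: list[str] = Field(description="Named contacts mentioned in the document")

schema = json.dumps(CompanyProfile.model_json_schema())
for response in client.stream_summarize(
    content=full_text,                # text extracted from the source document
    model_name="CompanyProfile",
    model_schema_json=schema
):
    if hasattr(response, 'summary'):
        profile = CompanyProfile.model_validate(json.loads(response.summary))
        # → Merge into your CRM or data warehouse record
        break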
Let's Co-Design Your First Agentic Pipeline. (For Free.)
We are looking for startup and small-enterprise builders who need an AI strategy. We have deep expertise in building real agentic pipelines. To help you bootstrap, we're offering to build, for free, some of the elements you need that we don't yet offer.
This isn't a sales pitch. If you're serious about BookWyrm, we want to help you succeed. Typically, we'd start with a hands-on technical workshop where we will:
Help you map out a high-impact workflow for your business (RAG, citation extraction, enrichment, or something new).
Help you solve a specific data problem by advancing our tech to fit your needs.
Show you how BookWyrm's flexible pipeline can take your workflow from whiteboard to production, faster.
BookWyrm never stores or inspects your data. All processing happens in-memory during the request lifecycle, and results are returned directly to you. We don't log your documents, questions, or responses. Your data stays private, and only you control it.
BookWyrm Delivers Your Agentic Workflow Strategy.
Your data pipeline is the foundation for your agentic workflows. Build it right. Get started with the API that's fast to set up, easy to extend, and built for developers.