API endpoints that extract and transform unstructured data for agentic AI.
Endpoints work standalone or in combination.
Build using the tools you know.

You've got data everywhere - PDFs from customer surveys, market reports in shared drives, research papers, and more. You want to build an agent that can actually use this information, but first you need to wrangle it into something usable.
Start by figuring out what kind of files you’re dealing with. With BookWyrm, it’s one command:
bookwyrm classify --file customer-satisfaction.pdf --output satisfaction.json

Now you know exactly what you're working with, ready for embedding and indexing in your favorite vector database (e.g., Pinecone).
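Got a whole folder of mixed files instead of one PDF? The same command scripts cleanly. Here's a minimal Python sketch that shells out to the classify command for each file; the docs/ and classified/ folder names are just placeholders for this example.

# Batch-classify every file in a folder by calling the CLI per file.
# The directory names below are placeholders; point them at your own data.
import subprocess
from pathlib import Path

in_dir = Path("docs")          # folder of mixed PDFs, reports, papers
out_dir = Path("classified")
out_dir.mkdir(exist_ok=True)

for f in in_dir.iterdir():
    if not f.is_file():
        continue
    subprocess.run(
        ["bookwyrm", "classify", "--file", str(f),
         "--output", str(out_dir / f"{f.stem}.json")],
        check=True,
    )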
Instead of splitting text at arbitrary token lengths, BookWyrm uses phrasal models to break content into meaningful units:
bookwyrm phrasal --file satisfaction.txt --format with_offsets --output phrases.jsonl
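Because the output is one phrase per JSONL line, it drops straight into an embedding step. Here's a sketch of that hand-off; it assumes each line carries the phrase under a "text" field (check your actual output for the real field names) and uses sentence-transformers as a stand-in for whatever embedding model you prefer.

# Embed BookWyrm phrases and shape records for a vector database upsert.
# The "text" field name is an assumption; adjust to the actual JSONL schema.
import json
from sentence_transformers import SentenceTransformer

with open("phrases.jsonl") as fh:
    phrases = [json.loads(line) for line in fh]

model = SentenceTransformer("all-MiniLM-L6-v2")   # any embedding model works here
vectors = model.encode([p["text"] for p in phrases])

records = [
    {"id": f"satisfaction-{i}", "values": vec.tolist(), "metadata": {"text": p["text"]}}
    for i, (p, vec) in enumerate(zip(phrases, vectors))
]

From here, upserting the records into Pinecone or any other vector store is typically a single call.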
Huge documents? No problem. Generate summaries that make it easier to embed, search, or serve to users:

bookwyrm summarize phrases.jsonl --output summary.json
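One pattern that works well is to index the document-level summary as its own record next to the phrase chunks, so broad questions hit the summary and specific ones hit the phrases. A sketch under the same assumptions as above; the "summary" field name is a guess, so check your actual summary.json before relying on it.

# Turn the document-level summary into its own searchable record.
# The "summary" key is an assumption; verify the real output schema.
import json
from sentence_transformers import SentenceTransformer

with open("summary.json") as fh:
    doc_summary = json.load(fh).get("summary", "")

model = SentenceTransformer("all-MiniLM-L6-v2")
summary_record = {
    "id": "satisfaction-summary",
    "values": model.encode(doc_summary).tolist(),
    "metadata": {"text": doc_summary, "kind": "summary"},
}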
When your agent answers a question, you want sources you can trust. BookWyrm's citation endpoint finds and justifies them:

bookwyrm cite "What are the outcomes from the customer satisfaction survey?" phrases.jsonl --output results.json
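Inside an agent, that's one call per question, and the answer ships with its supporting citations. A rough sketch; the shape of results.json isn't assumed here, so inspect your own output before parsing it further.

# Retrieve cited support for a question, then hand the payload to your agent.
import json
import subprocess

question = "What are the outcomes from the customer satisfaction survey?"

subprocess.run(
    ["bookwyrm", "cite", question, "phrases.jsonl", "--output", "results.json"],
    check=True,
)

# Parsing is kept loose because the exact results.json schema isn't assumed here;
# pass the citations along with the question into your prompt or agent framework.
with open("results.json") as fh:
    citations = json.load(fh)

print(json.dumps(citations, indent=2)[:500])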
In just a few steps, you've built a RAG pipeline: documents classified → text chunked → content summarized → citations retrieved. You haven't had to test models, touch regex, or duplicate work for different file types; BookWyrm handled the grunt work, letting you focus on building agents and applications that deliver reliable results.
Your data pipeline is the foundation for your agentic workflows. Build it right. Get started with the API that's fast to set up, easy to extend, and built for developers.