Plugin RAG
API endpoints that extract and transform unstructured data for agentic AI.
Endpoints work standalone or in combination.
Build using the tools you know.
The Developer Story
You've got data everywhere: PDFs from customer surveys, market reports in shared drives, research papers, and more. You want to build an agent that can actually use this information, but first you need to wrangle it into something usable.
Step 1: Classify your documents
Start by figuring out what kind of files you're dealing with. With BookWyrm, it's one command:
bookwyrm classify --file customer-satisfaction.pdf --output satisfaction.json
Now you know exactly what you're working with and can prepare it for embedding and indexing in your favorite vector database (e.g. Pinecone).
Step 2: Chunk your text the smart way
Instead of splitting text at arbitrary token lengths, BookWyrm uses phrasal models to break content into meaningful units:
bookwyrm phrasal --file satisfaction.txt --format with_offsets --output phrases.jsonl
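To see why boundaries matter, here is a minimal Python sketch that contrasts fixed-width splitting with splitting on sentence boundaries while keeping character offsets. The regex splitter is a naive stand-in for illustration only, not BookWyrm's phrasal model, and the `start`/`end`/`text` field names are assumptions rather than the actual `with_offsets` output schema.

```python
import re


def fixed_width_chunks(text, width):
    """Split text every `width` characters, ignoring sentence boundaries."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def sentence_chunks_with_offsets(text):
    """Split on sentence-ending punctuation and record character offsets.

    Naive regex stand-in for illustration; NOT BookWyrm's phrasal model.
    The field names (start, end, text) are hypothetical.
    """
    chunks = []
    for match in re.finditer(r"[^.!?]+[.!?]*", text):
        raw = match.group()
        stripped = raw.strip()
        if not stripped:
            continue
        # Shift the start past any leading whitespace so offsets match the text.
        start = match.start() + (len(raw) - len(raw.lstrip()))
        chunks.append({"start": start, "end": start + len(stripped), "text": stripped})
    return chunks


sample = (
    "Customers liked the new dashboard. "
    "Support wait times were too long. "
    "Most would recommend us."
)

# Fixed-width chunks can cut a sentence (or a word) in half.
for piece in fixed_width_chunks(sample, 40):
    print(repr(piece))

# Boundary-aware chunks stay readable and can be traced back to the source.
for chunk in sentence_chunks_with_offsets(sample):
    print(chunk["start"], chunk["end"], chunk["text"])
```

Because every chunk carries its offsets, slicing the original text with `sample[start:end]` returns exactly the chunk's text, which is what makes later citation back to the source possible.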
Step 3: Summarize for clarity
Huge documents? No problem. Generate summaries that make the content easier to embed, search, or serve to users:
bookwyrm summarize phrases.jsonl --output summary.json
Step 4: Ground answers with citations
When your agent answers a question, you want sources you can trust. BookWyrm's citation endpoint finds and justifies them:
bookwyrm cite "What are the outcomes from the customer satisfaction survey?" phrases.jsonl --output results.json
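If you're wondering what "grounding with citations" looks like in practice, the toy Python sketch below ranks chunks by how many words they share with the question and returns each match with its offsets. It is a deliberately simple keyword-overlap stand-in, not how BookWyrm's citation endpoint works, and the chunk fields (`start`, `end`, `text`) and the `matched_terms` key are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def cite(question, chunks, top_k=2):
    """Return up to top_k chunks that share words with the question.

    Toy keyword-overlap ranking for illustration; NOT BookWyrm's method.
    """
    question_terms = tokenize(question)
    scored = []
    for chunk in chunks:
        overlap = question_terms & tokenize(chunk["text"])
        if overlap:
            scored.append((len(overlap), chunk, sorted(overlap)))
    # Highest overlap first; ties keep their original document order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [
        {
            "start": chunk["start"],
            "end": chunk["end"],
            "text": chunk["text"],
            "matched_terms": terms,  # the "justification" in this toy version
        }
        for _, chunk, terms in scored[:top_k]
    ]


# Hypothetical chunk records with character offsets into the source text.
chunks = [
    {"start": 0, "end": 34, "text": "Customers liked the new dashboard."},
    {"start": 35, "end": 68, "text": "Support wait times were too long."},
    {"start": 69, "end": 93, "text": "Most would recommend us."},
]

for citation in cite("How long were support wait times?", chunks):
    print(citation)
```

The point of the sketch is the shape of the result: each answer fragment comes back with the location it was drawn from and a reason it was picked, so the agent's claim can be checked against the source.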
In just a few steps, you've built a RAG pipeline: documents classified → text chunked → content summarized → citations retrieved.
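The four commands can also be chained from a script. The Python sketch below builds them as argument lists using only the subcommands and flags shown above, and prints them in dry-run mode by default. The helper names `build_pipeline` and `run_pipeline` are hypothetical, and the sketch assumes `satisfaction.txt` already exists and that the `bookwyrm` CLI is installed and on your PATH before you switch off the dry run.

```python
import shlex
import subprocess

QUESTION = "What are the outcomes from the customer satisfaction survey?"


def build_pipeline(pdf_path, text_path, question):
    """Return the four walkthrough commands as argument lists.

    Only subcommands and flags that appear in the walkthrough are used.
    """
    return [
        ["bookwyrm", "classify", "--file", pdf_path, "--output", "satisfaction.json"],
        ["bookwyrm", "phrasal", "--file", text_path,
         "--format", "with_offsets", "--output", "phrases.jsonl"],
        ["bookwyrm", "summarize", "phrases.jsonl", "--output", "summary.json"],
        ["bookwyrm", "cite", question, "phrases.jsonl", "--output", "results.json"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(command, check=True)


commands = build_pipeline("customer-satisfaction.pdf", "satisfaction.txt", QUESTION)
run_pipeline(commands)  # dry run: prints the commands without executing them
```

Passing arguments as lists rather than one shell string means the question text, with its spaces and question mark, reaches the CLI as a single argument without manual quoting.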
You didn't have to test models, touch regex, or duplicate tasks for different file types. BookWyrm handled the grunt work, letting you focus on building agents and applications that deliver reliable results.