API endpoints that extract and transform unstructured data for agentic AI.
Endpoints work standalone or in combination.
Build using the tools you know.

You've got data everywhere - PDFs from customer surveys, market reports in shared drives, research papers, and more. You want to build an agent that can actually use this information, but first you need to wrangle it into something usable.
Start by figuring out what kind of files you’re dealing with. With BookWyrm, it’s one command:
bookwyrm classify --file customer-satisfaction.pdf --output satisfaction.json

Now you know exactly what you're working with, ready for embedding and indexing in your favorite vector database (e.g., Pinecone).
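Got a whole folder of mixed files instead of one PDF? The same command scripts cleanly. Here's a minimal Python sketch that shells out to the classify command for each file; the docs/ and classified/ folder names are just placeholders for this example.

# Batch-classify every file in a folder by calling the CLI per file.
# The directory names below are placeholders; point them at your own data.
import subprocess
from pathlib import Path

in_dir = Path("docs")          # folder of mixed PDFs, reports, papers
out_dir = Path("classified")
out_dir.mkdir(exist_ok=True)

for f in in_dir.iterdir():
    if not f.is_file():
        continue
    subprocess.run(
        ["bookwyrm", "classify", "--file", str(f),
         "--output", str(out_dir / f"{f.stem}.json")],
        check=True,
    )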
Instead of splitting text at arbitrary token lengths, BookWyrm uses phrasal models to break content into meaningful units:
bookwyrm phrasal --file satisfaction.txt --format with_offsets --output phrases.jsonl
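Because the output is one phrase per JSONL line, it drops straight into an embedding step. Here's a sketch of that hand-off; it assumes each line carries the phrase under a "text" field (check your actual output for the real field names) and uses sentence-transformers as a stand-in for whatever embedding model you prefer.

# Embed BookWyrm phrases and shape records for a vector database upsert.
# The "text" field name is an assumption; adjust to the actual JSONL schema.
import json
from sentence_transformers import SentenceTransformer

with open("phrases.jsonl") as fh:
    phrases = [json.loads(line) for line in fh]

model = SentenceTransformer("all-MiniLM-L6-v2")   # any embedding model works here
vectors = model.encode([p["text"] for p in phrases])

records = [
    {"id": f"satisfaction-{i}", "values": vec.tolist(), "metadata": {"text": p["text"]}}
    for i, (p, vec) in enumerate(zip(phrases, vectors))
]

From here, upserting the records into Pinecone or any other vector store is typically a single call.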
Huge documents? No problem. Generate summaries that make it easier to embed, search, or serve to users:

bookwyrm summarize phrases.jsonl --output summary.json
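One pattern that works well is to index the document-level summary as its own record next to the phrase chunks, so broad questions hit the summary and specific ones hit the phrases. A sketch under the same assumptions as above; the "summary" field name is a guess, so check your actual summary.json before relying on it.

# Turn the document-level summary into its own searchable record.
# The "summary" key is an assumption; verify the real output schema.
import json
from sentence_transformers import SentenceTransformer

with open("summary.json") as fh:
    doc_summary = json.load(fh).get("summary", "")

model = SentenceTransformer("all-MiniLM-L6-v2")
summary_record = {
    "id": "satisfaction-summary",
    "values": model.encode(doc_summary).tolist(),
    "metadata": {"text": doc_summary, "kind": "summary"},
}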
When your agent answers a question, you want sources you can trust. BookWyrm's citation endpoint finds and justifies them:

bookwyrm cite "What are the outcomes from the customer satisfaction survey?" phrases.jsonl --output results.json
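Inside an agent, that's one call per question, and the answer ships with its supporting citations. A rough sketch; the shape of results.json isn't assumed here, so inspect your own output before parsing it further.

# Retrieve cited support for a question, then hand the payload to your agent.
import json
import subprocess

question = "What are the outcomes from the customer satisfaction survey?"

subprocess.run(
    ["bookwyrm", "cite", question, "phrases.jsonl", "--output", "results.json"],
    check=True,
)

# Parsing is kept loose because the exact results.json schema isn't assumed here;
# pass the citations along with the question into your prompt or agent framework.
with open("results.json") as fh:
    citations = json.load(fh)

print(json.dumps(citations, indent=2)[:500])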
In just a few steps, you've built a RAG pipeline: documents classified → text chunked → content summarized → citations retrieved. You haven't had to test models, touch regex, or duplicate work for different file types; BookWyrm handled the grunt work, letting you focus on building agents and applications that deliver reliable results.
Your data pipeline is the foundation for your agentic workflows. Build it right. Get started with the API that's fast to set up, easy to extend, and built for developers.