RAG for Unstructured Data

The problem: AI in research needs verifiable grounding

In research, AI can process more information than any human.

But without clear citations, the output can't be trusted.

Most LLM pipelines summarize or paraphrase text without linking back to the original source, which leads to hallucinations and unverifiable claims that are unacceptable in academic contexts.

Researchers need systems that accelerate reading and referencing, not ones that add another layer of verification work.


BookWyrm for researchers

BookWyrm's cite endpoint was designed for “citation under question”, a process where an AI reads a document deeply with a specific research query in mind, surfacing only direct excerpts with position data.

Instead of inventing text, the model retrieves and ranks real passages from the source, allowing researchers to trace every claim back to its origin.

The response includes:

  • Exact text excerpts
  • Page and position metadata
  • Citation reasoning context

This allows you to verify and filter supporting evidence before passing it to research assistants or domain experts.

# Analyze a document for climate change impacts
uv run citation-workflow research_paper.pdf query_climate.txt --output results.json

The query file (query_climate.txt) contains:

Describe any mentions of climate change impacts on agriculture, including changes in crop yields, farming practices, or agricultural adaptation strategies.
{
  "query": "climate change impacts on agriculture",
  "pdf_path": "research_paper.pdf",
  "total_citations": 12,
  "citations": [
    {
      "text": "Climate change has significantly reduced corn yields in the Midwest...",
      "reasoning": "This directly addresses agricultural impacts of climate change",
      "quality": 4,
      "llm_score": 5,
      "llm_explanation": "5 - Comprehensive account that fully addresses the query",
      "start_page": 15,
      "end_page": 15
    }
  ],
  ...
}
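To make the verify-and-filter step concrete, here is a minimal sketch in plain Python (no SDK assumed) that keeps only citations whose LLM review score meets a threshold. The field names mirror the sample response above; the threshold and helper name are our own illustration.

```python
def filter_citations(results: dict, min_llm_score: int = 4) -> list[dict]:
    """Keep only citations whose LLM review score meets the threshold."""
    return [c for c in results["citations"] if c["llm_score"] >= min_llm_score]

# Sample shaped like the response above (trimmed to the fields we use).
sample = {
    "citations": [
        {"text": "Climate change has significantly reduced corn yields...",
         "llm_score": 5, "start_page": 15, "end_page": 15},
        {"text": "Weather patterns are discussed in general terms...",
         "llm_score": 2, "start_page": 3, "end_page": 3},
    ]
}
for c in filter_citations(sample):
    print(f"p.{c['start_page']}-{c['end_page']}: {c['text'][:50]}")
```

In practice you would load the `results.json` file produced by the command above and hand the surviving citations to a research assistant or domain expert.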

A historical research example

In work with Dr. Peter Turchin and the Complexity Science Hub in Vienna, the challenge was verifying AI-identified historical events in long, complex documents, often 400+ pages.

Their pipeline used AI to suggest potential events but relied entirely on human review for validation.

BookWyrm's citation process can automate the “deep read” step:

  1. A small, fast LLM scans documents for candidate excerpts.
  2. A larger model reviews and filters the results for accuracy.
  3. Researchers receive exact quotes with page references.

The entire pipeline reduced multi-hour verification work to a few minutes, with the AI producing reviewable, source-backed evidence.
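The three steps above can be sketched as a two-stage pipeline. Everything here is illustrative: `scan_with_fast_model` and `review_with_strong_model` stand in for real LLM calls (replaced by trivial stubs so the sketch runs), and the data shapes are assumptions, not BookWyrm's actual interface.

```python
from dataclasses import dataclass

@dataclass
class Excerpt:
    text: str
    page: int
    approved: bool = False

def scan_with_fast_model(pages: dict[int, str], query: str) -> list[Excerpt]:
    """Stage 1: a small, fast model flags candidate excerpts.
    Stubbed as a keyword match; in practice this is an LLM call."""
    keyword = query.split()[0].lower()
    return [Excerpt(text, page) for page, text in pages.items()
            if keyword in text.lower()]

def review_with_strong_model(candidates: list[Excerpt], query: str) -> list[Excerpt]:
    """Stage 2: a larger model validates candidates.
    Stubbed as a length check standing in for semantic review."""
    for c in candidates:
        c.approved = len(c.text) > 20
    return [c for c in candidates if c.approved]

pages = {
    15: "Climate change has significantly reduced corn yields in the Midwest.",
    3: "climate note",
}
query = "climate impacts on agriculture"
evidence = review_with_strong_model(scan_with_fast_model(pages, query), query)
for e in evidence:
    print(f"p.{e.page}: {e.text}")
```

Stage 3 is then just presentation: each surviving `Excerpt` already carries its page number, so researchers get exact quotes with references.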

📕 Read More: AI Driven Citation: Controlling Hallucinations With Concrete Sources


A citation workflow that scales

Using BookWyrm's API, you can:

  • Extract exact excerpts with page and position metadata
  • Score and filter citations with an LLM reviewer
  • Export source-backed results as JSON for downstream review

This architecture supports multi-model setups where lightweight LLMs perform the initial scan and more powerful ones perform semantic validation, all grounded by BookWyrm's position-aware extraction.
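Because every excerpt carries start/end page positions, a multi-model setup can also deduplicate overlapping candidates before the expensive validation pass. A minimal sketch (field names follow the sample response; the merge policy is our own illustration, not BookWyrm behavior):

```python
def dedupe_by_span(citations: list[dict]) -> list[dict]:
    """Collapse citations covering overlapping page ranges, keeping the
    highest-scoring one, so the reviewer model sees each span only once."""
    kept: list[dict] = []
    for c in sorted(citations, key=lambda c: -c["llm_score"]):
        overlaps = any(c["start_page"] <= k["end_page"] and
                       k["start_page"] <= c["end_page"] for k in kept)
        if not overlaps:
            kept.append(c)
    return kept

cites = [
    {"text": "corn yields fell",   "llm_score": 5, "start_page": 15, "end_page": 15},
    {"text": "yields discussion",  "llm_score": 3, "start_page": 15, "end_page": 16},
    {"text": "irrigation shift",   "llm_score": 4, "start_page": 40, "end_page": 41},
]
unique = dedupe_by_span(cites)  # keeps the top p.15 citation and the p.40-41 one
```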

Benefits of adding BookWyrm to your research pipeline

Accelerate your research

BookWyrm transforms hundreds of pages into reviewable evidence, cutting manual reading time and reducing verification load.

Join the beta and start building AI-driven research tools that cite their sources.


Sign up and receive €20 in free credits.

The first 100 sign-ups also get a €100 top-up after initial usage (valid until 1 November 2025).
