Search and Region Zoom

pdfvision can find text evidence first, then render only the matching region. This is useful when an agent needs to verify a clause, table cell, figure label, form value, or OCR result without sending a full page image to a vision model.

This is one of the most agent-friendly workflows in pdfvision: use text search as a cheap locator, then switch to visual evidence only where it matters.

Search a PDF

bash

pdfvision report.pdf --search "revenue" --json

Matches are emitted in pages[].matches[]. Each match includes the page number, query, source, text snippet, and a bounding box when pdfvision can locate the visible area.

Repeat --search to run multiple queries in one pass:

bash

pdfvision paper.pdf --search "transformer" --search "attention" --json

By default, search is literal, case-insensitive, and NFKC-aware. Add regex or exact-case matching only when the task needs it:

bash

pdfvision report.pdf --search "Q[1-4] revenue" --search-regex --json
pdfvision report.pdf --search "PDF" --search-case-sensitive --json

Good search targets include:

contract clauses and policy terms.
financial metric labels.
table row names.
form values.
figure captions and chart labels.
OCR text on scanned pages.
multilingual terms whose Unicode form may vary.

What Search Covers

Search can match:

native PDF text.
text and choice values from --form-fields.
visible FreeText annotation contents from --annotations.
OCR text from --ocr, using OCR word boxes when available.

OCR matches that duplicate native, form-field, or annotation matches are suppressed so agents do not see the same visible text twice.

The match source helps the agent decide how much to trust it:

native: text came from the PDF text layer.
formField: text came from a visible widget value or display value.
annotation: text came from a visible FreeText annotation.
ocr: text came from page pixels and may need confidence review.

For multi-query searches, queryIndex lets the caller map each hit back to the repeated --search flag that produced it.

Render the Matching Region

Take a match bbox and pass it to --render-region:

bash

pdfvision report.pdf --pages 3 --render --render-region 120,180,360,140 --render-output ./crops --json

--render-region requires exactly one selected page. The region uses PDF points with a top-left origin, and it must stay within the page bounds.

Use --render-scale when the crop contains small labels, superscripts, dense table cells, or chart legends:

bash

pdfvision report.pdf --pages 3 --render --render-region 120,180,360,140 --render-scale 3 --render-output ./crops --json

For best crops, add padding around a match bbox before passing it to --render-region. A little surrounding context helps vision models read labels, row headers, and nearby explanatory text.

Agent Workflow

Run --search to locate candidate evidence.
Inspect pages[].matches[] and choose the bbox with the right source and page.
Re-run with --pages, --render, and --render-region for a visual crop.
Ask the vision model to compare the crop against the native text, OCR text, or extracted table data.

For visual regions that are not text-searchable, use Rendering and OCR with --visual-regions or --render-visual-regions.

Example: Auditable Claim Check

bash

pdfvision annual-report.pdf --search "Net sales" --search "Operating income" --layout --json

An agent can inspect pages[].matches[], choose the hit with the right page and surrounding context, then request a crop:

bash

pdfvision annual-report.pdf --pages 42 --render --render-region 72,180,468,180 --render-output ./evidence --json

The final answer can cite both extracted text and the rendered evidence region.

Search and Region Zoom ​

Search a PDF ​

What Search Covers ​

Render the Matching Region ​

Agent Workflow ​

Example: Auditable Claim Check ​

Search and Region Zoom

Search a PDF

What Search Covers

Render the Matching Region

Agent Workflow

Example: Auditable Claim Check