Search and Region Zoom
pdfvision can find text evidence first, then render only the matching region. This is useful when an agent needs to verify a clause, table cell, figure label, form value, or OCR result without sending a full page image to a vision model.
This is one of the most agent-friendly workflows in pdfvision: use text search as a cheap locator, then switch to visual evidence only where it matters.
Search a PDF
pdfvision report.pdf --search "revenue" --jsonMatches are emitted in pages[].matches[]. Each match includes the page number, query, source, text snippet, and a bounding box when pdfvision can locate the visible area.
Repeat --search to run multiple queries in one pass:
pdfvision paper.pdf --search "transformer" --search "attention" --jsonBy default, search is literal, case-insensitive, and NFKC-aware. Add regex or exact-case matching only when the task needs it:
pdfvision report.pdf --search "Q[1-4] revenue" --search-regex --json
pdfvision report.pdf --search "PDF" --search-case-sensitive --jsonGood search targets include:
- contract clauses and policy terms.
- financial metric labels.
- table row names.
- form values.
- figure captions and chart labels.
- OCR text on scanned pages.
- multilingual terms whose Unicode form may vary.
What Search Covers
Search can match:
- native PDF text.
- text and choice values from
--form-fields. - visible FreeText annotation contents from
--annotations. - OCR text from
--ocr, using OCR word boxes when available.
OCR matches that duplicate native, form-field, or annotation matches are suppressed so agents do not see the same visible text twice.
The match source helps the agent decide how much to trust it:
native: text came from the PDF text layer.formField: text came from a visible widget value or display value.annotation: text came from a visible FreeText annotation.ocr: text came from page pixels and may need confidence review.
For multi-query searches, queryIndex lets the caller map each hit back to the repeated --search flag that produced it.
Render the Matching Region
Take a match bbox and pass it to --render-region:
pdfvision report.pdf --pages 3 --render --render-region 120,180,360,140 --render-output ./crops --json--render-region requires exactly one selected page. The region uses PDF points with a top-left origin, and it must stay within the page bounds.
Use --render-scale when the crop contains small labels, superscripts, dense table cells, or chart legends:
pdfvision report.pdf --pages 3 --render --render-region 120,180,360,140 --render-scale 3 --render-output ./crops --jsonFor best crops, add padding around a match bbox before passing it to --render-region. A little surrounding context helps vision models read labels, row headers, and nearby explanatory text.
Agent Workflow
- Run
--searchto locate candidate evidence. - Inspect
pages[].matches[]and choose the bbox with the right source and page. - Re-run with
--pages,--render, and--render-regionfor a visual crop. - Ask the vision model to compare the crop against the native text, OCR text, or extracted table data.
For visual regions that are not text-searchable, use Rendering and OCR with --visual-regions or --render-visual-regions.
Example: Auditable Claim Check
pdfvision annual-report.pdf --search "Net sales" --search "Operating income" --layout --jsonAn agent can inspect pages[].matches[], choose the hit with the right page and surrounding context, then request a crop:
pdfvision annual-report.pdf --pages 42 --render --render-region 72,180,468,180 --render-output ./evidence --jsonThe final answer can cite both extracted text and the rendered evidence region.