Agent Skills
pdfvision ships Agent Skills in skills/pdfvision/. They teach a skill-aware agent when to call the CLI, which flags to try first, and when to escalate from native text to layout, rendering, OCR, or visual-region crops.
This matters because PDF work is rarely solved by one fixed command. A useful agent should inspect the first result, notice missing or suspicious evidence, and choose the next pdfvision pass. The bundled skill encodes that workflow so agent sessions do not have to rediscover it.
Install
npx skills add yamadashy/pdfvision
For a global install:
npx skills add yamadashy/pdfvision -g
What Agent Skills Cover
The Agent Skills cover:
- default extraction for readable PDFs.
- density-signal checks for silent failures.
- when to add --layout, --render, --ocr, --image-boxes, or --visual-regions.
- when to use --search and --render-region for evidence-focused crops.
- structured output reference routing.
- OCR language and traineddata troubleshooting.
The Agent Skills intentionally keep their main instructions short and point to references only when the task needs them.
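To make the "density-signal checks for silent failures" item concrete, here is a minimal sketch of what such a check could look like. It is not pdfvision's implementation: the `PageStats` type, its field names, and the `min_chars` threshold are all hypothetical and chosen only for illustration. The idea is that an extraction can finish without errors yet return almost no text for a page that clearly contains an image, which usually means the page is a scan.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Hypothetical per-page summary; NOT pdfvision's actual output schema."""
    page: int
    char_count: int   # characters of native text extracted from the page
    image_count: int  # embedded images detected on the page


def suspicious_pages(pages: list[PageStats], min_chars: int = 50) -> list[int]:
    """Return page numbers whose native text looks too sparse to trust.

    A page with very little text but at least one image is a common sign of
    a scanned or image-only page. The threshold is an assumed value.
    """
    return [
        p.page
        for p in pages
        if p.char_count < min_chars and p.image_count > 0
    ]


if __name__ == "__main__":
    sample = [
        PageStats(page=1, char_count=1800, image_count=0),
        PageStats(page=2, char_count=12, image_count=1),  # likely a scan
        PageStats(page=3, char_count=0, image_count=0),   # blank, not flagged
    ]
    print(suspicious_pages(sample))  # prints [2]
```

Pages flagged this way are candidates for a follow-up pass with rendering or OCR, rather than proof that the extraction failed.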
Agent Workflow
A skill-aware agent should usually:
- Start with a structured extraction.
- Inspect overview fields, page quality, and warnings.
- Add layout or visual boxes when placement matters.
- Search for exact evidence when the user asks about a specific clause, metric, label, or field value.
- Render pages or regions only when visual verification is needed.
- Use OCR when native text is missing, sparse, or visibly contradicted.
That keeps the interaction efficient while still giving the agent the option to look at the PDF like a human reader.
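The escalation steps above can be sketched as a small decision function. This is only an illustration of the reasoning, not code from the bundled skill: the `Observation` type and its fields are hypothetical, and the mapping from each condition to a flag is an assumption based on the flag names listed earlier. The sketch returns flag names only, because this page does not show whether a flag such as --search takes a value.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical summary of what the agent noticed after the first pass."""
    placement_matters: bool = False     # answer depends on where content sits
    wants_exact_evidence: bool = False  # user asked about a specific clause or value
    needs_visual_check: bool = False    # answer should be verified visually
    native_text_sparse: bool = False    # text missing, sparse, or contradicted


def next_flags(obs: Observation) -> list[str]:
    """Pick the extra pdfvision flags for the next pass (assumed mapping)."""
    flags: list[str] = []
    if obs.placement_matters:
        flags.append("--layout")
    if obs.wants_exact_evidence:
        flags.append("--search")
    if obs.needs_visual_check:
        # Crop to the evidence when there is a search target; otherwise
        # fall back to rendering pages.
        flags.append("--render-region" if obs.wants_exact_evidence else "--render")
    if obs.native_text_sparse:
        flags.append("--ocr")
    return flags


if __name__ == "__main__":
    # A readable PDF with no special needs gets no extra flags.
    print(next_flags(Observation()))
    # A scanned form where the user asks about one field value.
    print(next_flags(Observation(wants_exact_evidence=True,
                                 needs_visual_check=True,
                                 native_text_sparse=True)))
```

An empty result means the default structured extraction was enough, which matches the goal of rendering or running OCR only when the first pass leaves a gap.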
When to Install It
Install the Agent Skills in projects where agents often read PDFs, reports, slide decks, forms, or scanned documents. They are especially useful in repositories that already use Claude Code, Codex, Cursor, or other skill-aware agent environments.