PDF Text Extractor: Bulk PDF to Text and Metadata, $1 per 1,000 PDFs

Pull plain text, page-by-page content, and full metadata from any PDF URL. Returns title, author, creation date, page count, character count, and flags for scanned (image-only) or encrypted documents. Built for bulk: pass it 10,000 URLs and it returns structured rows. Ideal for legal discovery, RAG ingestion, compliance audits, and document workflow automation.

Pricing: $0.001/PDF
RAM: 128 MB
Coverage: Any URL
Output fields: 12+
Proxy: Apify datacenter
Tech: HTTP + pdfcpu
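
To budget a bulk job, the per-PDF price listed above can be turned into a quick estimate. This is a minimal sketch: `estimate_cost` is a hypothetical helper, and it assumes the listed $0.001 is the only charge and applies to every PDF submitted. Any other platform charges are not modeled here.

```python
from decimal import Decimal

# USD per PDF, taken from the pricing figure above.
PRICE_PER_PDF = Decimal("0.001")


def estimate_cost(pdf_count: int) -> Decimal:
    """Return the estimated cost in USD for processing `pdf_count` PDFs."""
    if pdf_count < 0:
        raise ValueError("pdf_count must be non-negative")
    # Decimal avoids the rounding noise that binary floats add to money.
    return PRICE_PER_PDF * pdf_count


if __name__ == "__main__":
    for count in (1_000, 10_000):
        print(f"{count} PDFs: ${estimate_cost(count):.2f}")
```

At this rate, 1,000 PDFs come to $1.00 and 10,000 PDFs to $10.00, which matches the "$1 per 1,000 PDFs" headline.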

What you get

Primary use cases

API example

# Start a run via the Apify API
curl -X POST "https://api.apify.com/v2/acts/santamaria-automations~pdf-extractor/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "pdfUrls": [
      "https://example.com/report-2026.pdf",
      "https://example.com/contract-v3.pdf",
      "https://example.com/whitepaper.pdf"
    ],
    "extractText": true,
    "extractMetadata": true,
    "perPageText": false
  }'

# Or use with AI agents via MCP:
# https://mcp.apify.com?tools=santamaria-automations/pdf-extractor
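
For bulk jobs, the same request can be built from a script. The sketch below mirrors the curl call above using only the Python standard library. The endpoint, Actor ID, and input fields come from that example. The helper names (`chunk`, `build_run_request`, `start_runs`) and the batch size of 1,000 URLs per run are assumptions made for illustration, not documented limits of the Actor.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "santamaria-automations~pdf-extractor"  # from the curl example


def chunk(urls, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_run_request(pdf_urls, token, per_page_text=False):
    """Build (but do not send) the POST request that starts one run."""
    payload = {
        "pdfUrls": list(pdf_urls),
        "extractText": True,
        "extractMetadata": True,
        "perPageText": per_page_text,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{ACTOR_ID}/runs?token={token}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_runs(pdf_urls, token, batch_size=1000):
    """Start one run per batch and return the parsed JSON responses.

    batch_size=1000 is an arbitrary choice for this sketch.
    """
    responses = []
    for batch in chunk(pdf_urls, batch_size):
        request = build_run_request(batch, token)
        with urllib.request.urlopen(request, timeout=30) as resp:
            responses.append(json.load(resp))
    return responses
```

Calling `start_runs(urls, "YOUR_TOKEN")` sends the requests. `build_run_request` on its own is useful for inspecting the payload before anything goes over the network.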

Integrations

Output fields

| Field | Type | Example |
|---|---|---|
| source_url | string | https://example.com/report.pdf |
| title | string | Annual Report 2026 |
| author | string | Acme Corp |
| page_count | integer | 142 |
| char_count | integer | 284,512 |
| text | string | Executive Summary... |
| creation_date | string | 2026-01-15T09:30:00Z |
| is_scanned | boolean | false |
| is_encrypted | boolean | false |
| file_size_bytes | integer | 4,218,940 |
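
The `is_scanned` and `is_encrypted` flags make it straightforward to route documents after a run. The sketch below assumes each output row arrives as a dictionary keyed by the field names listed above. The `triage` helper and its bucket names are hypothetical and not part of the Actor.

```python
def triage(rows):
    """Sort output rows into buckets by what they need next.

    Encrypted files are checked first, since a locked PDF cannot be
    read regardless of whether it is also scanned.
    """
    buckets = {"text": [], "needs_ocr": [], "encrypted": []}
    for row in rows:
        if row.get("is_encrypted"):
            buckets["encrypted"].append(row["source_url"])
        elif row.get("is_scanned"):
            buckets["needs_ocr"].append(row["source_url"])
        else:
            buckets["text"].append(row["source_url"])
    return buckets


if __name__ == "__main__":
    sample = [
        {"source_url": "https://example.com/report.pdf",
         "is_scanned": False, "is_encrypted": False},
        {"source_url": "https://example.com/scan.pdf",
         "is_scanned": True, "is_encrypted": False},
        {"source_url": "https://example.com/locked.pdf",
         "is_scanned": False, "is_encrypted": True},
    ]
    for name, urls in triage(sample).items():
        print(name, urls)
```

In a RAG ingestion pipeline, for instance, the `text` bucket could go straight to chunking, while `needs_ocr` is queued for a separate OCR step.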

Related Actors

Open on Apify → Try it now (free tier available)