HTML to Markdown Converter: Bulk Web Pages to Clean MD, $1 per 1,000 Pages

Convert any HTML page to clean, LLM-ready Markdown. Strips page chrome (navigation, ads, sidebars) and preserves headings, tables, fenced code blocks, images with alt text, and links. Returns the page title, the primary content as Markdown, a word count, arrays of extracted images and links, and the canonical URL. Built for batch: feed it 10,000 article URLs and it returns one row per page. Perfect for LLM training corpora, RAG ingestion, documentation mirrors, and content monitoring.

Pricing: $0.001/page
RAM: 128 MB
Coverage: Any URL
Output fields: 10+
Proxy: Apify datacenter
Tech: HTTP + Readability
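At the listed rate, cost scales linearly with page count: 1,000 pages come to $1.00, and the 10,000-URL batch mentioned above comes to $10.00. A minimal sketch of that arithmetic, assuming a flat per-page charge only (platform compute, proxy usage, and any free-tier credit are not modeled; `estimate_cost` is a hypothetical helper):

```bash
#!/usr/bin/env bash
# Estimate the charge for a batch at a flat per-page rate.
# Assumption: only the per-page price applies; other platform costs are ignored.
estimate_cost() {
  local pages="$1"
  local rate="${2:-0.001}"   # USD per page (the listed price)
  awk -v p="$pages" -v r="$rate" 'BEGIN { printf "%.2f\n", p * r }'
}

estimate_cost 1000    # prints 1.00
estimate_cost 10000   # prints 10.00
```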

API example

```bash
# Start a run via the Apify API
curl -X POST "https://api.apify.com/v2/acts/santamaria-automations~html-to-markdown/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://blog.example.com/post-1",
      "https://docs.example.com/getting-started",
      "https://news.example.com/article-2026"
    ],
    "extractImages": true,
    "extractLinks": true,
    "mainContentOnly": true
  }'

# Or use with AI agents via MCP:
# https://mcp.apify.com?tools=santamaria-automations/html-to-markdown
```
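The request above lists its URLs inline. For a large batch, one option is to generate the request body from a plain-text file with one URL per line. A minimal sketch in bash, assuming the same four input options shown in the example request (`build_input` and `urls.txt` are hypothetical names, and any other options the Actor's input schema may accept are not covered):

```bash
#!/usr/bin/env bash
# Build the Actor input JSON from newline-separated URLs read on stdin.
# Assumption: only the four options from the example request are needed.
build_input() {
  printf '{"urls":['
  # Drop CRs (Windows line endings), then JSON-escape backslashes and quotes.
  tr -d '\r' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' | {
    first=1
    while IFS= read -r url || [ -n "$url" ]; do
      [ -z "$url" ] && continue            # skip blank lines
      [ "$first" -eq 1 ] || printf ','     # comma between array items
      printf '"%s"' "$url"
      first=0
    done
  }
  printf '],"extractImages":true,"extractLinks":true,"mainContentOnly":true}\n'
}

# Usage (hypothetical file name; the curl call needs a real token):
#   build_input < urls.txt > input.json
#   curl -X POST "https://api.apify.com/v2/acts/santamaria-automations~html-to-markdown/runs?token=YOUR_TOKEN" \
#     -H "Content-Type: application/json" \
#     --data @input.json
```

The runs endpoint returns as soon as the run starts; the converted pages are written to the run's default dataset.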

Output fields

| Field | Type | Example |
|---|---|---|
| source_url | string | https://blog.example.com/post-1 |
| title | string | Building RAG Pipelines |
| main_content | string | # Building RAG Pipelines\n\nA practical guide... |
| word_count | integer | 1842 |
| reading_time_minutes | integer | 8 |
| language | string | en |
| canonical_url | string | https://blog.example.com/post-1 |
| images | array | [{"src":"...","alt":"diagram"}] |
| links | array | [{"href":"...","text":"docs"}] |
| scraped_at | string | 2026-06-13T10:15:42Z |
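Because each page is one row, the output is easy to aggregate outside Apify. A minimal sketch that totals pages and words, assuming the dataset items were exported as CSV with only `source_url` and `word_count` selected (via the Apify dataset API's `format` and `fields` query parameters), so that `word_count` is the last column; `summarize_words` and `DATASET_ID` are placeholders:

```bash
#!/usr/bin/env bash
# Sum word counts from a CSV export whose LAST column is word_count.
# Assumption: export requested with fields=source_url,word_count.
summarize_words() {
  awk -F',' '
    NR == 1 { next }                 # skip the header row
    NF == 0 { next }                 # ignore blank lines
    {
      wc = $NF
      gsub(/"/, "", wc)              # drop CSV quoting, if any
      sub(/\r$/, "", wc)             # tolerate CRLF line endings
      pages += 1
      words += wc
    }
    END { printf "%d pages, %d words\n", pages, words }
  '
}

# Example (needs network and a real dataset ID):
#   curl -s "https://api.apify.com/v2/datasets/DATASET_ID/items?format=csv&fields=source_url,word_count&token=YOUR_TOKEN" \
#     | summarize_words
```

Taking the last field rather than the second keeps the count correct even when a quoted URL contains a comma.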

Available on Apify (free tier available).