Parse robots.txt, sitemap.xml, and nested sitemap indexes to discover the URLs a website publishes in its sitemaps. Returns each URL with its last-modified date, change frequency, and priority when the sitemap provides them. Supports gzipped sitemaps and sitemap-index chains. Built for bulk: pass a list of domains and it returns the URL inventory for each. Ideal for SEO audits, content monitoring, competitive intelligence, and pre-crawl URL discovery.
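To make the discovery steps concrete, here is a minimal Python sketch of the general technique: read `Sitemap:` lines from robots.txt, detect gzip by its magic bytes, and tell a sitemap index apart from a URL set. It is an illustration under stated assumptions, not the actor's actual code. The function names (`sitemaps_from_robots`, `maybe_gunzip`, `parse_sitemap`) and the sample documents are hypothetical, and the sketch works on in-memory bytes rather than fetching anything.

```python
import gzip
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Collect the URLs declared on `Sitemap:` lines of a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            found.append(value.strip())
    return found


def maybe_gunzip(payload: bytes) -> tuple[bytes, bool]:
    """Decompress gzip payloads, detected by the 0x1f 0x8b magic bytes."""
    if payload[:2] == b"\x1f\x8b":
        return gzip.decompress(payload), True
    return payload, False


def parse_sitemap(payload: bytes) -> dict:
    """Return child sitemaps (for an index) or URL entries (for a urlset)."""
    xml_bytes, is_gzipped = maybe_gunzip(payload)
    root = ET.fromstring(xml_bytes)
    result = {"is_gzipped": is_gzipped, "children": [], "entries": []}
    if root.tag.endswith("sitemapindex"):
        # An index only points at further sitemaps; a caller would fetch each one.
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            result["children"].append((loc.text or "").strip())
        return result
    for node in root.findall("sm:url", NS):
        priority = node.findtext("sm:priority", default=None, namespaces=NS)
        result["entries"].append({
            "url": node.findtext("sm:loc", default="", namespaces=NS).strip(),
            "lastmod": node.findtext("sm:lastmod", default=None, namespaces=NS),
            "changefreq": node.findtext("sm:changefreq", default=None, namespaces=NS),
            "priority": float(priority) if priority is not None else None,
        })
    return result


# Hypothetical sample documents, used only to exercise the functions above.
ROBOTS = "User-agent: *\nDisallow: /admin\nSitemap: https://example.com/sitemap.xml\n"
INDEX = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-blog.xml.gz</loc></sitemap>
</sitemapindex>"""
URLSET = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blog/post-1</loc>
    <lastmod>2026-06-10T14:00:00Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>"""

if __name__ == "__main__":
    print(sitemaps_from_robots(ROBOTS))
    print(parse_sitemap(INDEX)["children"])
    print(parse_sitemap(gzip.compress(URLSET))["entries"])
```

Checking the magic bytes instead of the `.gz` file extension is a deliberate choice in this sketch, since a server may deliver a compressed sitemap under either kind of URL.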

    # Start a run via the Apify API
    curl -X POST "https://api.apify.com/v2/acts/santamaria-automations~sitemap-url-discovery/runs?token=YOUR_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "domains": [
          "https://example.com",
          "https://competitor.com",
          "https://news-site.com"
        ],
        "maxUrlsPerSite": 10000,
        "followSitemapIndexes": true,
        "parseRobotsTxt": true
      }'

    # Or use with AI agents via MCP:
    # https://mcp.apify.com?tools=santamaria-automations/sitemap-url-discovery

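The same request can be prepared from Python using only the standard library. This is a hedged sketch: the endpoint and input fields are taken from the curl call above, while the helper name `build_run_request` and its defaults are hypothetical. The request is built but not sent, so the snippet runs without a token or network access.

```python
import json
import urllib.parse
import urllib.request

ACTOR_ID = "santamaria-automations~sitemap-url-discovery"


def build_run_request(token: str, domains: list[str],
                      max_urls_per_site: int = 10000) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an actor run."""
    payload = {
        "domains": domains,
        "maxUrlsPerSite": max_urls_per_site,
        "followSitemapIndexes": True,
        "parseRobotsTxt": True,
    }
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request("YOUR_TOKEN", ["https://example.com"])

# Sending is left commented out so the sketch stays offline:
# with urllib.request.urlopen(request) as response:
#     run = json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the URL and JSON body before spending a run.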
| Field | Type | Example |
|---|---|---|
| source_domain | string | example.com |
| sitemap_url | string | https://example.com/sitemap.xml |
| total_urls | integer | 12486 |
| url | string | https://example.com/blog/post-1 |
| lastmod | string | 2026-06-10T14:00:00Z |
| changefreq | string | weekly |
| priority | number | 0.8 |
| robots_sitemaps | array | ["https://example.com/sitemap.xml"] |
| is_gzipped | boolean | false |
| scraped_at | string | 2026-06-13T10:15:42Z |
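As an illustration of working with these fields, the sketch below filters URL records by `lastmod`, which is a typical content-monitoring step. It assumes each output item carries the field names from the table above. The `UrlRecord` class, the helper functions, and the second and third sample records are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class UrlRecord:
    source_domain: str
    url: str
    lastmod: Optional[str] = None
    changefreq: Optional[str] = None
    priority: Optional[float] = None


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 date or datetime into a timezone-aware datetime."""
    # A trailing "Z" is only accepted by fromisoformat from Python 3.11 on.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Date-only values such as "2026-01-05" are treated as UTC here.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def modified_since(records: list[UrlRecord], cutoff: datetime) -> list[UrlRecord]:
    """Keep records whose lastmod is present and not older than the cutoff."""
    return [
        r for r in records
        if r.lastmod and parse_timestamp(r.lastmod) >= cutoff
    ]


records = [
    UrlRecord("example.com", "https://example.com/blog/post-1",
              "2026-06-10T14:00:00Z", "weekly", 0.8),
    UrlRecord("example.com", "https://example.com/about", "2026-01-05"),
    UrlRecord("example.com", "https://example.com/contact"),
]
cutoff = datetime(2026, 6, 1, tzinfo=timezone.utc)
recent = modified_since(records, cutoff)
```

Records without a `lastmod` are dropped rather than treated as fresh, which is a conservative choice for change monitoring; an audit that wants to flag missing dates would handle them separately.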