Sitemap support
Clear explanations of how XML sitemaps work, how sitemap index files are resolved, and how sitemap data is used for SEO and crawling.
What is an XML sitemap?
An XML sitemap is a structured file that lists the URLs of a website to help search engines discover, crawl, and prioritise pages more efficiently.
What does a sitemap extractor do?
A sitemap extractor reads XML sitemap files and lists all URLs they contain, including URLs found inside sitemap index files.
What is the difference between a sitemap and a sitemap index?
A sitemap contains URLs directly, while a sitemap index references multiple sitemap files. Large websites often use indexes to organise many sitemaps.
Are sitemap index files supported?
Yes. Sitemap index files are automatically resolved, and all referenced sitemaps are fetched and processed.
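The resolution step above can be sketched with Python's standard library: a recursive walk that treats a sitemapindex element as a list of further sitemaps to fetch. The fetch callable is a hypothetical stand-in for an HTTP GET, so the example runs entirely on inline XML strings.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def extract_urls(xml_text, fetch):
    """Return all page URLs, recursing into sitemap index files.

    `fetch` maps a sitemap URL to its XML text; a real extractor
    would perform an HTTP GET here (hypothetical helper).
    """
    root = ET.fromstring(xml_text)
    if root.tag.endswith("sitemapindex"):
        urls = []
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            urls.extend(extract_urls(fetch(loc.text.strip()), fetch))
        return urls
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]

index_xml = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/pages.xml</loc></sitemap>
</sitemapindex>"""
child_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

urls = extract_urls(index_xml, lambda url: child_xml)
```

In a real extractor, the lambda would be replaced by a function that downloads each referenced sitemap over HTTP.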
Does the sitemap URL need to include the protocol?
Yes. The sitemap URL must include the full protocol so it can be fetched correctly, for example https://example.com/sitemap.xml.
Can a sitemap be served over plain HTTP?
Yes, but HTTPS is strongly recommended. Search engines prioritise secure sites, and HTTPS prevents interception or modification of sitemap data.
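As a quick illustration of the protocol requirement, a URL can be checked for an explicit scheme before any fetch is attempted. This is a minimal sketch using Python's urllib.parse; the function name is our own.

```python
from urllib.parse import urlparse

def has_full_protocol(url):
    """True when the URL carries an explicit http/https scheme and a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

A bare hostname such as example.com/sitemap.xml fails the check, because urlparse treats it as a path with no scheme or host.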
How many URLs can a sitemap contain?
A single sitemap can contain up to 50,000 URLs and must be no larger than 50 MB uncompressed. Larger sites must split URLs across multiple sitemaps using an index file.
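Splitting a large URL set into index-referenced sitemaps can be sketched as follows; the chunk size defaults to the 50,000-URL limit, and the helper names are illustrative.

```python
def chunk_urls(urls, limit=50_000):
    """Split a URL list into sitemap-sized chunks of at most `limit` URLs."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]

def build_index(sitemap_urls):
    """Render a sitemap index that references the per-chunk sitemap files."""
    entries = "\n".join(
        f"  <sitemap><loc>{u}</loc></sitemap>" for u in sitemap_urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n"
            "</sitemapindex>")

# 120,000 URLs exceed the per-file limit, so three sitemaps are needed.
chunks = chunk_urls([f"https://example.com/p{i}" for i in range(120_000)])
```

Each chunk would be written out as its own urlset file, and build_index would reference the resulting file URLs.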
What metadata can a sitemap include?
Sitemaps may include metadata such as lastmod, changefreq, and priority, though search engines treat these as hints rather than strict rules.
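A single url entry carrying all three optional tags looks like this (the URL and values are illustrative):

```xml
<url>
  <loc>https://example.com/blog/post-1</loc>
  <lastmod>2024-05-01</lastmod>
  <changefreq>weekly</changefreq>
  <priority>0.8</priority>
</url>
```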
Does a sitemap guarantee indexing?
No. Sitemaps help discovery and crawling, but search engines decide whether to index pages based on quality, relevance, and site signals.
How are sitemaps used in SEO analysis?
Sitemaps are used to identify crawlable URLs, detect orphaned pages, verify canonical coverage, and compare indexed pages against the intended site structure.
Can sitemaps reveal orphaned pages?
Yes. Sitemaps can list URLs that are not reachable via internal links, which is useful for discovering orphaned or hidden pages.
Can sitemaps be generated dynamically?
Yes. Many modern sites generate sitemaps dynamically to reflect real-time content changes, product availability, or pagination.
Can a website have multiple sitemaps?
Yes. Sites often split sitemaps by content type, language, or section, all referenced from a single sitemap index.
Do sitemaps need to be submitted manually?
No, but submitting sitemaps via tools like Google Search Console improves discovery. Search engines can also find sitemaps automatically.
Does a sitemap override robots.txt or noindex directives?
No. URLs blocked by robots.txt or marked noindex may still appear in sitemaps but will not be crawled or indexed.
Why might a sitemap fail to load?
Failures can occur due to incorrect URLs, invalid XML syntax, server errors, redirects, authentication requirements, or blocked access.
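One of those failure modes, invalid XML, can be caught up front before any URL extraction is attempted. This is a minimal sketch using Python's xml.etree.ElementTree; the function name is illustrative.

```python
import xml.etree.ElementTree as ET

def validate_sitemap(xml_text):
    """Return (ok, message) after checking XML syntax and the root element."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, f"invalid XML: {exc}"
    # A sitemap file must be rooted at <urlset> or <sitemapindex>.
    if not root.tag.endswith(("urlset", "sitemapindex")):
        return False, f"unexpected root element: {root.tag}"
    return True, "ok"
```

Fetch-level failures (server errors, redirects, blocked access) would be handled separately by the HTTP client before this check runs.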
Can sitemaps be compressed?
Yes. Sitemaps can be compressed using gzip (.gz) to reduce bandwidth usage while remaining fully supported by search engines.
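Detecting and decompressing a gzipped sitemap is straightforward because every gzip stream starts with the magic bytes 0x1f 0x8b. A sketch with Python's gzip module, with an illustrative function name:

```python
import gzip

def read_sitemap_bytes(raw):
    """Decode sitemap bytes, transparently decompressing gzip payloads."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        return gzip.decompress(raw).decode("utf-8")
    return raw.decode("utf-8")
```

This lets the same code path handle both sitemap.xml and sitemap.xml.gz without relying on the file extension or the server's Content-Type header.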
Is sitemap data stored or logged?
No. Sitemap URLs are processed live, and contents are not stored, logged, or tracked by Velohost.
Can sitemaps expose private content?
Sitemaps should only list publicly accessible URLs. Sensitive or private content should never be included in a sitemap.
What are sitemap best practices?
Best practices include using HTTPS, keeping sitemaps updated, excluding noindex URLs, splitting large sitemaps, and monitoring coverage in search tools.
Want to try it yourself? Use the sitemap extractor, or check your DNS configuration.
Ready to extract URLs from a sitemap?