Velohost

Sitemap support

Sitemap Extractor FAQs

Clear explanations of how XML sitemaps work, how sitemap index files are resolved, and how sitemap data is used for SEO and crawling.

What is an XML sitemap?

An XML sitemap is a structured file that lists the URLs of a website to help search engines discover, crawl, and prioritise pages more efficiently.

What does a sitemap extractor do?

A sitemap extractor reads XML sitemap files and lists all URLs they contain, including URLs found inside sitemap index files.
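As a rough illustration of what "reading a sitemap" involves, the sketch below parses a small sitemap held in a string and collects every URL it lists. It uses only Python's standard library. The function name extract_urls and the sample data are made up for this example and do not describe how the Velohost extractor is implemented.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def extract_urls(xml_text):
    """Return the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    urls = []
    for url_el in root.iter(f"{SITEMAP_NS}url"):
        loc = url_el.find(f"{SITEMAP_NS}loc")
        if loc is not None and loc.text:
            urls.append(loc.text.strip())
    return urls

# Hypothetical sample sitemap with two pages.
SAMPLE = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc>https://example.com/about</loc></url>"
    "</urlset>"
)

print(extract_urls(SAMPLE))
```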

What is the difference between a sitemap and a sitemap index?

A sitemap contains URLs directly, while a sitemap index references multiple sitemap files. Large websites often use indexes to organise many sitemaps.

Does the sitemap extractor support sitemap index files?

Yes. Sitemap index files are automatically resolved and all referenced sitemaps are fetched and processed.
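One way to picture index resolution is as a small recursive walk: if a file is a sitemap index, fetch each sitemap it references; otherwise collect its URLs. The sketch below assumes a caller-supplied fetch function and uses an in-memory dictionary in place of real HTTP requests. The names resolve_sitemap and FAKE_FILES are hypothetical, and the real extractor may work differently.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
XMLNS = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'

# Hypothetical in-memory "site" standing in for real HTTP responses.
FAKE_FILES = {
    "https://example.com/sitemap_index.xml": (
        f"<sitemapindex {XMLNS}>"
        "<sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>"
        "<sitemap><loc>https://example.com/sitemap-blog.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/sitemap-pages.xml": (
        f"<urlset {XMLNS}><url><loc>https://example.com/</loc></url></urlset>"
    ),
    "https://example.com/sitemap-blog.xml": (
        f"<urlset {XMLNS}><url><loc>https://example.com/blog/hello</loc></url></urlset>"
    ),
}

def resolve_sitemap(url, fetch, seen=None):
    """Return all page URLs reachable from a sitemap or sitemap index."""
    seen = set() if seen is None else seen
    if url in seen:  # guard against indexes that reference each other
        return []
    seen.add(url)
    root = ET.fromstring(fetch(url))
    locs = [el.text.strip() for el in root.iter(f"{NS}loc") if el.text]
    if root.tag == f"{NS}sitemapindex":
        pages = []
        for child_url in locs:
            pages.extend(resolve_sitemap(child_url, fetch, seen))
        return pages
    return locs

print(resolve_sitemap("https://example.com/sitemap_index.xml", FAKE_FILES.__getitem__))
```

Passing the fetch function in as a parameter keeps the walk itself independent of any particular HTTP client.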

Why must the sitemap URL include https://?

The sitemap URL must be a full address, including the protocol, so the tool knows exactly where and how to fetch the file. For example, https://example.com/sitemap.xml.
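A check of this kind could look like the sketch below, which uses Python's urllib.parse to confirm that a protocol and host are present. The function name has_full_protocol is invented for this example, and accepting both http and https is an assumption rather than a statement about the extractor's actual validation.

```python
from urllib.parse import urlparse

def has_full_protocol(url):
    """Return True when the URL has an http(s) scheme and a host."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)

print(has_full_protocol("https://example.com/sitemap.xml"))  # True
print(has_full_protocol("example.com/sitemap.xml"))          # False
```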

Can a sitemap be served over HTTP?

Yes, but HTTPS is strongly recommended. Search engines such as Google treat HTTPS as a ranking signal, and HTTPS protects sitemap data from interception or modification in transit.

How many URLs can a sitemap contain?

A single sitemap can contain at most 50,000 URLs and must not exceed 50 MB uncompressed. Sites that go over either limit must split their URLs across multiple sitemaps, typically referenced from a sitemap index file.
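Splitting by URL count can be sketched in a few lines of Python. The example below only enforces the 50,000-URL limit; it does not measure file size, so a real generator would also need to check the 50 MB limit. The name split_urls and the generated URLs are hypothetical.

```python
MAX_URLS_PER_SITEMAP = 50_000

def split_urls(urls, limit=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of at most `limit` URLs, one per sitemap file."""
    for start in range(0, len(urls), limit):
        yield urls[start:start + limit]

# Hypothetical site with 120,000 pages.
urls = [f"https://example.com/page/{i}" for i in range(120_000)]
chunks = list(split_urls(urls))
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]
```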

What optional fields can sitemaps include?

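Sitemaps may include metadata such as lastmod, changefreq, and priority. Search engines treat these as hints rather than strict rules. Google, for example, ignores changefreq and priority and uses lastmod only when it is consistently accurate.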
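To show where these optional fields sit in a sitemap entry, the sketch below reads them with Python's standard library and returns None for any field that is missing. The function name read_entries and the sample values are illustrative only.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def read_entries(xml_text):
    """Return one dict per <url> entry; optional fields are None when absent."""
    root = ET.fromstring(xml_text)
    entries = []
    for url_el in root.findall("sm:url", NS):
        entries.append({
            "loc": url_el.findtext("sm:loc", default="", namespaces=NS).strip(),
            "lastmod": url_el.findtext("sm:lastmod", namespaces=NS),
            "changefreq": url_el.findtext("sm:changefreq", namespaces=NS),
            "priority": url_el.findtext("sm:priority", namespaces=NS),
        })
    return entries

# Hypothetical sample: the first entry has all fields, the second only <loc>.
SAMPLE = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod>"
    "<changefreq>weekly</changefreq><priority>0.8</priority></url>"
    "<url><loc>https://example.com/about</loc></url>"
    "</urlset>"
)

for entry in read_entries(SAMPLE):
    print(entry)
```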

Does submitting a sitemap guarantee indexing?

No. Sitemaps help discovery and crawling, but search engines decide whether to index pages based on quality, relevance, and site signals.

How are sitemaps used for SEO audits?

Sitemaps are used to identify crawlable URLs, detect orphaned pages, verify canonical coverage, and compare indexed pages against intended site structure.

Can sitemaps contain URLs not linked internally?

Yes. Sitemaps can list URLs that are not reachable via internal links, which is useful for discovering orphaned or hidden pages.
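In an audit, orphan candidates can be found by comparing two sets: the URLs listed in the sitemap and the URLs reached by following internal links. The sketch below assumes both lists have already been collected (for example, from an extractor and a site crawl). The function name find_orphans and the sample URLs are hypothetical.

```python
def find_orphans(sitemap_urls, linked_urls):
    """Return URLs that appear in the sitemap but were never reached via links."""
    return sorted(set(sitemap_urls) - set(linked_urls))

# Hypothetical inputs: sitemap contents and URLs found by crawling internal links.
in_sitemap = ["https://example.com/", "https://example.com/pricing", "https://example.com/old-landing"]
linked = ["https://example.com/", "https://example.com/pricing"]

print(find_orphans(in_sitemap, linked))  # ['https://example.com/old-landing']
```

This simple comparison assumes both lists use the same URL form. In practice, differences such as trailing slashes would need to be normalised first.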

Can sitemaps be generated dynamically?

Yes. Many modern sites generate sitemaps dynamically to reflect real-time content changes, product availability, or pagination.

Can a site have multiple sitemaps?

Yes. Sites often split sitemaps by content type, language, or section, all referenced from a single sitemap index.

Do sitemaps need to be submitted to search engines?

Not necessarily. Submitting sitemaps through tools like Google Search Console helps discovery and provides reporting, but search engines can also find sitemaps on their own, for example through a Sitemap: line in robots.txt.
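One common route for automatic discovery is a Sitemap: line in a site's robots.txt file. The sketch below pulls those lines out of robots.txt text that has already been downloaded. The function name sitemaps_from_robots and the sample file are assumptions for this example.

```python
def sitemaps_from_robots(robots_txt):
    """Return the URLs declared on 'Sitemap:' lines of a robots.txt file."""
    found = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own "://" is preserved.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            found.append(value.strip())
    return found

# Hypothetical robots.txt content.
ROBOTS = """User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
sitemap: https://example.com/sitemap-news.xml
"""

print(sitemaps_from_robots(ROBOTS))
```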

Do sitemaps override robots.txt?

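No. Listing a URL in a sitemap does not override robots.txt or a noindex directive. URLs blocked by robots.txt may still appear in sitemaps but will not be crawled, and URLs marked noindex can be crawled but will not be indexed.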
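To spot sitemap URLs that conflict with robots.txt, the two can be cross-checked. The sketch below uses Python's standard urllib.robotparser on robots.txt text supplied as a string. It checks crawl rules only and cannot detect noindex, which lives in the page or its HTTP headers. The function name blocked_urls and the sample data are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that robots.txt disallows for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]

# Hypothetical robots.txt rules and sitemap URLs.
ROBOTS = "User-agent: *\nDisallow: /private/\n"
SITEMAP_URLS = [
    "https://example.com/",
    "https://example.com/private/report",
]

print(blocked_urls(ROBOTS, SITEMAP_URLS))  # ['https://example.com/private/report']
```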

Why might a sitemap fail to load?

Failures can occur due to incorrect URLs, invalid XML syntax, server errors, redirects, authentication requirements, or blocked access.
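Of these causes, invalid XML is the easiest to check locally. The sketch below tries to parse sitemap text and returns the parser's error message if it fails. It does not cover network-related causes such as server errors or blocked access. The function name xml_problem is invented for this example.

```python
import xml.etree.ElementTree as ET

def xml_problem(xml_text):
    """Return a parser error message, or None when the XML is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None

print(xml_problem("<urlset><url></url></urlset>"))  # None
print(xml_problem("<urlset><url></urlset>"))        # message describing a mismatched tag
```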

Can sitemaps be compressed?

Yes. Sitemaps can be compressed using gzip (.gz) to reduce bandwidth usage while remaining fully supported by search engines.
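Handling a compressed sitemap mostly means decompressing it before parsing. The sketch below detects gzip data by its two leading magic bytes and decodes the result as UTF-8. The function name decode_sitemap is hypothetical, and the sample payload is compressed in the example itself rather than downloaded.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def decode_sitemap(payload):
    """Return sitemap XML as text, decompressing first if the bytes are gzipped."""
    if payload[:2] == GZIP_MAGIC:
        payload = gzip.decompress(payload)
    return payload.decode("utf-8")

# Hypothetical payloads: the same XML, plain and gzip-compressed.
xml_bytes = b"<urlset><url><loc>https://example.com/</loc></url></urlset>"
compressed = gzip.compress(xml_bytes)

print(decode_sitemap(compressed) == decode_sitemap(xml_bytes))  # True
```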

Does the sitemap extractor store sitemap data?

No. Sitemap URLs are processed live and contents are not stored, logged, or tracked by Velohost.

Is exposing a sitemap a security risk?

A sitemap reveals every URL it lists, so it should contain only pages that are meant to be publicly accessible. Sensitive or private content should never be included in a sitemap.

What are sitemap best practices?

Best practices include using HTTPS, keeping sitemaps updated, excluding noindex URLs, splitting large sitemaps, and monitoring coverage in search tools.
