Sitemap Indexation: Why Google Doesn't Index All Your Submitted URLs

A low sitemap indexation ratio is Google giving you feedback. Here's what the common exclusion reasons (crawled-not-indexed, duplicate canonicals, 404s) actually mean, how crawl budget works for large sites, and how to improve what gets indexed.

By sadiqbd · June 9, 2026

A sitemap with 500 URLs where only 200 are indexed is a signal, not a problem to ignore

The indexation ratio (submitted URLs versus indexed URLs in Google Search Console) is one of the most informative and underanalysed metrics in technical SEO. A 40% indexation rate on a well-maintained site isn't just a gap in coverage. It's Google telling you something about how it evaluates your content.

Understanding why URLs aren't indexed, what Google's exclusion reasons mean, and how to improve the ratio transforms the sitemap from a submission mechanism into a diagnostic tool.


The gap between submitted and indexed

Google Search Console's Page indexing report (formerly called Index Coverage) shows:

  • Indexed pages: URLs Google has crawled, processed, and included in its search index
  • Not indexed pages: URLs Google knows about but has chosen not to index, grouped by the reason it gives
  • A sitemap filter: switching the report from "All known pages" to "All submitted pages" (or to a single sitemap) limits both counts to the URLs your sitemaps submitted

The sitemap doesn't compel indexation; it requests it. Google still applies quality, uniqueness, and relevance criteria before indexing each URL.
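To make the ratio concrete, here is a minimal Python sketch that computes it from the two counts you read off Search Console and labels it using the rough bands from the FAQ at the end of this article. The function names and band labels are illustrative assumptions, not anything Search Console itself exposes.

```python
def indexation_ratio(submitted: int, indexed: int) -> float:
    """Share of submitted sitemap URLs that Search Console reports as indexed."""
    if submitted <= 0:
        raise ValueError("submitted must be a positive count")
    if not 0 <= indexed <= submitted:
        raise ValueError("indexed must be between 0 and submitted")
    return indexed / submitted


def assess(ratio: float) -> str:
    """Rule-of-thumb bands with hypothetical labels, not Google thresholds."""
    if ratio < 0.5:
        return "investigate"        # below 50% on a site with substantive pages
    if ratio < 0.7:
        return "below typical"      # between the 'investigate' and 'typical' bands
    return "typical or better"      # 70%+ is typical for a well-maintained site


ratio = indexation_ratio(submitted=500, indexed=200)
print(f"{ratio:.0%} -> {assess(ratio)}")  # 40% -> investigate
```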


Why URLs get excluded: the common reasons

"Crawled - currently not indexed"

The most ambiguous exclusion: Google crawled the page but decided not to index it. It usually indicates one of the following:

  • Thin content: the page doesn't contain enough unique, useful information to warrant indexation, for example a product category page with only 3 products, or a tag archive with 1 post.
  • Near-duplicate content: the page is too similar to other indexed pages, such as multiple filtered product pages with minimal content differences.
  • Low quality signals: the page exists but has no external links, no user engagement history, and thin content, so Google deprioritises it.

Fix: either improve the content quality to make indexation worthwhile, or add noindex if the page genuinely doesn't need to be indexed (tag archives, filter combinations, etc.).
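Google does not publish how it judges near-duplicates, but a rough way to find candidates among your own filtered or tag pages is to compare their visible text using word shingles and Jaccard similarity. This sketch assumes you have already extracted each page's main text; the 5-word shingle size and any cut-off you choose are assumptions to tune, not Google values.

```python
import re
from typing import Set, Tuple


def shingles(text: str, k: int = 5) -> Set[Tuple[str, ...]]:
    """Overlapping k-word sequences from the lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()  # very short text: a single shingle
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 5) -> float:
    """Shared shingles divided by all distinct shingles (1.0 = identical wording)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty pages are trivially identical
    return len(sa & sb) / len(sa | sb)


page_a = "Blue running shoes for men. Free delivery on all orders over fifty pounds."
page_b = "Red running shoes for men. Free delivery on all orders over fifty pounds."
print(jaccard(page_a, page_b))  # 0.8
```

Pairs that score high are candidates to consolidate, canonicalise, or noindex; pairs that score low are probably not being excluded for duplication.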

"Duplicate, Google chose different canonical than user"

This page is marked as canonical, either by its own canonical tag or by canonical tags on other variants pointing to it, but Google disagreed and selected a different URL as canonical, usually one it considers more authoritative.

This often indicates:

  • Two pages with substantially similar content where Google chose the one with more backlinks
  • Canonical pointing to a URL with lower perceived authority than the alternative
  • Inconsistent internal linking undermining your canonical signal

Fix: consolidate similar content, strengthen internal links to your preferred canonical, and ensure the canonical tag is on all variants (not just the preferred version).
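Before strengthening signals, it helps to know what each sitemap URL actually declares. This sketch uses Python's standard html.parser to pull the rel="canonical" href out of a page's HTML and resolve it against the page URL; a sitemap URL whose canonical points elsewhere is a candidate for removal or a fix. Fetching the HTML is left out, and treating zero or multiple canonical tags as "no usable canonical" is a simplifying assumption.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "link":
            return
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "canonical" in rel_tokens and href:
            self.hrefs.append(href)


def declared_canonical(page_url: str, html: str) -> Optional[str]:
    """Absolute canonical URL, or None if there isn't exactly one canonical tag."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    if len(finder.hrefs) != 1:
        return None  # missing or conflicting declarations: fix these first
    return urljoin(page_url, finder.hrefs[0].strip())  # resolve relative hrefs


page = "https://example.com/shoes/?colour=blue"
html = '<head><link rel="canonical" href="/shoes/"></head>'
canonical = declared_canonical(page, html)
print(canonical)          # https://example.com/shoes/
print(canonical == page)  # False: this variant should not be in the sitemap
```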

"Alternate page with proper canonical tag"

This is actually expected and correct: a URL that's a non-canonical variant (parameter URL, www/non-www, HTTP/HTTPS) has a canonical pointing elsewhere, and Google is respecting it. This exclusion reason is usually fine, though such variants don't belong in the sitemap itself.

"Page with redirect"

A URL in your sitemap returns a redirect. Sitemaps should only contain URLs that return a 200 status directly: never a URL that redirects, and never a redirect target that itself redirects onward.

Fix: update the sitemap to contain only the final destination URL.
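If you keep a map of redirects (exported from your CMS, server config, or a crawl), updating the sitemap comes down to following each chain to its end. A minimal sketch, assuming the map is a plain source-to-target dict; the hop limit is an arbitrary safety guard, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List


def final_destination(url: str, redirects: Dict[str, str], max_hops: int = 10) -> str:
    """Follow source -> target entries until reaching a URL that does not redirect."""
    seen = {url}
    current = url
    for _ in range(max_hops + 1):
        target = redirects.get(current)
        if target is None:
            return current  # no further redirect: this URL belongs in the sitemap
        if target in seen:
            raise ValueError(f"redirect loop involving {target}")
        seen.add(target)
        current = target
    raise ValueError(f"more than {max_hops} redirects starting at {url}")


def rewrite_sitemap_urls(urls: Iterable[str], redirects: Dict[str, str]) -> List[str]:
    """Swap each URL for its final destination and drop duplicates, keeping order."""
    return list(dict.fromkeys(final_destination(u, redirects) for u in urls))


redirects = {
    "http://example.com/old-shoes": "https://example.com/old-shoes",
    "https://example.com/old-shoes": "https://example.com/shoes/",
}
print(rewrite_sitemap_urls(
    ["http://example.com/old-shoes", "https://example.com/shoes/", "https://example.com/about"],
    redirects,
))
# ['https://example.com/shoes/', 'https://example.com/about']
```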

"Blocked by robots.txt"

A URL in your sitemap is blocked from crawling by your robots.txt file. This is a contradiction β€” if you're telling Google about a URL in the sitemap but blocking it in robots.txt, the intended state is unclear.

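Fix: decide whether you want the URL indexed (remove the robots.txt block) or not (remove it from the sitemap). If the URL is already indexed and you want it dropped, lift the robots.txt block and add noindex instead: Google can only see a noindex directive on a page it is allowed to crawl.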
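To catch this contradiction before Google reports it, you can test every sitemap URL against your robots.txt rules. This sketch uses Python's standard urllib.robotparser. Be aware that it applies rules in file order and does not implement Google's `*` and `$` wildcards or longest-match precedence, so treat its output as a first pass and confirm edge cases in Search Console.

```python
from typing import List
from urllib.robotparser import RobotFileParser


def blocked_sitemap_urls(robots_txt: str, urls: List[str], user_agent: str = "Googlebot") -> List[str]:
    """Sitemap URLs that robots.txt disallows for the given crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text we already have; no network fetch
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


robots = """\
User-agent: *
Disallow: /search
Disallow: /cart/
"""
sitemap_urls = [
    "https://example.com/shoes/",
    "https://example.com/search?q=boots",
    "https://example.com/cart/checkout",
]
print(blocked_sitemap_urls(robots, sitemap_urls))
# ['https://example.com/search?q=boots', 'https://example.com/cart/checkout']
```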

"Not found (404)"

A URL in your sitemap returns 404. Either the page was deleted without removing it from the sitemap, or the URL format changed.

Fix: update the sitemap to remove the 404 URLs, and investigate whether redirects should be in place for URLs that had external links.
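A quick way to find both stale 404s and redirecting URLs is to request every sitemap URL without following redirects. This minimal sketch uses only Python's standard urllib; it sends HEAD requests (an assumption: some servers answer HEAD differently from GET, so re-check anything surprising with GET) and disables redirect following so a 301 is reported as 301 rather than as the 200 at the end of the chain. The User-Agent string is hypothetical.

```python
import urllib.error
import urllib.request
from typing import Dict, Iterable


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Returning None stops urllib from following 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """HTTP status code for url, without following redirects.

    Network failures (urllib.error.URLError) propagate to the caller.
    """
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "sitemap-audit/0.1"}  # hypothetical UA
    )
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # With redirects disabled, 3xx as well as 4xx/5xx responses arrive here.
        return err.code


def audit(urls: Iterable[str]) -> Dict[str, int]:
    """Map each non-200 sitemap URL to its status code."""
    results = {}
    for url in urls:
        status = fetch_status(url)
        if status != 200:
            results[url] = status
    return results
```

Anything that comes back 404 should leave the sitemap; anything that comes back 3xx should be replaced by its destination.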


Crawl budget and why it matters for large sites

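Google allocates a "crawl budget" to each site: roughly, how many URLs Googlebot can and wants to crawl over a given period, shaped by how much load your server handles well and how much demand Google has for your content. For sites with millions of pages, not everything gets crawled frequently.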

What affects crawl budget:

  • Popularity and link profile (more popular, better-linked sites and URLs tend to be crawled more often)
  • Server response time (slow servers waste crawl budget on wait time)
  • Crawl errors (4xx and 5xx responses waste crawl budget)
  • Robots.txt blocking (crawlers don't waste time on blocked paths)
  • Sitemap quality (a sitemap with URLs that frequently return errors trains Googlebot to expect unreliable URLs)

For most sites under 10,000 pages, crawl budget is not a constraint. For sites with tens of thousands of pages, improving crawl efficiency becomes important.
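One practical way to see whether errors are eating into crawl activity is to check what share of Googlebot's requests in your access logs end in 4xx or 5xx responses. This sketch assumes logs in the common Apache/Nginx "combined" format and identifies Googlebot by user-agent string alone; user agents can be spoofed, so for a real audit verify the requesting IPs (Google documents a reverse-DNS check for this). The sample log lines are made up.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Combined log format: ip ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_status_mix(lines: Iterable[str]) -> Dict[str, float]:
    """Share of Googlebot requests per status class ('2xx', '3xx', ...)."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue  # unparseable line or a different client
        counts[match.group("status")[0] + "xx"] += 1
    total = sum(counts.values())
    return {cls: n / total for cls, n in sorted(counts.items())} if total else {}


GBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'66.249.66.1 - - [09/Jun/2026:10:00:00 +0000] "GET /shoes/ HTTP/1.1" 200 5120 "-" "{GBOT}"',
    f'66.249.66.1 - - [09/Jun/2026:10:00:05 +0000] "GET /old-page HTTP/1.1" 404 512 "-" "{GBOT}"',
    '203.0.113.7 - - [09/Jun/2026:10:00:09 +0000] "GET /shoes/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]
print(googlebot_status_mix(sample))  # {'2xx': 0.5, '4xx': 0.5}
```

A 4xx/5xx share that keeps growing is a cue to clean up the sitemap and internal links before worrying about anything more exotic.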


Sitemap best practices for maximising indexation

Only include URLs that should be indexed. Remove:

  • noindex pages (they shouldn't be in the sitemap if you don't want them indexed)
  • Redirect URLs (use the final destination)
  • 4xx URLs (obviously)
  • Paginated pages beyond page 1 in most cases
  • Thin content pages you're working on improving
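The mechanical part of this list can be automated once each URL has been audited. This sketch assumes you have already collected, per URL, the status code (fetched without following redirects), whether a noindex directive is present, and the declared canonical; the PageAudit record is a hypothetical structure. Pagination and thin-content decisions are editorial, so they are left to you.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class PageAudit:
    """Hypothetical per-URL audit record."""
    url: str
    status: int                      # status code without following redirects
    noindex: bool                    # robots meta tag or X-Robots-Tag says noindex
    canonical: Optional[str] = None  # declared canonical URL, if any


def sitemap_candidates(pages: Iterable[PageAudit]) -> List[str]:
    """URLs that pass the mechanical checks for sitemap inclusion."""
    keep = []
    for page in pages:
        if page.status != 200:
            continue  # redirects (3xx) and errors (4xx/5xx) stay out
        if page.noindex:
            continue  # don't ask Google to index what you've told it not to index
        if page.canonical is not None and page.canonical != page.url:
            continue  # non-canonical variant: list its canonical instead
        keep.append(page.url)
    return keep


pages = [
    PageAudit("https://example.com/shoes/", 200, False, "https://example.com/shoes/"),
    PageAudit("https://example.com/shoes/?sort=price", 200, False, "https://example.com/shoes/"),
    PageAudit("https://example.com/tag/misc/", 200, True),
    PageAudit("https://example.com/old-shoes", 301, False),
    PageAudit("https://example.com/deleted", 404, False),
]
print(sitemap_candidates(pages))  # ['https://example.com/shoes/']
```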

Keep <lastmod> accurate. If every URL in your sitemap shows today's date as <lastmod> regardless of when they were actually updated, Google stops trusting your <lastmod> values. Accurate lastmod helps Google prioritise recently changed content for recrawl.
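One way to keep lastmod honest is to move it only when a page's main content actually changes. A minimal sketch, assuming you store a fingerprint and lastmod per URL (the record layout is hypothetical) and can extract the main content without template parts such as headers, footers, and sidebars, so a site-wide template tweak doesn't bump every date.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict


def fingerprint(main_content: str) -> str:
    """Stable hash of the page's main content."""
    return hashlib.sha256(main_content.encode("utf-8")).hexdigest()


def update_lastmod(record: Dict[str, str], main_content: str, now: datetime) -> Dict[str, str]:
    """Return the record unchanged if the content is the same, else a new record with lastmod = now."""
    fp = fingerprint(main_content)
    if record.get("fingerprint") == fp:
        return record  # nothing changed: keep the old lastmod
    # W3C Datetime format, as the sitemap protocol expects, e.g. 2026-06-09T10:00:00+00:00
    return {"fingerprint": fp, "lastmod": now.isoformat(timespec="seconds")}


first = update_lastmod({}, "Original article text", datetime(2026, 6, 1, tzinfo=timezone.utc))
same = update_lastmod(first, "Original article text", datetime(2026, 6, 9, tzinfo=timezone.utc))
edited = update_lastmod(same, "Revised article text", datetime(2026, 6, 9, tzinfo=timezone.utc))
print(first["lastmod"], same["lastmod"], edited["lastmod"])
# 2026-06-01T00:00:00+00:00 2026-06-01T00:00:00+00:00 2026-06-09T00:00:00+00:00
```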

Use sitemap index files for large sites. One sitemap per content type (blog posts, product pages, category pages) makes it easier to analyse indexation rates by type and identify where quality problems are concentrated.
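A sitemap index simply lists the child sitemap files. This sketch assigns one or more child sitemaps to each content type (splitting at the sitemaps.org limit of 50,000 URLs per file) and writes an index pointing at them; the file-naming scheme and base URL are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit in the sitemaps.org protocol


def chunk(urls: List[str], size: int = MAX_URLS) -> List[List[str]]:
    """Split a URL list into consecutive pieces of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(base_url: str, urls_by_type: Dict[str, List[str]]) -> Tuple[str, Dict[str, List[str]]]:
    """Return (sitemap index XML, {child filename: URLs for that file})."""
    files: Dict[str, List[str]] = {}
    for content_type, urls in urls_by_type.items():
        for n, part in enumerate(chunk(urls), start=1):
            files[f"sitemap-{content_type}-{n}.xml"] = part  # hypothetical naming scheme
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in files:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url.rstrip('/')}/{name}"
    xml = '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode")
    return xml, files


index_xml, files = build_index(
    "https://example.com",
    {"posts": ["https://example.com/blog/a"], "products": ["https://example.com/p/1"]},
)
print(list(files))  # ['sitemap-posts-1.xml', 'sitemap-products-1.xml']
```

Because each content type gets its own child file, Search Console can report indexed-versus-submitted counts per file, which is where quality problems by type show up.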

Submit to Bing too. Google Search Console and Bing Webmaster Tools both have sitemap submission. Bing represents 5–15% of search traffic depending on the market and audience.


How to use the XML Sitemap Generator on sadiqbd.com

  1. Enter your URLs: one per line, or import from a list
  2. Set <lastmod> dates: optionally specify when each URL was last updated
  3. Generate: produces a correctly formatted XML sitemap
  4. Validate: check the output for any syntax issues before uploading
  5. Upload to your domain root as /sitemap.xml
  6. Submit in Google Search Console → Sitemaps
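For reference, "correctly formatted" means a urlset document in the sitemaps.org namespace with one <url> entry per page. The sketch below is not the sadiqbd.com tool's code, only a minimal Python equivalent of the generate and validate steps: ElementTree escapes characters such as & in URLs, and the check merely confirms the XML is well-formed and counts entries; it does not validate against the sitemap XML schema.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: List[Tuple[str, Optional[str]]]) -> str:
    """entries: (absolute URL, optional lastmod in W3C Datetime format)."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = loc  # '&' is written as '&amp;' on serialisation
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode")


def count_urls(xml_text: str) -> int:
    """Well-formedness check: parse the document and count <url> entries."""
    root = ET.fromstring(xml_text.encode("utf-8"))  # raises ET.ParseError if malformed
    return len(root.findall(f"{{{SITEMAP_NS}}}url"))


sitemap = build_sitemap([
    ("https://example.com/", "2026-06-09"),
    ("https://example.com/guides/r&d-basics", None),
])
print(count_urls(sitemap))  # 2
```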

After submitting, monitor the indexation ratio in Search Console and investigate exclusion reasons for URLs that remain unindexed.


Frequently Asked Questions

What's a healthy indexation ratio? For a well-maintained content site, 70–90% is typical. Very high-quality sites with strong link profiles and thin content removed may approach 95%+. A ratio below 50% for a content site with substantive pages warrants investigation.

How long after submitting a sitemap will Google index the URLs? A freshly submitted sitemap typically prompts crawling within days to weeks for most sites, though submission is a hint rather than a guarantee. The time to index (not just crawl) varies: popular pages with backlinks are indexed faster, and entirely new sites may take weeks for initial indexation.

Does having a sitemap improve rankings? Not directly. Sitemaps help Google discover pages β€” whether those pages rank depends on quality, relevance, and links, not the sitemap itself. Sitemaps are about discoverability, not rankings.

Is the XML Sitemap Generator free? Yes: completely free, no sign-up required.


The sitemap indexation ratio is one of the clearest signals of overall site health in Search Console. Monitoring it regularly and investigating exclusion reasons proactively prevents the quiet accumulation of quality issues that eventually affect rankings.

Try the XML Sitemap Generator free at sadiqbd.com: generate a correctly formatted XML sitemap for any set of URLs instantly.
