Why is a sitemap important?
Search engines use web crawlers, also called bots or spiders, to find and index web pages. These crawlers follow links from one page to another to discover new content. Without a sitemap, a crawler can only discover new or updated pages by following the links that already exist on the site.
This can be problematic for several reasons:
Orphaned Pages: Pages that are not linked from any other page on the site (orphaned pages) may never be found by crawlers at all.
Deep Hierarchies: In websites with deep hierarchies, important pages might be buried several levels deep, making it less likely for crawlers to reach them quickly.
Complex Navigation: Websites with complex navigation structures can make it hard for crawlers to find all pages efficiently, especially if the internal linking is not well-optimized.
A sitemap gives search engines a direct list of URLs, making sure all pages, even those hard to find through normal crawling, are discovered and indexed faster.
Example of complex navigation
An e-commerce website with a deep hierarchy:
Homepage: The starting point.
Main Categories: Electronics, Clothing, Home & Garden.
Subcategories: Under Electronics, you have Phones, Laptops, Accessories.
Product Pages: Each subcategory contains multiple product pages.
In this example, a product page for a specific phone model might be several levels deep (Homepage > Electronics > Phones > Specific Phone Model). Without a sitemap, it might take longer for search engines to find and index these deep pages. A sitemap makes sure that even the deepest pages are listed and can be found by search engines quickly.
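For comparison, a sitemap flattens that hierarchy into a plain list of URLs, so the deepest product page sits right next to the homepage. A minimal sketch; the shop URLs below are invented for illustration:
// Every URL is listed at the same level; a crawler doesn't have to
// navigate Homepage > Electronics > Phones to find the product page.
const urls = [
  "https://shop.example.com/",
  "https://shop.example.com/electronics",
  "https://shop.example.com/electronics/phones",
  "https://shop.example.com/electronics/phones/acme-phone-12",
];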
Sitemap in Next.js
Next.js has built-in support for generating sitemaps using the sitemap.(js|ts) file convention. Create the file in the app folder and export a default function that returns an array of sitemap entries.
An example:
import { MetadataRoute } from "next";

export default function sitemap(): MetadataRoute.Sitemap {
  return [
    {
      url: "https://acme.com",
      lastModified: new Date(),
      changeFrequency: "yearly",
      priority: 1,
    },
    {
      url: "https://acme.com/about",
      lastModified: new Date(),
      changeFrequency: "monthly",
      priority: 0.8,
    },
    {
      url: "https://acme.com/blog",
      lastModified: new Date(),
      changeFrequency: "weekly",
      priority: 0.5,
    },
  ];
}
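In a real application you usually won't hard-code every URL. The sitemap function can also be async, so entries can be built from a data source at build or request time. A minimal sketch, assuming a hypothetical getBlogPosts() helper standing in for your own data access:

import { MetadataRoute } from "next";

// Hypothetical data-access helper; replace with however you load your posts.
async function getBlogPosts(): Promise<{ slug: string; updatedAt: Date }[]> {
  return [{ slug: "hello-world", updatedAt: new Date("2024-01-15") }];
}

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const posts = await getBlogPosts();

  // One entry per blog post, built from the data source.
  const postEntries: MetadataRoute.Sitemap = posts.map((post) => ({
    url: `https://acme.com/blog/${post.slug}`,
    lastModified: post.updatedAt,
    changeFrequency: "monthly",
    priority: 0.5,
  }));

  // Static pages you always want listed, followed by the generated entries.
  return [
    { url: "https://acme.com", lastModified: new Date(), priority: 1 },
    ...postEntries,
  ];
}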
There are two main properties to be aware of: priority and changeFrequency.
Priority
The priority attribute in a sitemap indicates how important a page is relative to other pages on the same site. It ranges from 0.0 to 1.0, with 1.0 being the most important, and signals which pages the website owner considers most important.
High Priority (0.8 - 1.0): Homepage, major category pages, key landing pages.
Medium Priority (0.4 - 0.7): Regular blog posts, secondary category pages.
Low Priority (0.0 - 0.3): Old news articles, less important utility pages.
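One way to keep these values consistent is to derive priority from URL depth, so top-level pages score higher than deeply nested ones. This is only a sketch with a made-up helper, not something Next.js provides:

// Hypothetical helper: shallower paths get a higher priority.
// "/" -> 1.0, "/electronics" -> 0.8, "/electronics/phones/acme-phone-12" -> 0.4
function priorityFor(path: string): number {
  const depth = path.split("/").filter(Boolean).length;
  return Math.max(0.2, 1 - depth * 0.2);
}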
Change Frequency
The changeFrequency attribute indicates how often the content of a page is likely to change. Possible values: "always", "hourly", "daily", "weekly", "monthly", "yearly", and "never".
Always: Pages that change constantly, like stock market data.
Hourly: Frequently updated news sites.
Daily: Blogs with daily posts.
Weekly: Product pages with weekly updates.
Monthly: FAQ pages updated monthly.
Yearly: Contact pages or about pages.
Never: Archived content that never changes.
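If you use TypeScript, the MetadataRoute.Sitemap type constrains changeFrequency to exactly these values, so a typo surfaces at compile time instead of silently producing an invalid sitemap. A small sketch:

import { MetadataRoute } from "next";

// A single sitemap entry; changeFrequency only accepts the literals listed above.
const entry: MetadataRoute.Sitemap[number] = {
  url: "https://example.com/blog",
  changeFrequency: "weekly",
  // changeFrequency: "biweekly", // type error: not an allowed value
};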
What happens with the sitemap file?
The generated sitemap is an XML file that looks something like this:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2023-06-21</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2023-06-20</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
When the site is deployed, the sitemap will be served at https://example.com/sitemap.xml.
We don't want the web crawler to index all pages
Some of our pages are private, e.g. behind authentication. We don't want web crawlers to index those pages; their content isn't accessible to crawlers in the first place.
You can communicate this to web crawlers by adding a robots.txt file to the root of your website. In Next.js, you can generate a robots.txt file by adding a robots.ts file to the app/ folder.
import { MetadataRoute } from "next";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: "*",
        allow: "/",
        disallow: "/private/",
      },
    ],
    sitemap: "https://example.com/sitemap.xml",
  };
}
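When deployed, this should produce a robots.txt at https://example.com/robots.txt roughly like the following (exact formatting can vary between Next.js versions):

User-Agent: *
Allow: /
Disallow: /private/

Sitemap: https://example.com/sitemap.xml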