Why is a sitemap important?
Search engines use web crawlers, also called bots or spiders, to find and index web pages. These crawlers follow links from one page to another to discover new content. Without a sitemap, a crawler can only discover new or updated pages by following the links that already exist on the site.
This can be problematic for several reasons:
Orphaned Pages: Pages that are not linked from any other page on the site (orphaned pages) may never be found by crawlers at all.
Deep Hierarchies: In websites with deep hierarchies, important pages might be buried several levels deep, making it less likely for crawlers to reach them quickly.
Complex Navigation: Websites with complex navigation structures can make it hard for crawlers to find all pages efficiently, especially if the internal linking is not well-optimized.
A sitemap gives search engines a direct list of URLs, making sure all pages, even those hard to find through normal crawling, are discovered and indexed faster.
Example of complex navigation
An e-commerce website with a deep hierarchy:
Homepage: The starting point.
Main Categories: Electronics, Clothing, Home & Garden.
Subcategories: Under Electronics, you have Phones, Laptops, Accessories.
Product Pages: Each subcategory contains multiple product pages.
In this example, a product page for a specific phone model might be several levels deep (Homepage > Electronics > Phones > Specific Phone Model). Without a sitemap, it might take longer for search engines to find and index these deep pages. A sitemap makes sure that even the deepest pages are listed and can be found by search engines quickly.
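For comparison, a sitemap flattens that hierarchy into a plain list of URLs, so the deepest product page sits right next to the homepage. A minimal sketch; the shop URLs below are invented for illustration:
// Every URL is listed at the same level; a crawler doesn't have to
// navigate Homepage > Electronics > Phones to find the product page.
const urls = [
  "https://shop.example.com/",
  "https://shop.example.com/electronics",
  "https://shop.example.com/electronics/phones",
  "https://shop.example.com/electronics/phones/acme-phone-12",
];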
Sitemap in Next.js
Next.js has built-in support for generating sitemaps using the sitemap.(js|ts) file convention. Create the file in the app folder and export a default function that returns an array of sitemap entries.
An example:
import { MetadataRoute } from "next";

export default function sitemap(): MetadataRoute.Sitemap {
  return [
    {
      url: "https://acme.com",
      lastModified: new Date(),
      changeFrequency: "yearly",
      priority: 1,
    },
    {
      url: "https://acme.com/about",
      lastModified: new Date(),
      changeFrequency: "monthly",
      priority: 0.8,
    },
    {
      url: "https://acme.com/blog",
      lastModified: new Date(),
      changeFrequency: "weekly",
      priority: 0.5,
    },
  ];
}
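In a real application you usually won't hard-code every URL. The sitemap function can also be async, so entries can be built from a data source at build or request time. A minimal sketch, assuming a hypothetical getBlogPosts() helper standing in for your own data access:

import { MetadataRoute } from "next";

// Hypothetical data-access helper; replace with however you load your posts.
async function getBlogPosts(): Promise<{ slug: string; updatedAt: Date }[]> {
  return [{ slug: "hello-world", updatedAt: new Date("2024-01-15") }];
}

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const posts = await getBlogPosts();

  // One entry per blog post, built from the data source.
  const postEntries: MetadataRoute.Sitemap = posts.map((post) => ({
    url: `https://acme.com/blog/${post.slug}`,
    lastModified: post.updatedAt,
    changeFrequency: "monthly",
    priority: 0.5,
  }));

  // Static pages you always want listed, followed by the generated entries.
  return [
    { url: "https://acme.com", lastModified: new Date(), priority: 1 },
    ...postEntries,
  ];
}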
There are two main properties to be aware of: priority and changeFrequency.
Priority
The priority attribute in a sitemap indicates how important a page is relative to other pages on the same site. It ranges from 0.0 to 1.0, with 1.0 being the most important, and signals which pages the website owner considers most important.
High Priority (0.8 - 1.0): Homepage, major category pages, key landing pages.
Medium Priority (0.4 - 0.7): Regular blog posts, secondary category pages.
Low Priority (0.0 - 0.3): Old news articles, less important utility pages.
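One way to keep these values consistent is to derive priority from URL depth, so top-level pages score higher than deeply nested ones. This is only a sketch with a made-up helper, not something Next.js provides:

// Hypothetical helper: shallower paths get a higher priority.
// "/" -> 1.0, "/electronics" -> 0.8, "/electronics/phones/acme-phone-12" -> 0.4
function priorityFor(path: string): number {
  const depth = path.split("/").filter(Boolean).length;
  return Math.max(0.2, 1 - depth * 0.2);
}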
Change Frequency
The changeFrequency attribute indicates how often the content of a page is likely to change. Possible values: "always", "hourly", "daily", "weekly", "monthly", "yearly", and "never".
Always: Pages that change constantly, like stock market data.
Hourly: Frequently updated news sites.
Daily: Blogs with daily posts.
Weekly: Product pages with weekly updates.
Monthly: FAQ pages updated monthly.
Yearly: Contact pages or about pages.
Never: Archived content that never changes.
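If you use TypeScript, the MetadataRoute.Sitemap type constrains changeFrequency to exactly these values, so a typo surfaces at compile time instead of silently producing an invalid sitemap. A small sketch:

import { MetadataRoute } from "next";

// A single sitemap entry; changeFrequency only accepts the literals listed above.
const entry: MetadataRoute.Sitemap[number] = {
  url: "https://example.com/blog",
  changeFrequency: "weekly",
  // changeFrequency: "biweekly", // type error: not an allowed value
};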
What happens with the sitemap file?
The generated sitemap is an XML file that looks something like this:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2023-06-21</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2023-06-20</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
When the site is deployed, the sitemap will be served at https://example.com/sitemap.xml.
We don't want the web crawler to index all pages
Some of our pages are private, e.g. behind authentication. We don't want web crawlers to index those pages; their content isn't accessible to crawlers in the first place.
You can communicate this to web crawlers by adding a robots.txt file to the root of your website. In Next.js, you can generate a robots.txt file by adding a robots.ts file to the app/ folder.
import { MetadataRoute } from "next";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: "*",
        allow: "/",
        disallow: "/private/",
      },
    ],
    sitemap: "https://example.com/sitemap.xml",
  };
}
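When deployed, this should produce a robots.txt at https://example.com/robots.txt roughly like the following (exact formatting can vary between Next.js versions):

User-Agent: *
Allow: /
Disallow: /private/

Sitemap: https://example.com/sitemap.xml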