SEO Beginners Course 4.4: Sitemaps and Robots.txt

Learn to set up XML sitemaps and robots.txt files for WordPress to boost search engine crawling and indexing efficiency.

XML Sitemaps and Robots.txt for WordPress Sites

Learning Objectives

By the end of this chapter, you'll be able to:

  • Set up XML sitemaps on WordPress using SEO plugins
  • Create and configure robots.txt files to control search engine crawling
  • Submit sitemaps to Google Search Console and monitor their performance
  • Avoid common mistakes that can harm your site's search visibility

Introduction

Two essential files help search engines understand and crawl your WordPress site effectively: XML sitemaps and robots.txt files.

An XML sitemap acts like a roadmap for search engines, listing all your important pages so they can find and index your content. The robots.txt file works as a gatekeeper, telling search engine crawlers which areas of your site they can access and which they should avoid.

Getting these files right means search engines can crawl your site efficiently, focusing on your best content while avoiding pages that don't need indexing. This chapter shows you how to set up both files correctly on WordPress.

Lessons

Setting Up XML Sitemaps in WordPress

WordPress has generated a basic XML sitemap of its own (at yoursite.com/wp-sitemap.xml) since version 5.5, but SEO plugins give you far more control over what it includes.

Installing and configuring Yoast SEO:

  1. Go to Plugins > Add New in your WordPress dashboard
  2. Search for "Yoast SEO" and install the plugin
  3. Navigate to SEO > General in your WordPress menu
  4. Click the "Features" tab and ensure XML sitemaps are enabled
  5. Click the question mark icon next to "XML sitemaps" to view your sitemap URL

Your sitemap URL will typically be: yoursite.com/sitemap_index.xml
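
If you open that URL, you'll see an index file that points to one or more child sitemaps, one per content type. The exact file names and dates depend on your site, but a Yoast sitemap index looks something like this (illustrative only):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yoursite.com/post-sitemap.xml</loc>
    <lastmod>2024-01-15T09:30:00+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/page-sitemap.xml</loc>
    <lastmod>2024-01-10T14:00:00+00:00</lastmod>
  </sitemap>
</sitemapindex>

Each child sitemap then lists individual pages in <url> entries, each with a <loc> address and a <lastmod> date.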

What gets included:

  • Published posts and pages
  • Category and tag pages
  • Author pages (if you have multiple authors)
  • Custom post types (if configured)

Customising your sitemap:

  1. Go to SEO > Search Appearance
  2. Use the Content Types and Taxonomies tabs to exclude specific post types or categories
  3. Note that Yoast caps each sitemap file at 1,000 URLs by default and splits larger sites across multiple files automatically; you can verify the result with the sketch below
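
Once your sitemap is live, it's worth confirming it contains what you expect. Here's a minimal Python sketch (standard library only; the domain is a placeholder) that fetches a sitemap and prints every <loc> entry it finds:

import urllib.request
import xml.etree.ElementTree as ET

# Placeholder domain: swap in your own sitemap address
SITEMAP_URL = "https://yoursite.com/sitemap_index.xml"

# Sitemap indexes and regular sitemaps share this XML namespace
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen(SITEMAP_URL) as response:
    tree = ET.parse(response)

# <loc> holds the address of each child sitemap (in an index)
# or of each page (in a regular sitemap)
locs = tree.getroot().findall(".//sm:loc", NS)
for loc in locs:
    print(loc.text)
print(len(locs), "entries found")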

Creating and Managing Robots.txt Files

Your robots.txt file sits in your site's root directory and gives instructions to search engine crawlers.

Method 1: Using an SEO plugin (recommended for beginners)

With Yoast SEO installed:

  1. Go to SEO > Tools
  2. Click "File editor"
  3. Select the "Robots.txt" tab
  4. Add your directives using this format:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://yoursite.com/sitemap_index.xml

Method 2: Creating the file manually

If you're comfortable with FTP:

  1. Connect to your site via FTP or use your hosting control panel's file manager
  2. Navigate to your site's root directory (usually public_html)
  3. Create a new file called "robots.txt"
  4. Add your directives and save

Essential robots.txt rules for WordPress:

  • Always disallow /wp-admin/ (except admin-ajax.php for functionality)
  • Avoid disallowing /wp-includes/: it holds CSS and JavaScript files that Google needs in order to render your pages properly
  • Include your sitemap URL at the bottom
  • Be careful not to block important content; you can test your rules locally, as shown below
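
Python's standard library includes a robots.txt parser, which makes for a quick local check before you upload anything. A minimal sketch (note that Python applies the first matching rule, so the Allow line comes first here; Google instead honours the most specific rule regardless of order):

from urllib.robotparser import RobotFileParser

# Rules to test; Allow comes first because Python's parser
# applies the first rule that matches
rules = """
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(rules)

# can_fetch(user_agent, url) reports whether a crawler may access the URL
print(parser.can_fetch("*", "https://yoursite.com/wp-admin/"))                # False
print(parser.can_fetch("*", "https://yoursite.com/wp-admin/admin-ajax.php"))  # True
print(parser.can_fetch("*", "https://yoursite.com/blog/my-post/"))            # True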

Submitting Your Sitemap to Search Engines

Search engines will find your sitemap eventually on their own, but submitting it directly speeds up discovery and unlocks reporting on how it's processed.

Submitting to Google Search Console:

  1. Log into Google Search Console
  2. Select your property (website)
  3. Go to Sitemaps in the left menu
  4. Enter your sitemap URL (just the part after your domain: sitemap_index.xml)
  5. Click Submit
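
If Search Console reports that it couldn't fetch the sitemap, first confirm the URL actually resolves. A quick check with Python's standard library (placeholder domain):

import urllib.request

# Placeholder domain: use your actual sitemap URL
url = "https://yoursite.com/sitemap_index.xml"

with urllib.request.urlopen(url) as response:
    # Expect a 200 status and an XML content type
    print(response.status, response.headers.get("Content-Type"))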

Monitoring sitemap performance:

  • Compare how many URLs the Sitemaps report discovered with how many are actually indexed (shown in the Page indexing report)
  • Investigate any errors or warnings that appear
  • Resubmit your sitemap after major site changes

Practice

Create a basic robots.txt file for a WordPress site that:

  • Blocks crawlers from the admin area
  • Allows access to admin-ajax.php
  • Blocks access to a hypothetical "/private/" directory
  • Includes a sitemap reference

Your file should look like this:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /private/

Sitemap: https://yoursite.com/sitemap_index.xml
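
To check your answer against a live site, you can point Python's standard-library parser at the deployed file (placeholder domain; expected results shown as comments):

from urllib.robotparser import RobotFileParser

# Placeholder domain: point this at your own live robots.txt
parser = RobotFileParser("https://yoursite.com/robots.txt")
parser.read()  # downloads and parses the live file

print(parser.can_fetch("*", "https://yoursite.com/private/report.html"))  # expect False
print(parser.can_fetch("*", "https://yoursite.com/blog/hello-world/"))    # expect True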

FAQs

How often do XML sitemaps update automatically?
Most SEO plugins regenerate the sitemap automatically whenever you publish, update, or delete content. You can open your sitemap URL to confirm; each entry carries a last-modified date.

Can robots.txt completely hide pages from search engines?
No. Robots.txt is a public file that search engines respect voluntarily, but it's not a security measure. Pages blocked in robots.txt might still appear in search results if other sites link to them.

What happens if I don't have a robots.txt file?
Search engines will crawl your entire site without restrictions. While not harmful, this means they might waste time on unimportant pages instead of focusing on your best content.

Should I include images in my sitemap?
Yoast SEO includes images automatically if they're attached to posts or pages. This helps Google discover and index your images for image search results.
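
For reference, an image entry inside a sitemap uses Google's image sitemap extension and looks something like this (illustrative URLs):

<url>
  <loc>https://yoursite.com/blog/my-post/</loc>
  <image:image>
    <image:loc>https://yoursite.com/wp-content/uploads/photo.jpg</image:loc>
  </image:image>
</url>

The image namespace (xmlns:image) is declared on the sitemap's root <urlset> element.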

How do I test my robots.txt file?
Google retired the standalone robots.txt Tester. Use the robots.txt report in Google Search Console (under Settings) to see whether Google fetched and parsed your file successfully, and use the URL Inspection tool to check whether a specific page is blocked. For a quick local check, the Python parser shown earlier works too.

Jargon Buster

XML Sitemap: A file listing your website's important pages in XML format, designed specifically for search engines to read and understand your site structure.

Robots.txt: A plain text file containing instructions for search engine crawlers about which parts of your site they should or shouldn't access.

User-agent: The first part of robots.txt rules that specifies which crawlers the rules apply to. Using "*" means the rules apply to all search engines.

Disallow: A robots.txt directive that tells crawlers not to access specific pages or directories.

Crawl Budget: The number of pages search engines will crawl on your site during a given time period. Proper robots.txt usage helps focus this budget on important pages.

Wrap-up

You now know how to set up XML sitemaps and robots.txt files to guide search engines around your WordPress site effectively. These tools work together to help search engines find your important content while avoiding areas that don't need indexing.

Start by installing Yoast SEO and enabling XML sitemaps, then create a basic robots.txt file that keeps crawlers out of the admin area. Submit your sitemap to Google Search Console and monitor the results to see how search engines respond to your site.

Remember to review these files whenever you make major changes to your site structure or add new content types that might need different handling.

Ready to take your WordPress SEO further? Join Pixelhaze Academy for more advanced techniques and hands-on support.