My original blog address: https://cirry.cn

Adding a Sitemap to a Blog Built with the Astro Framework

Install @astrojs/sitemap

First, install Astro's official sitemap integration:

# Using NPM
npx astro add sitemap
# Using Yarn
yarn astro add sitemap
# Using PNPM
pnpm astro add sitemap

Press y and Enter, and Astro will automatically update your astro.config.mjs configuration file to enable sitemap generation at build time.

It is recommended to double-check your configuration file. If the following code is present in the file, it means the configuration was successful. If not, please add it yourself:

import { defineConfig } from 'astro/config';
import sitemap from '@astrojs/sitemap';

export default defineConfig({
  // ...
  integrations: [sitemap()],
})
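
One detail worth checking: the @astrojs/sitemap integration only generates output when the site option is set in astro.config.mjs. A minimal sketch, using this blog's domain as a stand-in for your own:

import { defineConfig } from 'astro/config';
import sitemap from '@astrojs/sitemap';

export default defineConfig({
  // Required: the integration uses this value to build the absolute URLs
  // in the sitemap. Replace it with your own deployed domain.
  site: 'https://cirry.cn',
  integrations: [sitemap()],
})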

After the configuration is complete, run a build. The integration writes sitemap-index.xml and sitemap-0.xml into the dist/ output directory, so after deployment they sit at the root of your site. sitemap-0.xml is the actual sitemap for your website; submit it to search engines for indexing.
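
To confirm this locally, run a build and check the output directory. A quick sketch, assuming the default npm scripts of an Astro project:

# Build the site; the sitemap files are written to dist/
npm run build
# Both the index and at least one chunk should be present
ls dist/sitemap-index.xml dist/sitemap-0.xml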

Add Baidu Verification

Note: Website verification is required for sitemap indexing. Add your website to the Baidu Search Resource Platform and follow the verification process described in this detailed guide on site verification.

[Image: adding a site on the Baidu Search Resource Platform]

Select the appropriate site properties.

[Image: site property selection]

Choose file verification, download the verification file, and place it in the root directory of your website (in an Astro project, put it in public/, whose contents are copied to the site root at build time).

After building and deploying, open the verification file's URL to make sure it is accessible.
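
In an Astro project, the whole check might look like this; the verification file name below is a placeholder, so use the one Baidu actually gives you:

# Files in public/ are copied verbatim to the site root at build time
cp ~/Downloads/baidu_verify_example.html public/
# After deploying, confirm the file is reachable (replace the domain with yours)
curl -I https://cirry.cn/baidu_verify_example.html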

Add Baidu Indexing

In the Baidu Search Resource Platform, under Normal Indexing, enter your website's sitemap address.

[Image: sitemap submission form under Normal Indexing]

Add Google Verification

Refer to the documentation on Google Search Central for more information.

To submit a sitemap, send a GET request to the following address in your browser or command line, specifying the complete URL of your sitemap. Make sure the sitemap file is accessible:

https://www.google.com/ping?sitemap=FULL_URL_OF_SITEMAP  # FULL_URL_OF_SITEMAP: Location of the sitemap file
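
From the command line, the equivalent request can be sent with curl, for example:

curl "https://www.google.com/ping?sitemap=https://cirry.cn/sitemap-0.xml"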

For example, in my browser, I directly enter the following link:

https://www.google.com/ping?sitemap=https://cirry.cn/sitemap-0.xml

The returned page is as follows:

[Image: Google's sitemap ping confirmation page]

Click on the link http://www.google.com/webmasters/tools/, which will redirect you to Google Search Console.

[Image: Google Search Console start page]

Enter your website's address, and a prompt will appear asking you to add a verification method.

You can choose the method shown in the image below: download the HTML file and add it to the root directory of your website (again, for an Astro project this means public/):

[Image: HTML file verification option]

Alternatively, you can add a verification meta tag to the <head> of your website's pages:

[Image: HTML tag verification option]
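
For reference, the tag has the following shape; the content value here is a placeholder, and you should copy the exact tag that Google Search Console generates for you:

<meta name="google-site-verification" content="YOUR_VERIFICATION_TOKEN" />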

After adding it, re-verify in Google Search Console. If you see the screen below, verification is complete.

[Image: verification success prompt]

Add Google Indexing

In Google Search Console, enter the sitemap address of your website.

[Image: sitemap submission in Google Search Console]

Issues Encountered

After completing the steps above, check in the Baidu Search Resource Platform whether the Robots and Crawl Diagnostics tools work normally for your website.

I added the sitemap of my website in Normal Indexing on Baidu, but found that it failed to be indexed.

[Image: failed sitemap indexing status on Baidu]

In Crawl Diagnostics, I got a diagnostic error indicating robots.txt blocking, so I tested my website on xml-sitemaps to see if it could be crawled.

That tool couldn't crawl my website either, which confirmed the block came from the crawler protocol. So I modified my robots.txt (in an Astro project it lives in public/, from where it is copied to the site root at build time). The modified file is as follows:

User-agent: *
Allow: /
Sitemap: https://cirry.cn/sitemap-0.xml
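
After redeploying, confirm that the updated file is actually served from the site root, for example:

curl https://cirry.cn/robots.txt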

After the modification, remember to click on the error in the diagnostic report and report the error message to Baidu.

[Image: the blocking error in the diagnostic report]

[Image: reporting the error to Baidu]

After a few minutes, I crawled my website on xml-sitemaps and it succeeded. I resubmitted it on Baidu and it was indexed normally.

Note: Baidu does not accept index-type sitemaps. Therefore, do not reference sitemap-index.xml, which @astrojs/sitemap generates, in your robots.txt file; otherwise Baidu will keep reporting Robots blocking and fail to crawl your site.

If you encounter other issues, you can refer to the analysis of common errors in the Crawl Diagnostics tool.

You can use the following command to check whether the site is blocked by robots or has an IP resolution error. If it returns HTTP 200, everything is normal; otherwise something is wrong. Remember to replace the URL at the end with your own website's.

curl --head --user-agent 'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)' --request GET 'https://cirry.cn'