WordPress Robots.txt Sample


WordPress Robots.txt

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

Adding Sitemaps to WordPress Robots.txt

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

Sitemap: http://www.example.com/post-sitemap.xml
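
These rules can be sanity-checked with Python's standard-library `urllib.robotparser`. A minimal sketch (the file content is fed straight to `parse()` rather than fetched over HTTP; `site_maps()` requires Python 3.8+):

```python
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

Sitemap: http://www.example.com/post-sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# WordPress core directories are blocked for every crawler,
# while ordinary content stays crawlable.
print(rp.can_fetch("*", "/wp-admin/"))                  # False
print(rp.can_fetch("Googlebot", "/2024/hello-world/"))  # True

# All Sitemap: lines are collected (Python 3.8+).
print(rp.site_maps())  # ['http://www.example.com/post-sitemap.xml']
```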

Explanation

Allow All Bots

  • Allows every bot to crawl the entire site
User-agent: *
Disallow:

Block All Bots

  • Prevents every bot from crawling any page
User-agent: *
Disallow: /
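
The difference between the two policies is easy to verify with `urllib.robotparser` from the Python standard library (a quick sketch, parsing the rules inline):

```python
from urllib import robotparser

def parser_for(rules: str) -> robotparser.RobotFileParser:
    rp = robotparser.RobotFileParser()
    rp.parse(rules.splitlines())
    return rp

# An empty Disallow: means "nothing is off limits".
allow_all = parser_for("User-agent: *\nDisallow:")
print(allow_all.can_fetch("*", "/any/page.html"))  # True

# Disallow: / matches every path, so nothing may be crawled.
block_all = parser_for("User-agent: *\nDisallow: /")
print(block_all.can_fetch("*", "/any/page.html"))  # False
```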

Block a Folder

User-agent: *
Disallow: /Folder/

Block a File

User-agent: *
Disallow: /file.html

Block a Page or a Directory Named "private"

User-agent: *
Disallow: /private
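
`Disallow: /private` is a plain prefix match, so it covers the bare page, the directory, and any other path that begins with /private. A small check with Python's `urllib.robotparser`:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse("User-agent: *\nDisallow: /private".splitlines())

# Prefix matching: all of these paths start with /private.
for path in ("/private", "/private/", "/private/file.html", "/private-notes.html"):
    print(path, rp.can_fetch("*", path))  # all False

# Paths that merely contain "private" elsewhere are unaffected.
print(rp.can_fetch("*", "/blog/private-post/"))  # True
```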

Block All Subfolders Starting with "private"

User-agent: *
Disallow: /private*/

Block URLs Ending with ".asp"

User-agent: *
Disallow: /*.asp$

Block URLs That Include a Question Mark (?)

User-agent: *
Disallow: /*?*

Block a File Type

User-agent: *
Disallow: /*.jpeg$

Block All Paginated Pages That Don't End with "?"

  • http://www.example.com/blog/? ( allowed )
  • http://www.example.com/blog/?page=2 ( blocked )

Together, these rules block paginated query URLs from crawling while still allowing URLs that end in "?":

User-agent: *
Disallow: /*? # block URL that includes ?
Allow: /*?$ # allow URL that ends in ?
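
Note that wildcard rules like these are a search-engine extension to the original robots.txt protocol, and Python's `urllib.robotparser` does not interpret `*` or `$`. Below is a small hand-rolled matcher, a sketch following Google's documented longest-match precedence (`_matches` and `is_allowed` are names invented here):

```python
import re

def _matches(rule: str, path: str) -> bool:
    # Translate a robots.txt rule into a regex: '*' matches any run
    # of characters, a trailing '$' anchors the end of the path, and
    # everything else matches literally against a prefix of the path.
    parts = []
    for i, ch in enumerate(rule):
        if ch == "*":
            parts.append(".*")
        elif ch == "$" and i == len(rule) - 1:
            parts.append("$")
        else:
            parts.append(re.escape(ch))
    return re.match("".join(parts), path) is not None

def is_allowed(path: str, disallows, allows=()):
    # Google's precedence rule: the longest matching rule wins;
    # on a tie, the least restrictive (Allow) rule applies.
    verdict, best = True, -1
    for rule in disallows:
        if len(rule) > best and _matches(rule, path):
            verdict, best = False, len(rule)
    for rule in allows:
        if len(rule) >= best and _matches(rule, path):
            verdict, best = True, len(rule)
    return verdict

# The pagination rules above: Disallow: /*?  plus  Allow: /*?$
print(is_allowed("/blog/?", ["/*?"], ["/*?$"]))        # True  (ends in ?)
print(is_allowed("/blog/?page=2", ["/*?"], ["/*?$"]))  # False (paginated)
```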

Using the Hash (#) for Comments

# Everything after a hash (#) is a comment and is ignored by crawlers

Bots / User Agents

Top 10 Bots

  • bingbot
  • Googlebot
  • Googlebot-Mobile
  • AhrefsBot
  • Baiduspider
  • MJ12bot
  • proximic
  • A6
  • ADmantX
  • msnbot/2.0b

Individual Crawl Rules for Each Bot

User-Agent: Googlebot
Allow: /

User-Agent: Googlebot-Mobile
Allow: /

User-Agent: msnbot
Allow: /

User-Agent: bingbot
Allow: /

# Adsense
User-Agent: Mediapartners-Google
Disallow: / 

# Blekko
User-Agent: ScoutJet
Allow: / 

User-Agent: Yandex
Allow: / 

# CommonCrawl
User-agent: ccbot
Allow: / 

User-agent: baiduspider
Allow: / 

User-agent: DuckDuckBot
Allow: / 

User-Agent: *
Disallow: /
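
With a list like this, each bot obeys the most specific User-Agent group that matches it, and the final `User-Agent: *` group catches everything else. This can be verified with Python's `urllib.robotparser` (a sketch using a trimmed-down version of the list above):

```python
from urllib import robotparser

RULES = """\
User-Agent: Googlebot
Allow: /

User-Agent: bingbot
Allow: /

User-Agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(RULES.splitlines())

# Named bots match their own group; everyone else falls through
# to the catch-all group and is blocked.
print(rp.can_fetch("Googlebot", "/post/"))     # True
print(rp.can_fetch("bingbot", "/post/"))       # True
print(rp.can_fetch("SomeOtherBot", "/post/"))  # False
```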
