How do I use a robots.txt file to control search engine access to my site?

My site has few visitors on a typical day, so why is its traffic usage still so high?

A site may receive few visitors each day and still see its traffic usage climb quickly.

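Log in to the control panel and click Awstat. Then click "搜尋引擎網站的漫遊器" (Robots/Spiders visitors) in the left-hand column to see whether search engines are consuming your traffic.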

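If you see Unknown robot (identified by 'spider'), that is also a search engine crawler, just a less well-known one that the Awstat statistics software cannot identify by name.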
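
If you have access to the raw access log, a rough cross-check of the Awstat figures might look like the sketch below. It assumes the log is in Apache combined format. The `BOT_HINTS` list and the function name `bytes_by_crawler` are hypothetical names chosen for illustration, not part of Awstat.

```python
import re
from collections import Counter

# Tail of a combined-format log line:
#   "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical hint list; extend it with whatever appears in your own logs.
BOT_HINTS = ("bot", "spider", "crawler", "slurp")


def bytes_by_crawler(lines):
    """Sum response bytes per user-agent for agents that look like crawlers."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in combined format
        agent = match.group("agent")
        if any(hint in agent.lower() for hint in BOT_HINTS):
            size = match.group("bytes")
            # A "-" in the bytes column means no response body was sent.
            totals[agent] += 0 if size == "-" else int(size)
    return totals
```

Passing an open log file to `bytes_by_crawler` and calling `.most_common(10)` on the result would list the crawlers that transferred the most data. Matching on substrings such as "spider" is only a heuristic, so treat the totals as an estimate.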

There are two ways to restrict search engines from crawling the data on your site: a robots.txt file or .htaccess.

Note that robots.txt does not take effect as soon as you upload it; it can take up to several weeks. If the need is urgent, use the .htaccess method instead.

Google's guide to using robots.txt files:
http://www.google.com.tw/support/webmasters/bin/answer.py?answer=40360

Yahoo Search's documentation:
http://help.yahoo.com/help/tw/ysearch/ysearch-27.html

Sogou (mainland China):
http://www.sogou.com/docs/help/webmasters.htm#07

Baidu (mainland China). Our hosting servers block the Baidu spider by default; to allow it, use .htaccess to turn off mod_security protection:
http://www.baidu.com/search/guide.html#4

Related site:
http://www.robotstxt.org/

Suppose you place a robots.txt file with the following content in your site's root directory:

User-agent: *
Disallow: /

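This tells all search engines not to collect your site's content. To block only specific crawlers instead, such as mainland China's Baidu and Sogou (the example also lists Bloghoo and Scooter), use the following .htaccess content: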

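# SetEnvIfNoCase does a case-insensitive regex match on the User-Agent header.
# The Baidu pattern is unanchored because current Baiduspider user-agent strings
# typically begin with "Mozilla/5.0", which "^Baidu" would not match.
SetEnvIfNoCase User-Agent "Baidu" bad_bot
SetEnvIfNoCase User-Agent "^sogou" bad_bot
SetEnvIfNoCase User-Agent "^Bloghoo" bad_bot
SetEnvIfNoCase User-Agent "^Scooter" bad_bot
# Refuse any request that was tagged bad_bot above.
Deny from env=bad_bot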
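
Both kinds of rule can be sanity-checked locally before you upload them. The sketch below is an illustration rather than the hosting platform's own tooling. It parses a robots.txt body with Python's standard `urllib.robotparser` and emulates the case-insensitive matching of `SetEnvIfNoCase` with `re.search`. The helper names `robots_allows` and `matches_any` are hypothetical, and the user-agent strings you pass in should come from your own logs.

```python
import re
from urllib.robotparser import RobotFileParser

# The "block everything" robots.txt from this article.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def robots_allows(robots_txt, agent, url):
    """Return True if robots_txt lets the named agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def matches_any(user_agent, patterns):
    """Emulate SetEnvIfNoCase: case-insensitive regex search on the header."""
    return any(re.search(p, user_agent, re.IGNORECASE) for p in patterns)
```

For example, `robots_allows(ROBOTS_TXT, "Baiduspider", "http://www.example.com/")` shows how a compliant crawler would read the file. Calling `matches_any` with a user-agent that begins with "Mozilla/5.0" shows the effect of the `^` anchor: a pattern such as `"^Baidu"` only matches when the header starts with that name, while an unanchored `"Baidu"` matches anywhere in it.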
