Forum Topic: Web crawlers

Forum: .htaccess Forum : Security • Posted by Mitch • Updated: Monday, October 19th, 2015 @ 2:57 pm

I am wondering how people handle all the web crawlers out there. A year and a half ago we blocked most of the web crawlers to reduce traffic on our shared host. I both listed the user agents in robots.txt and denied them by IP in the .htaccess file. After combing the visitor log for a few months and building this list, I reduced server load and haven’t really thought about it much since. I am curious to know how others view these crawlers.

The following user agents are blocked (robots.txt User-agent values):

AhrefsBot
Baiduspider
Ezooms
MJ12bot
Sosospider
Yandex
360spider
sogou web spider
SemrushBot
JikeSpider
adbeat_bot
careerbot
sistrix
AcoonBot
Abonti
UnwindFetchor
SiteExplorer
SeznamBot
EasouSpider
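
For anyone who wants to try something similar, here is a minimal sketch of blocking a few of the names above by user agent in .htaccess. Note that this is not the IP-based setup described above: it matches on the User-Agent header instead, it assumes mod_rewrite is enabled on the host, and it only uses a handful of names from the list as examples. The robots.txt exemption is an assumption added so that blocked crawlers can still read the robots.txt rules.

```apache
# Sketch only: assumes mod_rewrite is available and allowed in .htaccess.
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Leave robots.txt reachable so crawlers can still read the rules there.
    RewriteCond %{REQUEST_URI} !^/robots\.txt$

    # Match any of these names anywhere in the User-Agent header.
    # [NC] makes the match case-insensitive. Names containing spaces
    # (e.g. "sogou web spider") would need the spaces escaped with a backslash.
    RewriteCond %{HTTP_USER_AGENT} (AhrefsBot|Baiduspider|MJ12bot|SemrushBot|Yandex) [NC]

    # Answer matching requests with 403 Forbidden.
    RewriteRule .* - [F,L]
</IfModule>
```

User-agent strings are easy to fake, so a rule like this mostly helps with crawlers that identify themselves honestly. The list will also need the same periodic log review that produced it in the first place.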

2 Replies to “Web crawlers”

Posted by Jeff Starr • Wednesday, August 6th, 2014

In general there are two approaches to blocking/controlling access with .htaccess: blacklist or whitelist. I have many extensive blacklists at Perishable Press:

https://perishablepress.com/search/user+agent+blacklist/

...and a whitelist is available here:

https://perishablepress.com/invite-only-visitor-exclusivity-via-the-opt-in-method/

Note that the lists may need updating, especially the whitelist.
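
To make the difference concrete: a blacklist refuses requests that match a known-bad pattern, while a whitelist refuses everything that does not match a known-good pattern. The following is only a rough sketch of the whitelist idea, not the rules from the linked article; the allowed patterns are illustrative assumptions, and it assumes mod_rewrite is enabled.

```apache
# Sketch only: illustrative whitelist, not a vetted rule set.
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Keep robots.txt readable for everyone.
    RewriteCond %{REQUEST_URI} !^/robots\.txt$

    # The leading "!" negates the pattern: the rule fires only when the
    # User-Agent matches NONE of the allowed names (case-insensitive).
    RewriteCond %{HTTP_USER_AGENT} !(Mozilla|Opera|Googlebot|bingbot|Slurp) [NC]

    # Anything not on the whitelist gets 403 Forbidden.
    RewriteRule .* - [F,L]
</IfModule>
```

One caveat with a pattern this loose: many crawlers also send a Mozilla-style user-agent string, so a whitelist that allows "Mozilla" will let them through. A whitelist that is useful in practice needs tighter patterns and more frequent maintenance, which is why it tends to go stale faster than a blacklist.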

Posted by Mitch • Friday, August 8th, 2014

Jeff,
Thank you for your reply. Your lists, web sites and books have been invaluable in helping me as I learn about managing our WordPress site.

I think I know better now what I really wanted to ask. Is there a place where people discuss what bots/crawlers/user agents/spiders they consider good/okay and what bots they consider bad? For those in the U.S. we could probably get agreement that there are some indexers generally thought of as okay (e.g. Google, Microsoft, Yahoo). Is there a place where people discuss Yandex, Baidu, Soso, etc.? Where can I fin

Your white- and black-lists are a great start and reveal your opinion!

Mitch
