Forum Topic: Monitoring 6G

Forum: .htaccess Forum : General • Posted by RememberToForget

So I’ve had 6G up for a few days now. I started by excluding certain directories, such as my newsletter and cart programs, because they use ultra-long query strings and a few other things that 6G wouldn’t allow, and I just wasn’t able to track them all down. I also installed the blackhole, so now I get an email whenever someone gets 403’ed, blackholed, etc., and then I scan through the logs to figure out exactly why they got in trouble.

Alright, so the first big question that comes up is, this one person or bot has a user-agent that is encased in ‘this’ (little quotes or whatever.)

"'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'"

Why do they wrap it in 'this' (and why did you decide to exclude 'this', <that>, and href\s)?

What also strikes me as off-putting: I’ve noticed that the classier bots tend to include a URL pointing to a page that explains what their bot does, how you can influence its behavior on your site, and so on, and this one doesn’t.

The next major thing I’ve noticed is that many of the questionable bots are using HEAD requests. I’m tempted to ban HEAD requests.

I think that’s all for 6G. I thought it would take me *forever* to decode its meaning, but I’m surprised to see I understood most of it fairly quickly. You print out the character def’s, stick them on your wall, and next thing you know everything starts to make perfect sense.

Actually, there are two more questions:

The third redirectmatch line seems to want to redirect the letter ‘s’?

RedirectMatch 403 (?i)(<|>|:|;|\'|\s)

And the fifth redirectmatch line doesn’t like &?

RedirectMatch 403 (?i)(\"|\.|\_|\&|\&amp;)$

(My cart’s ‘checkout with paypal’ button uses &amp;)

3 Replies to “Monitoring 6G”

Posted by Jeff Starr

“Why do they wrap it in 'this' (and why did you decide to exclude 'this', <that>, and href\s)?”

I could only guess at “why” skiddies do some of the things they do.. perhaps it’s their way of telling their victims that they are spoofing the UA, like when people do that two-finger quote thing with their hands.. When included with code, quotes play an important role in syntax, etc., so it could be related to that, or perhaps it’s a relic of frantic copy/pasting.. only wild stabs here. As for the 'this', <that>, and href\s, you’ll have to let me know which patterns/lines you’re referring to. I’m working remotely and unable to reference any of my code at the moment.

Blocking HEAD requests may seem counter-intuitive, but it’s totally fine in most cases: only bots and scripts should be affected.

“The third redirectmatch line seems to want to redirect the letter ‘s’?”

Nope, \s is the regex shorthand for a whitespace character (a blank space, a tab, and so on), not the letter ‘s’ :)
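If you want to see this for yourself, here is a minimal sketch in Python. Apache uses PCRE rather than Python's `re` module, so treat it as an illustration of the pattern and not a replay of what the server does; for this particular pattern the two should behave alike. The request paths are made up for the example.

```python
import re

# Third RedirectMatch pattern from the thread, copied as-is.
# \' is a literal single quote and \s is any whitespace character.
pattern = re.compile(r"(?i)(<|>|:|;|\'|\s)")

def is_blocked(path: str) -> bool:
    """Return True if the pattern matches anywhere in the path."""
    return pattern.search(path) is not None

# Hypothetical request paths, for illustration only.
print(is_blocked("/products/shoes"))  # plenty of letter "s", no match
print(is_blocked("/my page.html"))    # contains a blank space, so it matches
print(is_blocked("/a;b"))             # contains a semicolon, so it matches
```

The first call prints False and the other two print True: the letter "s" on its own never triggers the rule.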

“And the fifth redirectmatch line doesn’t like &?”

Correct, but only when it is the last character of a requested URL. It won’t be blocked anywhere else (note the $, which anchors the match to the end of the string).
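A small sketch of the end-of-string anchoring, again in Python as a stand-in for Apache's PCRE (an assumption, though this pattern uses nothing exotic). The paths are hypothetical. As far as I know, RedirectMatch is tested against the URL path only and not the query string, which is worth keeping in mind when checking a cart URL against it.

```python
import re

# Fifth RedirectMatch pattern from the thread, copied as-is.
# The trailing $ ties every alternative to the end of the tested string.
pattern = re.compile(r'(?i)(\"|\.|\_|\&|\&amp;)$')

def is_blocked(path: str) -> bool:
    """Return True only if the path ends with one of the listed tokens."""
    return pattern.search(path) is not None

# Hypothetical request paths, for illustration only.
print(is_blocked("/checkout&"))   # ends with "&": matches
print(is_blocked("/check&out"))   # "&" in the middle: no match
print(is_blocked("/index.html"))  # "." in the middle: no match
print(is_blocked("/index."))      # ends with ".": matches
```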

Posted by RememberToForget

OK, I’m referring to these lines:

RedirectMatch 403 (?i)(<|>|:|;|\'|\s)
.
.
.
SetEnvIfNoCase User-Agent (<|>|'|&lt;|%0A|%0D|%27|%3C|%3E|%00|href\s) keep_out

One thing I’m noticing is that Googlebot is getting nailed by a few 403’s; I’ll try to figure it out tonight.

So href\s would be equal to href and one blank space?

If you have another ebook coming out about this stuff, put me on the waiting list. :)

Posted by Jeff Starr

Correct, href\s matches the string “href” followed by a single whitespace character (such as a blank space).
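To see which part of the User-Agent pattern trips on a given UA, something like the following sketch may help. It is Python, not Apache, so `re.IGNORECASE` is standing in for the "NoCase" part of SetEnvIfNoCase, and `first_match` is a hypothetical helper name. The quoted UA is the one from the first post; the last two UA strings are invented.

```python
import re

# User-Agent pattern from the SetEnvIfNoCase line quoted in the thread.
UA_PATTERN = re.compile(
    r"(<|>|'|&lt;|%0A|%0D|%27|%3C|%3E|%00|href\s)",
    re.IGNORECASE,
)

def first_match(user_agent: str):
    """Return the first offending substring, or None if the UA passes."""
    m = UA_PATTERN.search(user_agent)
    return m.group(1) if m else None

quoted_ua = "'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'"
plain_ua = quoted_ua.strip("'")  # same UA without the wrapping quotes

print(repr(first_match(quoted_ua)))        # the leading single quote
print(repr(first_match(plain_ua)))         # None
print(repr(first_match("HREF checker")))   # 'HREF ' (href plus a space)
print(repr(first_match("hrefs-bot/1.0")))  # None, "s" is not whitespace
```

So the wrapped UA is flagged purely because of the single quotes; the same string without them passes this particular pattern.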

For the other character patterns (angle brackets, single quotes, etc.), those are matching the unencoded characters only. As explained here (and elsewhere), certain characters must always be encoded in URLs. So theoretically it’s fine to block such characters, but in reality, even some legit URLs are not encoded properly, resulting in false positives. More info:

https://perishablepress.com/stop-using-unsafe-characters-in-urls/
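For reference, here is a rough illustration of what "encoded" means for the characters discussed above, using Python's standard `urllib.parse.quote`. The sample strings are made up, and this only shows the encoding itself, not how Apache decodes or matches a request.

```python
from urllib.parse import quote

# Hypothetical path segments containing characters from the thread.
samples = ["<that>", "it's", "my page"]

for raw in samples:
    # quote() percent-encodes everything except letters, digits, "_.-~" and "/".
    print(raw, "->", quote(raw))
```

This prints `%3Cthat%3E`, `it%27s`, and `my%20page`, which are the same %3C, %3E, and %27 sequences that appear in the User-Agent pattern.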