On April 22, 2009 I started this WP blog. Before that I used to blog at web-log, and in the meantime I also wrote down my recipes at blogspot. The recipe blog still exists, but I don’t have the password anymore… And of course there was also my “a bug a day” picture blog for my 365days project in 2012 (which “failed” after 184 days and still has to be merged with blog.spiderwebz).
After starting my blog in English, I switched back to Dutch in 2011 and then back to English again after buying my first typewriter in November 2012. I have written 220 blog posts, which received 442 comments (thank you typosphere!). This blog has only had two different themes, but they both had the same color scheme. Guess I like green. ;-)
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –
Now, after looking at a few stats on other blogs recently, I started to wonder: how many of these visits are actual people instead of spammers and crawlers? Since I cannot use their blogs and stats, I used my own.
The first thing I needed for my little research was to KEEP ALL THE SPAM COMMENTS!
On Friday the 17th, I caught 239 of them. According to StatPress, I had 21 visitors that day, with 35 pageviews. Onestat kinda agrees, with 17 unique visitors. Saturday the 18th was very similar: 247 spam comments, 20 visitors according to StatPress, 11 unique ones according to Onestat. Sunday the 19th was the same thing all over again: 271 spam comments, 21 visitors according to StatPress, 10 unique ones according to Onestat.
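To put those numbers side by side, here is a small back-of-the-envelope sketch in Python. It only uses the counts quoted above; the names `days` and `spam_per_visitor` are made up for this illustration and have nothing to do with StatPress or Onestat themselves.

```python
# Counts copied from the three days described above.
days = {
    "Friday": {"spam": 239, "statpress": 21, "onestat": 17},
    "Saturday": {"spam": 247, "statpress": 20, "onestat": 11},
    "Sunday": {"spam": 271, "statpress": 21, "onestat": 10},
}


def spam_per_visitor(spam_comments, visitors):
    """Return how many spam comments arrived per counted visitor."""
    return spam_comments / visitors


# Totals over the whole weekend.
total_spam = sum(d["spam"] for d in days.values())
total_statpress = sum(d["statpress"] for d in days.values())
total_onestat = sum(d["onestat"] for d in days.values())

for name, d in days.items():
    ratio = spam_per_visitor(d["spam"], d["statpress"])
    print(f"{name}: {ratio:.1f} spam comments per StatPress visitor")

print(f"Weekend total: {total_spam} spam comments, "
      f"{total_statpress} StatPress visitors, {total_onestat} Onestat visitors")
```

Whichever counter you trust, the spam comments outnumber the counted visitors more than ten to one on each of these days.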
When I look at the visitors list in Onestat, I mostly see people entering my blog through the typosphere blog roll, other typosphere blogs and search engines. Plausible! But, when I look at StatPress and want to see the same visitors list, I see this:
Not once, but from numerous IP addresses and in different styles and shapes. That’s not a very human thing to do… When I follow a couple of these form algorithms, I enter my blog on the many pages that receive a lot of spam comments. Coincidence? I don’t think so! Looking at another graphic about unique visitors and pageviews, StatPress shows me the following numbers:
That’s a whole different number, for the same days! So where do they come from? When I look at my “Top IP – Page views” stats, four immediately jump out. After requesting the report on the first IP, it turns out this specific address is indeed a web crawler and identifies itself as jobdigger spider. It enters my blog searching for robots.txt and crawls all my Dutch posts. A very nice thing this company does is explain on its own website what they do, why they are crawling and how to get rid of it. I’m a little impressed! But the next one doesn’t, of course. And neither do all the others.
What’s so difficult about getting rid of the spambots is that they use several different IPs every day. So blocking the IP isn’t sufficient. Some of you are using Captcha, but I’d rather delete every spam comment myself than scare off a visitor. I like to keep things open and easy for visitors that aren’t spambots and crawlers. I do run Akismet. It catches the spam comments and puts them in my spam folder. It’s hardly ever wrong!
So what about that robots.txt file the jobdigger website was talking about? As a website owner, you can use the /robots.txt file to give instructions about your website to web robots; it’s called The Robots Exclusion Protocol. When a robot wants to visit a URL on a website, it first looks for /robots.txt at the root of that site.
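To see how a well-behaved robot reads such a file, here is a minimal sketch using Python’s standard `urllib.robotparser`. The rules in `ROBOTS_TXT`, the user-agent token `jobdigger` and the `example.com` address are assumptions for illustration only; the real token is whatever the crawler documents on its own website.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named crawler, allow everyone else.
# The token "jobdigger" is an assumption, not taken from the crawler's docs.
ROBOTS_TXT = """\
User-agent: jobdigger
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com stands in for the blog's real address.
jobdigger_ok = is_allowed(ROBOTS_TXT, "jobdigger", "http://example.com/some-post/")
other_ok = is_allowed(ROBOTS_TXT, "SomeOtherBot", "http://example.com/some-post/")

print("jobdigger allowed:", jobdigger_ok)
print("other robots allowed:", other_ok)
```

An empty `Disallow:` line means nothing is off limits, so every robot except the named one may crawl the whole site. This check happens on the robot’s side: the file is a request, not a lock.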
Sounds lovely, right? But there are two important considerations when using /robots.txt. First, it’s a publicly available file which anyone can read. And more importantly in this case, robots can ignore your /robots.txt. Especially malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention. So this won’t help to fight spambots either. Guess I’m going back to banning the recurring IP addresses!
The point I was trying to make writing this blog post? Well, yes, we get a lot of traffic. But not all of it is human. The majority consists of spammers and crawlers. And you cannot get rid of them…
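Picking out the recurring addresses could look something like the sketch below. It assumes you have somehow exported the IP address of every spam comment per day; the function `recurring_ips` and the sample addresses (taken from the ranges reserved for documentation) are made up for this example and are not real data from my spam folder.

```python
from collections import Counter


def recurring_ips(spam_ips_by_day, min_days=2):
    """Return the IPs that posted spam on at least min_days different days."""
    days_seen = Counter()
    for ips in spam_ips_by_day.values():
        # Use a set so an IP counts once per day, however often it posted.
        days_seen.update(set(ips))
    return sorted(ip for ip, days in days_seen.items() if days >= min_days)


# Made-up sample data, one list of spam-comment IPs per day.
sample = {
    "day 1": ["192.0.2.10", "192.0.2.10", "198.51.100.7"],
    "day 2": ["192.0.2.10", "203.0.113.5"],
    "day 3": ["203.0.113.5", "198.51.100.99"],
}

candidates = recurring_ips(sample)
print("Recurring spam IPs:", candidates)
```

Addresses that show up only once never reach the list, which is exactly why this approach can only ever catch part of the spambots.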