This is a story of some of the dark corners of the internet, with a puzzle at the end and a request for advice…
Our story starts a few weeks ago. I had installed Statcounter on the blog postings to keep an eye on who visits my blog and why, with more information than you get from perusing access logs (I have those too). I also like following links back to referrers to see why they’re linking to my site, when I have time. A few weeks ago I noticed what looked like a spam site linking to my blog (you know the type of URL: some nonsensical combination of letters and digits). So I followed it back, only to find that it was a complete frame of my blog. View source showed only that my site was being framed. No other content was being added, whether as ads, as meta content, or anything else that I could see. Nothing explained why they were doing this.
So I looked up the whois for the site and discovered it’s hidden by a company called “Domains by Proxy”, which specializes in hiding registration data for web sites. They have lots of information on their site about how they cooperate with law enforcement if people are doing something illegal, which leads me to suspect that unless you can prove someone’s doing something illegal they won’t do anything or even talk to you. Not that I tried talking to them, since simply framing my site isn’t illegal, and doesn’t even contravene my Creative Commons license. It is, however, highly suspicious.
A little more investigation was in order; the number of hits on my website from this site was increasing and other versions of the URL were showing up. The URL was of the form “aff” followed by “0000” followed by a number, followed by .com (yes, it’s circuitous, but I don’t want my site linked to theirs in search engines, for reasons that will become obvious). I checked and found that all numbers from 1 to 28 pointed to my site. So someone paid to register 28 domains, host 28 domains, and put in HTML to point to my site? None of the URLs showed up in the common search engines, but somehow they were being clicked on, seemingly by real people (a spread of ISPs across the world, different OSes, screen resolutions, and browsers, all staying for approximately zero seconds).
I contemplated putting in some frame-busting code but decided to wait a little and see what happened, in case they were just getting ready to do something. In the meantime more of these sites started pointing at mine. And finally one of them showed up in a search engine, and there it points to an adult site, one of those that may not be safe for work, at least judging by the front page. In which case frame busting isn’t the answer anyway: the people visiting this site don’t want to see my musings on technology, motherhood, or knitting, they want the adult content they expect.
Tim had the bright idea at this stage of using a command-line fetch on the “aff” sites and found that the index page returns a list of potential misspellings of the adult site’s name. About 10000 of them. The other sites return similar lists; number 28 only returns about 7000 misspellings. If you search for one of these misspellings in a common search engine, you land on an “aff” page, which then redirects you to the adult site. But only if you come from a search engine. If you type in that site name in the address bar, the redirect sends you to my blog.
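For anyone who wants to repeat that experiment, here is a minimal Python sketch of the idea: request the same page with and without a `Referer` header and compare what comes back. This is not what Tim actually ran. The names `NoRedirect`, `build_probe`, and `probe` are made up for illustration, and the sketch assumes the redirect happens at the HTTP level (a 3xx response). If the site uses frames or script instead, you would need to compare the returned HTML.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so the 3xx response itself can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def build_probe(url: str, referer: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, optionally claiming to come from `referer`."""
    request = urllib.request.Request(url)
    if referer is not None:
        request.add_header("Referer", referer)
    return request


def probe(url: str, referer: Optional[str] = None) -> Tuple[int, Optional[str]]:
    """Return (status code, Location header) without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_probe(url, referer), timeout=10) as response:
            return response.status, response.headers.get("Location")
    except urllib.error.HTTPError as err:
        # With NoRedirect installed, 3xx responses surface here as errors.
        return err.code, err.headers.get("Location")
```

Calling `probe(url)` and then `probe(url, referer=...)` with a search-results URL as the referrer should show whether the destination depends on where the visitor appears to come from.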
So I have a couple of questions, and would appreciate any thoughts or experiences you have.
- Why are they not redirecting to the adult site, which is probably what the people who are clicking on an “aff” site want? Why send them to another site?
- Related question: why me? Why someone who writes about technology, and not someone on some free hosting site who may not even notice the increase in traffic, let alone get suspicious about it?
- What do I do about it? I could block requests referred by the “aff” sites; receiving a “You’re in timeout.” message (error 403 as seen by Mark Pilgrim) might have some effect. A related question is why people are going to an “aff” site at all: since the “aff” sites redirect people coming from search engines to the adult site itself, one would suppose nobody would ever arrive at one directly. Tim suggested people might be curious; they see the URL in the search engine listings and type it in the address bar to see what’s there.
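The blocking idea in that last question could look roughly like the sketch below, written as a small piece of Python WSGI middleware. It is only an illustration under assumptions: my blog doesn't necessarily run on Python, the names `AFF_HOST`, `is_aff_referer`, and `block_aff_referers` are hypothetical, and the host pattern is my reading of the URL form described above. Referrers can also be missing or forged, so this would catch only requests that honestly report an “aff” page as their source.

```python
import re
from urllib.parse import urlparse

# Assumed pattern, taken from the description above:
# "aff" + "0000" + a number + ".com", with an optional "www." prefix.
AFF_HOST = re.compile(r"(?:www\.)?aff0000\d+\.com", re.IGNORECASE)


def is_aff_referer(referer):
    """Return True if the Referer header names one of the 'aff' hosts."""
    if not referer:
        return False
    host = urlparse(referer).hostname or ""
    return AFF_HOST.fullmatch(host) is not None


def block_aff_referers(app):
    """Wrap a WSGI app so requests referred by 'aff' hosts get a 403."""

    def middleware(environ, start_response):
        if is_aff_referer(environ.get("HTTP_REFERER")):
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain; charset=utf-8")],
            )
            return [b"You're in timeout.\n"]
        # Everyone else passes through to the wrapped application unchanged.
        return app(environ, start_response)

    return middleware
```

The same check could be written as a web server rewrite rule instead. The point is only that the decision hinges on the host in the `Referer` header.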
The adult site itself does have a technical contact in the whois registry, but the purveyors of the “aff” sites might not be the same people. Suggestions welcome… the hits I’m getting have grown from nothing a few weeks ago to a substantial part of the direct hits on my site, so it’s a problem I want to solve soon.