Dec 16 2006
 

This is a story of some of the dark corners of the internet, with a puzzle at the end and a request for advice…

Our story starts a few weeks ago. I had installed Statcounter on the blog postings to keep an eye on who visits my blog and why, with more information than you get from perusing access logs (I have those too). I also like following links back to referrers to see why they’re linking to my site, when I have time. A few weeks ago I noticed what looked like a spam site linking to my blog; you know the type of URL, some nonsensical combination of letters and digits. So I followed it back, only to find that it was a complete frame of my blog. View source showed only that my site was being framed. No other content was being added as ads, as meta content, or anything else that I could see. Nothing explained why they were doing this.

So I looked up the whois for the site and discovered it’s hidden by a company called “Domains by Proxy”, which specializes in hiding registration data for web sites. They have lots of information on their site about how they cooperate with law enforcement if people are doing something illegal, which leads me to suspect that unless you can prove someone’s doing something illegal they won’t do anything or even talk to you. Not that I tried talking to them, since simply framing my site isn’t illegal, and doesn’t even contravene my Creative Commons license. It is, however, highly suspicious.

A little more investigation was in order; the number of hits on my website from this site was increasing and other versions of the URL were showing up. The URL was of the form “aff” followed by “0000” followed by a number, followed by .com (yes, it’s circuitous, but I don’t want my site linked to theirs in search engines, for reasons that will become obvious). I checked and found that all numbers from 1 to 28 pointed to my site. So someone paid to register 28 domains, host 28 domains, and put in HTML to point to my site? None of the URLs showed up in the common search engines, but somehow they were being clicked on, seemingly by real people (a spread of ISPs across the world, different OSes, screen resolutions, and browsers, all staying for approximately zero seconds).

I contemplated putting in some frame-busting code but decided to wait a little and see what happened, in case they were just getting ready to do something. In the meantime more of these sites started pointing at mine. And finally one of them showed up in a search engine, where it points to an adult site, one of those that may not be safe for work, at least judging by the front page. In that case frame busting isn’t the answer anyway: the people visiting this site don’t want to see my musings on technology, motherhood, or knitting; they want the adult content they expect.

Tim had the bright idea at this stage of using a command-line fetch on the “aff” sites and found that the index page returns a list of potential misspellings of the adult site’s name. About 10000 of them. The other sites return similar lists; number 28 only returns about 7000 misspellings. If you search for one of these misspellings in a common search engine, you land on an “aff” page, which then redirects you to the adult site. But only if you come from a search engine. If you type that site name into the address bar, the redirect sends you to my blog.
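For anyone who wants to repeat this kind of comparison, here is a minimal Python sketch. It assumes the switch between destinations is made server-side on the HTTP `Referer` header, which is only a guess; it could equally be done in JavaScript or keyed on the user agent, in which case this probe would show no difference. The URLs in the usage note and the names `build_probes` and `fetch` are made up for illustration.

```python
from urllib.request import Request, urlopen


def build_probes(url, search_referer):
    """Build two requests for the same URL.

    "direct" mimics typing the address into the browser (no Referer header);
    "from_search" claims to arrive from a search results page.
    """
    direct = Request(url)
    from_search = Request(url, headers={"Referer": search_referer})
    return {"direct": direct, "from_search": from_search}


def fetch(request, timeout=10):
    """Fetch a request and return (final URL after HTTP redirects, body bytes).

    Only HTTP-level redirects are followed; a page that frames another site
    or redirects from script keeps its own URL, so compare the bodies too.
    """
    with urlopen(request, timeout=timeout) as response:
        return response.geturl(), response.read()
```

Calling `build_probes("http://aff-site.example/", "http://search.example/results?q=some+misspelling")` and passing each result to `fetch` would show whether the two kinds of visit end up in different places. Both addresses are placeholders and would need replacing with real ones.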

So I have a couple of questions, and would appreciate any thoughts or experiences you have.

  1. Why are they not redirecting to the adult site, which is probably what the people who are clicking on an “aff” site want? Why send them to another site?
  2. Related question: why me? Why someone who writes about technology, and not someone on some free hosting site who may not even notice the increase in traffic, let alone get suspicious about it?
  3. What do I do about it? I could block people who arrive from an “aff” site; receiving a “You’re in timeout.” message (error 403, as seen by Mark Pilgrim) might have some effect. A related question is why people are going to an “aff” site anyway; since the “aff” sites redirect people coming from search engines to the actual adult site itself, one would suppose nobody would ever land on the “aff” page directly. Tim suggested people might be curious; they see the URL in the search engine listings and type it in the address bar to see what’s there.
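The blocking idea in question 3 could look something like the following Python sketch. It is not what this blog actually runs; it assumes the site is served through WSGI, and the host pattern `framer<number>.example` is a stand-in for the real domains, which I am still not spelling out. The names `is_blocked` and `block_referers` are invented for the example.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern standing in for the family of framing domains.
BLOCKED_REFERER_HOST = re.compile(r"^(www\.)?framer\d+\.example$")


def is_blocked(referer):
    """Return True when the Referer header's host matches the blocked pattern."""
    if not referer:
        return False
    try:
        host = urlsplit(referer).hostname or ""
    except ValueError:
        # Malformed Referer values are treated as not blocked.
        return False
    return bool(BLOCKED_REFERER_HOST.match(host))


def block_referers(app):
    """Wrap a WSGI app so requests referred by a blocked host get a 403."""
    def wrapped(environ, start_response):
        if is_blocked(environ.get("HTTP_REFERER")):
            body = b"You're in timeout."
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else reaches the wrapped application unchanged.
        return app(environ, start_response)
    return wrapped
```

A check like this relies on the browser sending the framing page's address as the referrer, which is easy to strip or fake, so it would discourage the framing rather than prevent it.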

The adult site itself does have a technical contact in the whois registry, but the purveyors of the “aff” sites might not be the same people. Suggestions welcome… The hits I’m getting have grown from nothing a few weeks ago to a substantial part of the direct hits on my site, so it’s a problem I want to solve soon.
