Dec 16, 2006
 

This is a story of some of the dark corners of the inter­net, with a puzzle at the end and a request for advice…

Our story starts a few weeks ago. I had installed Stat­counter on the blog post­ings to keep an eye on who vis­its my blog and why, with more inform­a­tion than you get from per­us­ing access logs (I have those too). I also like fol­low­ing links back to refer­rers to see why they’re link­ing to my site, when I have time. A few weeks ago I noticed what looked like a spam site link­ing to my blog — you know the type of URL, it’s some non­sensic­al com­bin­a­tion of let­ters and digits. So I fol­lowed it back, only to find that it was a com­plete frame of my blog. View source showed only that my site was being framed. No oth­er con­tent was being added as ads, as meta con­tent, or any­thing else that I could see. Noth­ing that explained why they’re doing this.

So I looked up the whois for the site, dis­covered it’s hid­den by a com­pany called “Domains by Proxy”, which spe­cial­izes in hid­ing regis­tra­tion data for web sites. They have lots of inform­a­tion on their site about how they cooper­ate with law enforce­ment if people are doing some­thing illeg­al, which leads me to sus­pect that unless you can prove someone’s doing some­thing illeg­al they won’t do any­thing or even talk to you. Not that I tried talk­ing to them, since simply fram­ing my site isn’t illeg­al, or even con­tra­ven­ing my Cre­at­ive Com­mons license. It is, how­ever, highly suspicious.

A little more investigation was in order; the number of hits on my website from this site was increasing and other versions of the URL were showing up. The URL was of the form “aff” followed by “0000” followed by a number, followed by .com (yes, it’s circuitous, but I don’t want my site linked to theirs in search engines, for reasons that will become obvious). I checked and found that all numbers from 1 to 28 pointed to my site. So someone paid to register 28 domains, host 28 domains, and put in HTML to point to my site? None of the URLs showed up in the common search engines, but somehow they were being clicked on, seemingly by real people (a spread of ISPs across the world, different OSes, screen resolutions, and browsers, all staying for approximately zero seconds).

I contemplated putting in some frame-busting code but decided to wait a little and see what happened, in case they were just getting ready to do something. In the meantime more of these sites started pointing at mine. And finally one of them showed up in a search engine, and there it points to an adult site. One of those ones that may not be safe at work, at least judging by the front page. In which case the frame busting isn’t the answer anyway; the people visiting this site don’t want to see my musings on technology, motherhood, or knitting, they want the adult content they expect.

Tim had the bright idea at this stage of using a com­mand-line fetch on the “aff” sites and found that the index page returns a list of poten­tial mis­spellings of the adult site’s name. About 10000 of them. The oth­er sites return sim­il­ar lists; num­ber 28 only returns about 7000 mis­spellings. If you search for one of these mis­spellings in a com­mon search engine, you land on an “aff” page, which then redir­ects you to the adult site. But only if you come from a search engine. If you type in that site name in the address bar, the redir­ect sends you to my blog.

So I have a couple of ques­tions, and would appre­ci­ate any thoughts or exper­i­ences you have.

  1. Why are they not redirecting to the adult site, which is probably what the people who are clicking on an “aff” site want? Why send them to another site?
  2. Related ques­tion: why me? Why someone who writes about tech­no­logy, and not someone on some free host­ing site who may not even notice the increase in traffic, let alone get sus­pi­cious about it?
  3. What do I do about it? I could block people from an “aff” site from linking to my site; receiving a “You’re in timeout.” message (error 403 as seen by Mark Pilgrim) might have some effect. One related question to this is why people are going to an “aff” site anyway; since the “aff” sites redirect people coming from search engines to the actual adult site itself, one could suppose nobody would ever click on it. Tim suggested people might be curious; they see the URL in the search engine listings and type it in the address bar to see what’s there.

The adult site itself does have a tech­nic­al con­tact in the whois registry but the pur­vey­ors of the “aff” sites might not be them. Sug­ges­tions wel­come… the hits I’m get­ting have grown from noth­ing a few weeks ago to now being a sub­stan­tial part of the dir­ect hits on my site so it’s a prob­lem I want to solve soon.

  14 Responses to “Framed!”

  1. The only thing I can ima­gine is that it’s a way to mas­quer­ade the page as a non-adult URL to some audience.

  2. Google thinks it’s the adult site but everyone else sees your site? I think it’s an attempt to gain PageRank. Because it looks and behaves like an innocent link on most sites, it is less likely to be deleted from blog comments, wikis and so on. Google thinks many highly-ranked sites are linking to the adult site and gives it a high score. Obviously the scammers think you’re a prominent site so you should feel flattered. What to do about it? Hmm… if I’m right the people visiting your site using the link are genuinely interested in your site so maybe you shouldn’t slap them with an error page. How about redirecting to a page that explains what’s going on and has a link to the original page?

  3. I sus­pect a two-pronged SEO effort is going on. On one hand they are try­ing to get traffic from mis-spellings of Adult­Friend­Find­er, on the oth­er hand they are try­ing to increase their google juice so their spe­cif­ic mis-spelling bubbles up ahead of the oth­er scam­mers try­ing the same trick.

    They are probably posting comments on tech blogs that do not use rel=“nofollow”, or on forums, hoping that people will not realize the aff URL is a frame, and start linking to the aff* variant instead of the actual source, thus increasing the Google PageRank of aff*.com. Tech blogs must be a highly labor-effective way of increasing one’s PageRank, as opposed to, say, garden gnome blogging. Your site is probably not the only one affected; they must use some kind of code in the frameset page URL to indicate which relevant content should be loaded into a frame. Since there is only one level of referrer tracking in browsers you wouldn’t see the original blog posting or forum in your logs.

    Since PageR­ank is not con­text-spe­cif­ic, a high PageR­ank acquired from rel­ev­ance to tech­nic­al quer­ies also qual­i­fies you to stand out from the din of oth­ers try­ing the same typo­squat­ting scam. Once one of the sites has reached a cer­tain level of PageR­ank, they prob­ably switch the con­tent away from the tech art­icles to those typo­squat­ting Adult­Friend­Find­er. The fact you saw the actu­al search quer­ies is prob­ably an error on their part where they jumped the gun on typo­squat­ting before switch­ing off the fram­ing and repla­cing it with whatever con­tent they are actu­ally try­ing to push on would-be daters.

    The only way you can fight back against this is by using frame-buster code, at least if the referrer is one of those scumbags’ domains. That only goes so far: if you do that, they may adopt outright copying of your site content (I have seen my own relatively obscure site’s content stolen by link farms).
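The conditional frame-buster suggested here could look roughly like the sketch below. It is only an illustration under stated assumptions: the host pattern is inferred from the post's description of the "aff" URLs, and the function name `shouldBustFrame` is made up for this example. In a framed page, `document.referrer` normally holds the URL of the framing page, which is what the check relies on.

```javascript
// Assumption: framing hosts look like "aff0000<number>.com", per the post's description.
const FRAMING_HOST = /^(www\.)?aff0000\d+\.com$/i;

// Hypothetical helper: decide whether to break out of the frame.
// isFramed - true when the page is not the top-level window
// referrer - the value of document.referrer
function shouldBustFrame(isFramed, referrer) {
  if (!isFramed) return false;
  try {
    return FRAMING_HOST.test(new URL(referrer).hostname);
  } catch (e) {
    return false; // empty or malformed referrer: leave the page alone
  }
}

// Browser wiring; skipped when run outside a browser.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  if (shouldBustFrame(window.top !== window.self, document.referrer)) {
    // Replace the framing page with this page in the top-level window.
    window.top.location.replace(window.location.href);
  }
}
```

Limiting the check to the offending hosts leaves legitimate framing (for example, a translation service) untouched.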

    You could also inform Friend­Find­er. They are a legit­im­ate busi­ness, even if some of their prop­er­ties like Alt.com are some­what raunchy, and they can prob­ably bring leg­al fire­power to bear on Domains­By­Proxy to get the scam­mers’ iden­tity. I am not sure how the scam­mers actu­ally plan to make a profit. Per­haps by abus­ing a Friend­Find­er affil­i­ate pro­gram, in which case expos­ing the scam should make the profit motive disappear.

  4. Hi,

    Analyzing the code itself, this is what is happening.

    With a JS-enabled browser, a hit on an “aff” page will also pull in a little Javas­cript file: x.js

    x.js basic­ally says:

    * if refer­rer is a non-blank query from a search engine (spe­cific­ally Google, MSN, Yahoo, AltaV­ista, Ask), then load the savoury page in the browser window.

    * if refer­rer is none of these, then load the file index.htm from the aff site — it is this file that con­tains the frame ref­er­ence to your site.
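The two rules above can be sketched as a small function. This is a hypothetical reconstruction for illustration, not the actual contents of x.js: the function name, the return labels, and the query parameter assumed for each search engine are all made up for this example.

```javascript
// Hypothetical reconstruction of the decision described above; NOT the real x.js.
// The query parameter name per engine is an assumption for illustration.
const SEARCH_ENGINES = [
  { hostPattern: /(^|\.)google\./i, queryParam: 'q' },
  { hostPattern: /(^|\.)msn\./i, queryParam: 'q' },
  { hostPattern: /(^|\.)yahoo\./i, queryParam: 'p' },
  { hostPattern: /(^|\.)altavista\./i, queryParam: 'q' },
  { hostPattern: /(^|\.)ask\./i, queryParam: 'q' },
];

// Returns 'adult-site' for a non-blank search-engine query,
// otherwise 'index.htm' (the page that frames the blog).
function chooseDestination(referrer) {
  let url;
  try {
    url = new URL(referrer);
  } catch (e) {
    return 'index.htm'; // blank or unparseable referrer
  }
  for (const engine of SEARCH_ENGINES) {
    if (engine.hostPattern.test(url.hostname)) {
      const query = (url.searchParams.get(engine.queryParam) || '').trim();
      if (query !== '') return 'adult-site';
    }
  }
  return 'index.htm';
}
```

Under this reading, anyone arriving with no referrer, or with a referrer that is not a search query, falls through to the framed page.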

    People are not neces­sar­ily typ­ing in the “aff” address in their browser address bars: a search anonym­izer (a Fire­fox exten­sion?) that sends a non-search refer­rer will get the index.htm page and there­fore your site in a FRAME.

    I can­’t see any­thing mali­cious or under­hand in the code itself: the only ‘vic­tims’ beside your band­width and serv­er logs will be the search users them­selves who get “(your) mus­ings on tech­no­logy, moth­er­hood, or knitting.”

    What to do? A 403 For­bid­den is most appro­pri­ate in this case (if your Apache has mod_rewrite loaded):

    RewriteCond %{HTTP_REFERER} ^http://aff0000.*
    RewriteRule ^/.+ - [FL]

    HTH,

    Cliff

  5. Ooops, rewrite rule should read:

    RewriteCond %{HTTP_REFERER} ^http://aff0000.*
    RewriteRule ^/.+ - [F,L]

    Rgds,

    Cliff
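For readers not running Apache, the same referrer check can be sketched in JavaScript. This is a minimal illustration, not a drop-in equivalent of the rewrite rule: it tests only the referrer (not the request path), and the names `statusForReferer` and `handleRequest` are hypothetical. The handler has the shape expected by the request callback of Node's built-in `http` module.

```javascript
// Mirrors the RewriteCond pattern above: any referrer starting with http://aff0000
const BLOCKED_REFERER = /^http:\/\/aff0000/;

// Hypothetical helper: 403 for requests referred by an "aff" site, 200 otherwise.
function statusForReferer(referer) {
  return BLOCKED_REFERER.test(referer || '') ? 403 : 200;
}

// Hypothetical request handler; req.headers.referer is the Referer header, if any.
function handleRequest(req, res) {
  const status = statusForReferer(req.headers.referer);
  res.writeHead(status, { 'Content-Type': 'text/plain' });
  res.end(status === 403 ? 'Forbidden' : 'OK');
}
```

A missing Referer header is treated as allowed, so direct visits and privacy-conscious browsers are not blocked.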

  6. I’ve had a run in or two with some dating spam/scams and it may be useful to remember they tend to be copy [example] and paste coders so they make all kinds of errors. Many at the same time.

    I’m guessing your name and one of these aff’s will show up as blog spam payloads (or as ping/trackback). Looks almost like a legit comment/ping to blog readers, search engines see something else. That’s my first guess without thinking about it the way scammers do. My second thought is it could be a phish being set up against aff (some of their customers don’t always use all the spark plugs in their engine). Just quick speculating on my part.

    –Cecil

  7. For the con­tinu­ation (and prob­able end) of the story, see my next post at http://www.laurenwood.org/anyway/archives/2006/12/19/post-results/. Thanks for the com­ments, they helped me fig­ure out what was going on, and may have helped the pur­vey­ors of the aff sites decide to stop whatever they were doing.

  8. And now they’re back again. http://www.laurenwood.org/anyway/archives/2007/01/29/theyre-back/.
    So I may yet have a chance to use some of these ideas, if the band­width usage goes up too much.

  9. The link to http://www.laurenwood.org/anyway/archives/2006/12/19/post-results/ doesn’t work for me; can you fix it, please?
    Or I can try to look it up manually.

  10. The link works for me; what are the res­ults when you click on it? 404? Some­thing else?

  11. This simply looks like an effort to build a techie impression of their site in the view of search engines.

  12. I had this hap­pen to one of my sites about a year ago. It ended up redir­ect­ing to a search engine for pup­pies (of all things). After sev­er­al spam emails I even­tu­ally pur­chased a new domain name but that did­n’t stop it either. I have read these posts and will imple­ment the post by Cliff. Hope­fully your issue gets solved soon. 

    Regards

    Gar­ret

  13. Very interesting posts…

    I know a little bit about Adultfriendfinder’s affiliate system, and one can make money there by simply sending a visitor there. I think your spammers were doing that. And I am sure they got to you, and exploited your blog’s high Google ranking to promote themselves higher on the search engines… I actually found you through blog-posting software: I have a dating advice site and wanted to get a comment to get a link… I’ll use my blog instead… I don’t want you to say I am spamming your board 🙂

    Thank you for edu­cat­ing us/me. I appre­ci­ate it.
    Sophie

  14. I am also not sure how the scam­mers actu­ally plan to make a profit.
