Framed!

This is a story of some of the dark corners of the inter­net, with a puzzle at the end and a request for advice…

Our story starts a few weeks ago. I had installed Statcounter on the blog postings to keep an eye on who visits my blog and why, with more information than you get from perusing access logs (I have those too). I also like following links back to referrers to see why they’re linking to my site, when I have time. That’s how I noticed what looked like a spam site linking to my blog (you know the type of URL, some nonsensical combination of letters and digits). So I followed it back, only to find that it was a complete frame of my blog. View source showed only that my site was being framed. No other content was being added: no ads, no meta content, nothing else that I could see. Nothing explained why they were doing this.

So I looked up the whois for the site and discovered it’s hidden by a company called “Domains by Proxy”, which specializes in hiding registration data for web sites. They have lots of information on their site about how they cooperate with law enforcement if people are doing something illegal, which leads me to suspect that unless you can prove someone’s doing something illegal they won’t do anything or even talk to you. Not that I tried talking to them, since simply framing my site isn’t illegal and doesn’t even contravene my Creative Commons license. It is, however, highly suspicious.

A little more investigation was in order; the number of hits on my website from this site was increasing and other versions of the URL were showing up. The URL was of the form “aff” followed by “0000” followed by a number, followed by .com (yes, it’s circuitous, but I don’t want my site linked to theirs in search engines, for reasons that will become obvious). I checked and found that all numbers from 1 to 28 pointed to my site. So someone paid to register 28 domains, host 28 domains, and put in HTML to point to my site? None of the URLs showed up in the common search engines, but somehow they were being clicked on, seemingly by real people (a spread of ISPs across the world, different OSes, screen resolutions, and browsers, all staying for approximately zero seconds).

I contemplated putting in some frame busting code but decided to wait a little and see what happened, in case they were just getting ready to do something. In the meantime more of these sites started pointing at mine. And finally one of them showed up in a search engine, and there it pointed to an adult site. One of those ones that may not be safe at work, at least judging by the front page. In which case frame busting isn’t the answer anyway: the people visiting this site don’t want to see my musings on technology, motherhood, or knitting, they want the adult content they expect.

Tim had the bright idea at this stage of using a com­mand-line fetch on the “aff” sites and found that the index page returns a list of poten­tial mis­spellings of the adult site’s name. About 10000 of them. The oth­er sites return sim­il­ar lists; num­ber 28 only returns about 7000 mis­spellings. If you search for one of these mis­spellings in a com­mon search engine, you land on an “aff” page, which then redir­ects you to the adult site. But only if you come from a search engine. If you type in that site name in the address bar, the redir­ect sends you to my blog.
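
For anyone who wants to repeat Tim’s experiment, here is a minimal Python sketch of a fetch that can claim to come from a search engine or from nowhere at all. It assumes the “aff” sites decide what to do based on the HTTP Referer header and answer with an HTTP-level redirect; neither is confirmed by anything above, and a frame or script-based redirect would only show up in the page body. The names build_request and redirect_target are made up for illustration.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the target stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response.
        return None


def build_request(url, referer=None):
    """Build a GET request, optionally claiming to arrive from `referer`."""
    request = urllib.request.Request(url)
    if referer is not None:
        request.add_header("Referer", referer)
    return request


def redirect_target(url, referer=None):
    """Return (status code, Location header or None) without following it."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_request(url, referer), timeout=10) as response:
            return response.status, None
    except urllib.error.HTTPError as err:
        # Assumption: the site redirects at the HTTP level (301/302).
        return err.code, err.headers.get("Location")
```

Calling redirect_target(url) and then redirect_target(url, referer="http://www.google.com/search?q=some+misspelling") for the same page, and comparing the two results, would show whether the Referer header is what makes the difference.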

So I have a couple of ques­tions, and would appre­ci­ate any thoughts or exper­i­ences you have.

  1. Why are they not redirecting to the adult site, which is probably what the people who are clicking on an “aff” site want? Why send them to another site?
  2. Related ques­tion: why me? Why someone who writes about tech­no­logy, and not someone on some free host­ing site who may not even notice the increase in traffic, let alone get sus­pi­cious about it?
  3. What do I do about it? I could block visitors arriving from an “aff” site; receiving a “You’re in timeout.” message (error 403 as seen by Mark Pilgrim) might have some effect. A related question is why people are going to an “aff” site anyway; since the “aff” sites redirect people coming from search engines to the actual adult site itself, one could suppose nobody would ever click on it. Tim suggested people might be curious; they see the URL in the search engine listings and type it in the address bar to see what’s there.
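
The blocking idea in the last question could look something like the Python sketch below. It is only an illustration: this blog runs on WordPress, not on a Python WSGI stack, and the pattern uses a made-up hostname (blocked-framer followed by a number, under .example) rather than the real “aff” domains. The function name block_referers is hypothetical too.

```python
import re

# Hypothetical pattern standing in for the framing sites' hostnames.
BLOCKED_REFERER = re.compile(
    r"^https?://(www\.)?blocked-framer\d+\.example(/|$)", re.IGNORECASE
)


def block_referers(app, pattern=BLOCKED_REFERER):
    """Wrap a WSGI app so requests referred by matching sites get a 403."""

    def middleware(environ, start_response):
        referer = environ.get("HTTP_REFERER", "")
        if pattern.search(referer):
            body = b"You're in timeout."
            start_response(
                "403 Forbidden",
                [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Everyone else, including requests with no Referer, passes through.
        return app(environ, start_response)

    return middleware
```

A check like this only sees what the browser chooses to send, so it would stop the framed page loads but not someone who copies the blog’s address out of the frame source.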

The adult site itself does have a tech­nic­al con­tact in the whois registry but the pur­vey­ors of the “aff” sites might not be them. Sug­ges­tions wel­come… the hits I’m get­ting have grown from noth­ing a few weeks ago to now being a sub­stan­tial part of the dir­ect hits on my site so it’s a prob­lem I want to solve soon.

Pat’s Lightbulb

I have the good for­tune to work with Pat Pat­ter­son at Sun and one of the things we dis­cussed quite a lot shortly before I went on mater­nity leave was how to make it easi­er for people to use Liberty pro­to­cols for their iden­tity needs. One of the com­plaints I’ve heard is that there isn’t enough sample code in the world show­ing how to use and imple­ment SAML. Giv­en that Sun­’s Access Man­ager does imple­ment SAML, along with vari­ous oth­er Liberty Alli­ance stand­ards, it seemed like it should be pos­sible to put togeth­er some sample code that uses Access Man­ager. And, giv­en that Access Man­ager is now open source as part of OpenSSO, it made sense to cre­ate anoth­er open source pro­ject. But, this pro­ject should use lan­guages oth­er than Java, to give the LAMP (or MARS) developers and imple­ment­ors some code that they can use, tweak, and fur­ther devel­op. And put back into the pro­ject of course <grin>. I came up with a bunch of use­less names, and Pat came up with Light­bulb (goes with LAMP). Then as I waddled off into mater­nity leave, Pat did the pro­gram­ming and came up with a way to imple­ment a SAML 2.0 ser­vice pro­vider in pure PHP, without even need­ing the OpenSSO or Access Man­ager code. 

Pat’s giv­ing a webin­ar on this tomor­row morn­ing Pacific time; you need to register for it first.

We’re hop­ing that oth­er people will con­trib­ute rel­ev­ant code, in any lan­guage, for people to use when they want to imple­ment or integ­rate SAML cap­ab­il­it­ies into their sys­tems, wheth­er they’re blog­ging sys­tems, wikis, or any­thing else where iden­tity man­age­ment is use­ful. The pro­ject is loc­ated here; it’s easy to join, add a sub-pro­ject, and com­mit some code. Or just browse and see what’s there and what’s use­ful. Have fun!

On the Air Again

The move went relatively painlessly, although I should really have waited until the DNS change had taken effect before killing my DYNDNS account, since killing it early meant the site was out of commission for a little longer than absolutely necessary. Mind you, that was probably all of two hours, so not a big deal. Everything should now work again as before.

The quick ver­sion of the steps I took to move Any­way:

  1. copy all the Word­Press files and rel­ev­ant plu­gin files to the new ISP site
  2. clean up the MySQL data­base as much as pos­sible to cut down on size; mostly delet­ing SpamKarma logs and old com­ment spam
  3. deac­tiv­ate all the plu­gins except the spam fighter
  4. export the MySQL data­base to SQL state­ments (I use phpmy­ad­min for this)
  5. import the MySQL data­base to the MySQL data­base set up on the new ISP
  6. make sure the wp-config.php file has the right configuration settings
  7. noti­fy the domain regis­trar of the new DNS settings
  8. delete the rel­ev­ant part of the dyndns account (that could have been done later)
  9. wait
  10. once host laurenwood.org shows the new DNS has taken effect, run the Word­Press upgrade script
  11. turn on the oth­er plugins
  12. run xenu to check for any broken links
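
The “wait” in steps 9 and 10 can be automated. The Python sketch below polls until the hostname resolves to the new server’s address; the function name wait_for_dns, the five-minute interval, and the address in the usage note are assumptions for illustration, and a local resolver may keep handing out a cached answer for a while after the change.

```python
import socket
import time


def wait_for_dns(hostname, expected_ip, resolve=socket.gethostbyname,
                 interval=300, max_checks=48, sleep=time.sleep):
    """Poll until `hostname` resolves to `expected_ip`; True on success."""
    for attempt in range(max_checks):
        try:
            if resolve(hostname) == expected_ip:
                return True
        except socket.gaierror:
            # A failed lookup is treated the same as "not switched over yet".
            pass
        if attempt < max_checks - 1:
            sleep(interval)
    return False
```

For example, wait_for_dns("laurenwood.org", "203.0.113.7") would check every five minutes for about four hours, where 203.0.113.7 is a placeholder from the documentation address range and not the real server.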

My web site does seem faster now, and my web surf­ing is no longer com­pet­ing with the spam com­ments for band­width, so I’d say it’s a win all around.

CfS: NV2007

Enough acronyms for now — the Call for Speak­ers for North­ern­Voice 2007 is open! North­ern Voice is Van­couver­’s blog­ging con­fer­ence, focus­sing on per­son­al blog­ging. This means talks on how to solve some com­pany’s PR prob­lems are not really in scope, though tips on how to run a per­son­al blog when you’re also an exec­ut­ive at a well-known com­pany would be. We’re doing the two-day ver­sion again, where Moose­Camp is an “uncon­fer­ence” on Fri­day Feb­ru­ary 23rd, 2007, and the con­fer­ence prop­er is on Sat­urday Feb­ru­ary 24th, 2007.

In pre­vi­ous years we’ve held the con­fer­ence down­town in Van­couver, but we could­n’t get the space we wanted this year. So we’re going out to the UBC main cam­pus, way out west in Van­couver, about as far west as you can go without fall­ing off into the Geor­gia Strait (note, it’s still in Van­couver prop­er, south of the Lions Gate bridge). Cyp­ri­en man­aged to get us space in the Forestry Sci­ences Centre (pho­tos) so we can have all the space we need for talks and the self-organ­ized child­care. I think this will be a fun con­fer­ence, par­tic­u­larly as I’m not plan­ning on being any­where else the week before. Last year I was jet­lagged, hav­ing got back from a trip to Rome the night before, and I still had a good time at the conference.

Oth­er mem­bers of the organ­ising com­mit­tee have blogged it already: Bor­is, Bri­an, Dar­ren, and Kris all have their takes on what’s import­ant about this conference.

At the selection meeting I’ll be looking for proposed talks that cover one or more of the groups of people we’ll have in the audience. As a side-note, please don’t just say you can talk about anything. That really doesn’t help us figure out who should talk on what. If you have an idea and we think a variant of it would work better, don’t worry, we won’t be shy in asking you to change focus a bit! I expect we’ll have fewer newcomers to blogging, although we will have some of those; to make up for it I expect we’ll have a certain number of people who feel they’ve already said everything that they can say and want some tips on keeping up the excitement and interest in what they’re blogging. We’ll have some people who want tips on how to incorporate photos, video, or audio better, and some who still aren’t sure what style sheets are all about. In your speaker submission, tell us who you’re aiming at and what knowledge they need (or don’t); this will help us figure out how to put everything together. This is a fun and educational conference and good speakers are part of that, so please put a bit of time into those submissions to make it easier for us to pick out the good speakers! The deadline is November 28th, and this is a real deadline. Please do use the form and don’t just send us email as we want to make sure we don’t overlook any suggested talks, or lose them in somebody’s over-eager spam filter. Oh, and by the way, let us know of talks you’d like to see, even if you don’t want to give them.

Liberty Deployments

It’s good to see ana­lysts writ­ing sen­tences like “the Liberty spe­cific­a­tions are res­on­at­ing with major IT user organ­iz­a­tions” (quoted in an InfoWorld art­icle entitled E‑government Group forms with­in Liberty Alli­ance). It shows that the Liberty spe­cific­a­tions (and not just fed­er­a­tion) are being imple­men­ted and deployed.

Which brings me to the main point of this post­ing — if you know of Liberty deploy­ments that are worthy of pub­lic atten­tion, pro­pose them for the Iden­tity Deploy­ment of the Year awards. Nom­in­a­tions close on Monday, August 21st, and the judges are wait­ing to see what you can nom­in­ate! The win­ners will be announced on stage at Digit­al IDWorld. I’m hop­ing we get to see some deploy­ments that are illus­trat­ive of the wide range of prob­lems that Liberty Alli­ance spe­cific­a­tions solve. Paul of course wants a People Ser­vice imple­ment­a­tion to win; are there any cool ones out there that will sway the oth­er judges as well?