Mar 07 2014

Langara is a loc­al col­lege offer­ing degrees in a num­ber of sub­jects, includ­ing Com­puter Stud­ies. I know one of the instruct­ors there, and he asked me to give a talk at their monthly Com­puter Tech meetup. As a top­ic, I picked Sim­ple Prin­ciples for Web­site Secur­ity, a short­er ver­sion of talks I’ve given at the XML Sum­mer School.

Apart from the fact that I was recovering from a bout with the virulent stomach bug that seemed to be going round Vancouver at the time, it was fun. A good bunch of people, decent questions, and the student newspaper took advantage of the opportunity to write a column and make a video about basic internet security. One of my aims in this talk is to make the audience paranoid, pointing out that sometimes the bad guys really are out to get you, and talking a bit about risk analysis and the trade-offs involved in writing down strong passwords (using a password manager is better, of course). And the door prizes for Langara students were quite impressive!

Thanks to Ray­mond for invit­ing me, and Gail and Holly for organ­ising everything. I put the slides up at slide­share if you’re inter­ested.

Aug 27 2013

For the XML Sum­mer School this year, I’m teach­ing about HTML5, CSS3 and ePub in the Hands-on Web Pub­lish­ing course. The basic premise of the course is to show what tech­no­lo­gies are involved in tak­ing a bunch of Word doc­u­ments or XML files and turn­ing them into a decent-looking web­site or ePub. The course includes les­sons on rel­ev­ant bits of XSLT trans­form­a­tion (since Word is XML under the cov­ers, if you dig deeply enough), script­ing in Ruby to auto­mate as much as pos­sible, and, of course, enough inform­a­tion about HTML and CSS that people can make a decent-looking web­site in class in the hands-on part.

As a start­ing point for the exer­cises, we’ll use a gen­er­ated tem­plate from HTML5 boil­er­plate, since, if you pick the right options, it is rel­at­ively clean and sim­ple to under­stand. Look­ing at the cur­rent com­mon design prac­tices used across a num­ber of options (HTML5 boil­er­plate, Boot­strap, Word­Press tem­plates for example) coupled with web com­pon­ents and the sheer size and num­ber of HTML5-related spe­cific­a­tions from WHATWG and the W3C, I’m won­der­ing just how much more com­plic­ated it can all get before the pen­du­lum starts swinging back again towards sim­pli­city and sep­ar­a­tion of con­tent from pro­cessing. Even a bare-bones tem­plate has a num­ber of lines in it to deal with older ver­sions of IE, or to load some JavaS­cript or (mostly) jQuery lib­rary. It’s no won­der we’re start­ing to see so many frame­works that try to cov­er up all of that com­plex­ity (Boot­strap again, or Ember, for example). 

In the mean­time, at least I have a reas­on­ably con­strained use case to help me decide which of the myri­ad pos­sib­il­it­ies are worth spend­ing time teach­ing, and which are best left for the del­eg­ates to read up on after the class. 

Nov 13 2007

There are some issues with Web 2.0, mostly in the areas of privacy, security, and copyright: all those things you’d rather you didn’t need to worry about. Take privacy for example. On many social networking sites people sign up and then put in all their personal information simply because there’s a field there for it. Often those profiles are public by default, rather than private, and often they’re open to search engines as well. So people think their information is private and then discover it isn’t, and have to go searching through menus to find out how to turn on those privacy filters that are turned off by default. In many cases what’s good for the site owners isn’t necessarily good for the users. One big factor in Flickr’s early success was the fact that uploaded photos could be seen by the world unless specifically made private, and lots of users did (and still do) get confused by copyright issues (Creative Commons licenses don’t solve the issue of what terms such as “public domain” actually mean).

Then there’s the per­sona issue. I might have a leg­al but slightly embar­rass­ing hobby that I don’t want work know­ing about. So I need to set up a sep­ar­ate online iden­tity for that — people need to think about the implic­a­tions of this in advance if they don’t want cor­rel­a­tions of that hobby per­sona with their “real” one on the basis of an address or phone num­ber or email.

Oth­er prob­lems with the pleth­ora of new Web 2.0 social net­work­ing sites: they often don’t under­stand what pri­vacy and user con­sent mean. You sign up for some­thing, they ask you to upload your address book to see wheth­er oth­er friends are already there, the next thing you know they’ve done spam-a-friend and emailed every­one in your address book without your know­ledge, let alone your con­sent. Or they ask you to give them your user­name and pass­word to some oth­er social net­work­ing site under the “trust us, we will do no evil” mot­to (whatever happened to “trust but veri­fy”?).

There are some solu­tions to this: users have to be care­ful about the inform­a­tion they hand out (fake birth­d­ates, any­one?) and start demand­ing that sites take care of their inform­a­tion. If I want to hand out inform­a­tion to the world, that’s my decision, but it shouldn’t be up to some web site to make that decision for me.

The last of a series on Web 2.0, taken from my talk at the CSW Sum­mer School in July 2007. Here’s the series intro­duc­tion.

Nov 12 2007

The third aspect of Web 2.0, which is often under-appreciated, is the pro­cess aspect. This has changed people’s expect­a­tions of what soft­ware can do, and how it should be delivered. This cat­egory includes open source, con­tinu­al beta and quick release cycles, and some new busi­ness mod­els.

[Image: Process Cloud]

Not all of the things that are import­ant in Web 2.0 are new, of course. Open Source soft­ware has been around for a long time, but I would argue that it has nev­er been as pop­ular as now, where more people have the abil­ity to con­trib­ute their time and tal­ent to pro­jects for which they’re not dir­ectly paid (unless they’re lucky enough to work for a com­pany that sup­ports such pro­jects).

The concepts of continual beta and quick release cycles are new, though. It wasn’t that long ago that you could only buy consumer-level software in boxes with pretty pictures and printed manuals, either in stores or by calling companies. For expensive software that needed consulting services to install and configure, sales reps would visit if you worked for a large enough company. To take part in a beta program you needed to know someone who worked in the company and sign an NDA, and it was a small, tightly-controlled circle.

These days the Web 2.0 browser-based applications don’t need hand-holding to install and configure, so the server load is the big constraint on how many people can take part at once. There are several flavours of beta programs: invite some “thought leaders” and ask them to invite their friends in the hope they’ll blog a lot about it (Gmail did this: you got 6 invites, then 50, then you could invite 250 of your closest friends to take part, most of whom already had Gmail accounts); unlimited invites starting with a small circle; sign up on a waiting list; allow in anyone from certain companies (Dopplr does this, with the twist that the members can then invite anyone they like).

The “continual beta” bit comes from the fact that these applications are updated quickly; these updates are often tried out on some of the users before being rolled out to all. Flickr apparently had hundreds of incremental releases in 18 months from February 2004 to October 2005 (stated in O’Reilly’s Web 2.0 Principles and Best Practices; I couldn’t find an online reference other than that report). The line between a beta and a non-beta application seems to be a fine one; in many cases the only distinction the user can see is the word “beta” on the web site. Continual releases give users a reason to come back often, and new features can be tested and fixed quickly. Of course, this sort of system doesn’t really work for fundamental software such as operating systems, databases, browsers, identity providers, and directory services, where you want full-on security and regression testing, but it does work for the Web 2.0 applications that run on those bits of fundamental software.
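One common way to try an update on some of the users first is to hash each user into a fixed bucket and switch the feature on for a growing percentage of buckets. The following is only a minimal Python sketch of that idea; the function name `in_rollout`, the feature name `new-uploader`, and the user ids are all made up for illustration, and nothing here describes how Flickr or any other site actually does it.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Return True if this user is in the first `percent` of 100 buckets.

    The bucket is derived from a hash of the feature name and user id, so
    the same user always gets the same answer for the same feature.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket number from 0 to 99
    return bucket < percent


# Hypothetical user base: try the new feature on roughly 10% of users first.
users = [f"user{i}" for i in range(1000)]
early = [u for u in users if in_rollout(u, "new-uploader", 10)]
```

Because the bucket for a given user never changes, raising the percentage from 10 to 50 only adds users; nobody who already has the feature loses it.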

And in keep­ing with the user-created ten­ets of Web 2.0, plat­forms such as Face­book that enable third-party developers to write applic­a­tions to run on that plat­form also ful­fill the func­tion of con­tinu­ally adding fea­tures to the applic­a­tion without the own­ers need­ing to code any­thing, or pay people to add fea­tures. The users do it all for them — use the plat­form, add fea­tures to the plat­form, mar­ket their added fea­tures. The own­ers sup­ply the hard­ware and the basic infra­struc­ture (which needs to be stable and reli­able) and the users do the rest. At least, that’s the the­ory and the hope.

Which brings us to the busi­ness mod­els. How do people pay for the hard­ware, soft­ware, pro­gram­mers, mar­ket­ing? There are a num­ber of ways in which Web 2.0 com­pan­ies try to cov­er the bills for long enough to sur­vive until they can be acquired by some big­ger com­pany. One is advert­ising. Google and its com­pet­it­ors have made it easy for even small web sites, such as blog­gers in the long tail, to make some money from ads. It’s more than enough to pay the bills for some sites, since it’s now cheap or free to build and launch a site. Some sites are free when you watch the ads, but you can pay for an ad-free ver­sion. Or free for private use, but cost some­thing for com­mer­cial use. And then there’s the vari­ant where a basic account is free, but you have to pay if you want more fea­tures, such as upload­ing files, or upload­ing more than a cer­tain num­ber of pho­tos. A vari­ant for open source soft­ware is that the soft­ware is free, but you need to pay for sup­port or real help in con­fig­ur­ing it, or to get new releases more quickly.

One of a series on Web 2.0, taken from my talk at the CSW Sum­mer School in July 2007. Here’s the series intro­duc­tion. Com­ing up next: some issues with Web 2.0

Nov 09 2007

The tech­nic­al com­pon­ent of Web 2.0 includes XML, Ajax, APP, vari­ous pro­gram­ming lan­guages, plug-ins and wid­gets, and the REST archi­tec­ture. All of these have a role to play in sup­port­ing the web sites that incor­por­ate Web 2.0 fea­tures, while many pred­ate the Web 2.0 phe­nomen­on. There are far too many inter­est­ing tech­nic­al fea­tures for me to talk about all of them in one post, of course, but this post should at least intro­duce you to some of the more inter­est­ing acronyms.

[Image: Technical Cloud] Obligatory tag cloud: this one contains some technical terms

Developing Web 2.0 applications is easier than developing large enterprise-style applications. The developer toolkits are a lot easier to use, and it’s much faster to create something. 37signals, who make Basecamp, amongst other tools, say they put it up in four months with 2.5 developers using Rails, a development framework. For developers there’s now a range of language options, from PHP to C++ or Java EE, with newer platforms and languages like Ruby and Rails grabbing mindshare as well. People can program in the system they’re comfortable with, and although there’s a certain amount of snooty disparagement of each language from proponents of some other one, what matters in the end is using the right tool for the job. I’ve seen bad code written in Java and good code in PHP, and a system that does less but does it well is preferable to my mind to one that does a lot really badly.

Ajax (Wikipedia link) is another important Web 2.0 technology. It’s really a shorthand for a bunch of technologies (HTML, CSS, DOM, JavaScript) that work together in the browser to create a richer environment, combining scripting with a way to request information from the server without forcing the entire page to be reloaded. It’s powerful and interactive and can be much faster than other methods of adding interactivity to web pages. There are lots of books on the subject, which is a reasonable indicator of the interest in it.

Since Ajax combines a lot of different technologies, debugging can be a problem. Some basic rules that I’ve found useful are: first make sure your HTML/XHTML validates, then make sure your CSS validates, then use Firefox with the Firebug extension to debug the rest. Once you have that working, you can make the changes for other browsers as appropriate.

Poorly writ­ten Ajax does have some prob­lems, such as not being able to book­mark res­ults, or the back but­ton not going back to the right place. The big prob­lem is the non-standardized XML­Ht­tpRe­quest object in JavaS­cript, the object that lets your page talk to the server and get the right inform­a­tion. The way it works var­ies between dif­fer­ent browsers and dif­fer­ent ver­sions of the same browser (IE 6 to IE 7, for example). Although W3C is start­ing to work on stand­ard­iz­ing it, that will take some time. Another prob­lem is the “A” in Ajax — it’s asyn­chron­ous, which means that inter­net latency can be an issue.

These problems can be solved. There are Ajax toolkits available which hide the XMLHttpRequest and other browser incompatibilities; some applications have figured out the back button and the bookmarking URL issues; and the asynchronous issues can be dealt with by breaking the applications up into small segments which take into account the fact that the other end may never respond. As a result of these toolkits and techniques, Ajax is now a major component of many websites, even those that aren’t for Web 2.0 startups.
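The point about allowing for an other end that may never respond can be sketched in a few lines. This example is in Python rather than browser JavaScript, and it only simulates the server: `slow_server`, `fetch_segment`, and the fallback string are hypothetical names chosen for illustration, not part of any Ajax toolkit.

```python
import asyncio


async def slow_server(delay: float) -> str:
    """Stand-in for a remote call; sleeps instead of touching the network."""
    await asyncio.sleep(delay)
    return "fresh data"


async def fetch_segment(delay: float, timeout: float, fallback: str) -> str:
    """Ask for one small segment, but give up after `timeout` seconds."""
    try:
        return await asyncio.wait_for(slow_server(delay), timeout=timeout)
    except asyncio.TimeoutError:
        # The other end did not answer in time, so carry on with what we have.
        return fallback


def demo() -> tuple:
    fast = asyncio.run(fetch_segment(0.01, 0.5, "cached data"))
    slow = asyncio.run(fetch_segment(0.5, 0.05, "cached data"))
    return fast, slow


if __name__ == "__main__":
    print(demo())
```

Each segment either gets its answer or falls back on its own, so one slow request does not hold up the rest of the page.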

REST is an architectural style that explains a lot of why the web is so successful. Roy Fielding’s PhD thesis was the first place where it was codified (and he coined the term). Basically the idea is that everything that you can reach on the web should be a resource with a web address (URI) that you can reach with standard HTTP verbs, and that will have other URIs embedded in it. There’s more to REST, of course, and I’m sure the purists will take issue with my over-simplified description.
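To make that over-simplified description a little more concrete, here is a toy Python sketch: a dictionary stands in for the server, each key is a URI, each value is a resource that embeds links to other URIs, and a single `handle` function answers a few standard HTTP verbs. The `/books` URIs, the `links` field, and `handle` itself are invented for this illustration; a real service would sit behind an actual HTTP server.

```python
from typing import Optional, Tuple

# Every resource has its own URI and carries links to other resources.
resources = {
    "/books": {"links": ["/books/1", "/books/2"]},
    "/books/1": {"title": "First book", "links": ["/books"]},
    "/books/2": {"title": "Second book", "links": ["/books"]},
}


def handle(verb: str, uri: str, body: Optional[dict] = None) -> Tuple[int, Optional[dict]]:
    """Apply a standard HTTP verb to a resource; return (status code, body)."""
    if verb == "GET":
        if uri in resources:
            return 200, resources[uri]
        return 404, None
    if verb == "PUT":
        # 201 Created for a new resource, 200 OK when replacing an old one.
        status = 200 if uri in resources else 201
        resources[uri] = body
        return status, body
    if verb == "DELETE":
        if uri in resources:
            del resources[uri]
            return 204, None
        return 404, None
    return 405, None  # verb not supported in this sketch
```

A client that starts at `/books` can find every other resource by following the embedded links, without knowing the URI layout in advance.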

REST is widely used in what I call Ajax APIs: the APIs that various applications have that let people get access to the data. Mash-ups, where you take data from one service and combine it with another service, use these APIs all the time. The classic example of a mash-up was to take Craigslist rental data and mash it with Google mapping data onto a third web site (HousingMaps) without Craigslist or Google being involved to start with. There are now vast numbers of mash-ups and lots of toolkits to help you create them. One problem with mash-ups is that the people providing the data may not care to have you take it (for example, if they run ads on their sites); the Web 2.0 solution to that is that if you own the data, you need to add more value to it that can’t be mashed as easily. Amazon has book reviews on top of the basic book data, for example, so people use Amazon as a reference link.
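At its core a mash-up is a join between two data sources on something they have in common. A rough Python sketch, using invented rental listings and invented coordinates in place of real responses from any listing or mapping API:

```python
# Hypothetical data from a listings service.
listings = [
    {"id": 1, "address": "12 Main St", "rent": 1200},
    {"id": 2, "address": "34 Oak Ave", "rent": 950},
    {"id": 3, "address": "56 Unknown Rd", "rent": 800},
]

# Hypothetical data from a mapping service: address -> (latitude, longitude).
geocodes = {
    "12 Main St": (49.28, -123.12),
    "34 Oak Ave": (49.26, -123.10),
}


def mash_up(listings: list, geocodes: dict) -> list:
    """Combine the two sources into map pins, skipping unmatched addresses."""
    pins = []
    for listing in listings:
        location = geocodes.get(listing["address"])
        if location is None:
            continue  # the mapping service does not know this address
        lat, lon = location
        pins.append({"label": f"${listing['rent']}/month", "lat": lat, "lon": lon})
    return pins


pins = mash_up(listings, geocodes)
```

The third listing is dropped because neither source alone can place it on a map, which is typical of the loose ends a real mash-up has to deal with.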

The con­cept of mash-ups goes fur­ther into plat­forms that sup­port plug-ins and wid­gets. One of the appeal­ing things about Face­book is the fact that applic­a­tion developers can write wid­gets to do vari­ous things (from the trivi­al to the heavy-weight) that use the inform­a­tion that Face­book provides (this has pri­vacy implic­a­tions, but more about that in a later post). In a sense, this is about sites (usu­ally com­mer­cial sites) using the social aspect of Web 2.0 (user-created con­tent) to provide more fea­tures to their users, and is tightly tied to the pro­cess implic­a­tions of Web 2.0 (more about that in the next post).

The Atom Pub­lish­ing Pro­to­col is fairly recent. Atom is the cleaned-up ver­sion of RSS and gives you a feed of inform­a­tion, tagged with metadata such as author, pub­lished date, and title. There is now also a pro­to­col to go with it, designed for edit­ing and pub­lish­ing web resources using HTTP. It can be used as a replace­ment for the vari­ous blog-based pub­lish­ing APIs, which were used to allow people to post to their blogs from dif­fer­ent edit­ors, but it’s now obvi­ous that it can be used to carry oth­er inform­a­tion as well, and not just for blogs. Since it’s a REST-based API that uses basic HTTP, it can be used for more gen­er­al client-server HTTP-based com­mu­nic­a­tion. A good over­view is on the IBM developer site.
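For a feel of what travels over the wire, here is a small Python sketch that builds a single Atom entry with the metadata mentioned above. The namespace URI is the standard Atom one; the helper `build_entry` and the sample values are made up for illustration. With the publishing protocol, a client would typically POST a document like this to a collection URI with the content type `application/atom+xml;type=entry`, a step this sketch leaves out.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace


def build_entry(entry_id: str, title: str, author: str,
                published: str, content: str) -> str:
    """Return one Atom entry as an XML string."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")

    def add(parent, tag, text=None):
        child = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}")
        child.text = text
        return child

    add(entry, "id", entry_id)
    add(entry, "title", title)
    add(add(entry, "author"), "name", author)
    add(entry, "published", published)
    add(entry, "updated", published)  # Atom requires an updated timestamp
    add(entry, "content", content).set("type", "text")
    return ET.tostring(entry, encoding="unicode")


xml_text = build_entry(
    "urn:example:post-1",       # hypothetical identifier
    "Hello",
    "A. Blogger",
    "2007-11-09T12:00:00Z",
    "First post",
)
```

Nothing in the entry is specific to blogs, which is why the same format can carry other kinds of information.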

One of a series on Web 2.0, taken from my talk at the CSW Sum­mer School in July 2007. Here’s the series intro­duc­tion. Com­ing up next: pro­cess aspects of Web 2.0

Nov 08 2007

The social and col­lab­or­a­tion part of Web 2.0 mostly revolves around the con­cepts of social net­work­ing, user-generated con­tent, and the long tail.

[Image: Social Cloud]

Social net­work­ing is the idea that people can meet and talk and organ­ise their social lives using the Web instead of, or in addi­tion to, more tra­di­tion­al meth­ods such as talk­ing face to face, or on the phone. It’s an exten­sion of usen­et and bul­let­in boards that’s based on the web, with more fea­tures. Social net­work­ing sites tend to go through phases; every­one was into Orkut for a while, now it’s MySpace and Face­book, or Ravelry if you’re a knit­ter. Fea­tures and focus vary, but the idea of cre­at­ing an online com­munity remains the same.

User-generated con­tent is the idea that non-professionals can con­trib­ute con­tent. I don’t like the term much, so I’m going to use the vari­ant user-created con­tent to show that it’s a cre­at­ive pro­cess, not just some machine gen­er­at­ing con­tent. The con­cept of user-created con­tent isn’t new; the Web was first designed as a col­lab­or­a­tion plat­form, the read/write web. In prac­tic­al terms, how­ever, it was dif­fi­cult for those without lots of tech­nic­al know­ledge to pub­lish on the web. All these things like blog­ging and com­ment­ing that are now rel­at­ively easy for people to do weren’t, just a few years ago. Pre­vi­ously only a few people could make their opin­ions widely known, in prac­tice pro­fes­sion­als with access. Don’t for­get that one of the reas­ons Ben­jamin Frank­lin could make such a dif­fer­ence in the early years of the US was that he owned a print­ing press!

Now basically everyone with access to the internet who’s interested can publish their opinions, their photos, or their videos to their friends and the world. It’s easier to keep in touch with friends far away, or find out what life’s like in some far-off place, or contribute a snippet of knowledge to Wikipedia. Some of these publishers (bloggers, commenters, photo-uploaders) have a large audience; many have an audience that is large enough for them (which may mean just the family, or just themselves, or a few hundred strangers).

One of the down­sides of this “demo­crat­iz­a­tion”, as it’s some­times called, is that it can be hard to find the really good inform­a­tion or enter­tain­ment — you hear a lot about “cult of the ama­teur” and “90% of everything is crap”. Some of this is com­ing from those who are threatened by the avail­ab­il­ity of inform­a­tion from oth­er sources: journ­al­ists and news­pa­pers in par­tic­u­lar are right to be scared, since they’re now going to have to work harder to con­vince the world that they add value. Wheth­er the enter­tain­ment cre­ated by ama­teurs that’s avail­able on the web is bet­ter than that cre­ated by the mass enter­tain­ment industry depends on your view of how good a job the lat­ter does at find­ing and nur­tur­ing tal­ent.

The long tail is another aspect of Web 2.0 that you hear about a lot. Booksellers are a good example of how the long tail works: whereas your average bookseller, even Waterstones or Blackwell’s, has maybe a few thousand or a few tens of thousands of books, an internet seller can have millions. The comparison is perhaps not fair, though, since an internet bookseller, just like your local bookseller, can order from the publisher and will usually count that as being part of the inventory for bragging reasons. And, of course, you can always go to Powell’s Books in Portland, which claims to have over a million books physically in their store. It’s big; they hand out maps at the entrance so you don’t get lost.

The long-tail aspect is this: it turns out that most of the revenue doesn’t come from selling the Harry Potter books, big sellers though those are; it comes from selling those books that aren’t individually big sellers. The total volume of sales in those niche areas is larger than that of the best-sellers. Other companies that make good use of this are, of course, eBay, where you can buy things that you can’t get downtown, uptown, or potentially anywhere in your town, and the video rental company Netflix, which rents out some 35,000 titles in the one million videos it sends out each day.
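The arithmetic behind this is easy to play with. The Python sketch below assumes, purely for illustration, that sales fall off in proportion to 1/rank (a Zipf-like curve) and that the best-seller sells 1,000 copies; these are not real sales figures, and the cut-off of 100 titles for the "head" is arbitrary.

```python
def zipf_sales(titles: int, top_seller_sales: float = 1000.0) -> list:
    """Illustrative sales per title: the title at rank r sells top/r copies."""
    return [top_seller_sales / rank for rank in range(1, titles + 1)]


def head_and_tail(sales: list, head_size: int) -> tuple:
    """Total sales of the best-sellers versus everything else."""
    return sum(sales[:head_size]), sum(sales[head_size:])


# A shop with a few thousand titles versus an online seller with a million.
small_head, small_tail = head_and_tail(zipf_sales(5_000), head_size=100)
big_head, big_tail = head_and_tail(zipf_sales(1_000_000), head_size=100)

print(f"5,000 titles:     head {small_head:.0f}, tail {small_tail:.0f}")
print(f"1,000,000 titles: head {big_head:.0f}, tail {big_tail:.0f}")
```

Under these assumptions the top 100 titles outsell the rest of a 5,000-title catalogue, but with a million titles the tail adds up to more than the head, even though each title in it sells very little.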

And, of course, the long tail applies to blogs and oth­er online sites. In oth­er words, no mat­ter how spe­cial­ised your blog is, someone out there in blog-reading land is likely to find it inter­est­ing. The big prob­lem is how those poten­tial read­ers find out about it.

One of a series on Web 2.0, taken from my talk at the CSW Sum­mer School in July 2007. Here’s the series intro­duc­tion. Com­ing up next: tech­nic­al aspects of Web 2.0