Presentations - Anyway

Langara

Langara is a local college offering degrees in a number of subjects, including Computer Studies. I know one of the instructors there, and he asked me to give a talk at their monthly Computer Tech meetup. As a topic, I picked Simple Principles for Website Security, a shorter version of talks I’ve given at the XML Summer School.

Apart from the fact that I was recovering from a bout with the virulent stomach bug that seemed to be going round Vancouver at the time, it was fun. A good bunch of people, decent questions, and the student newspaper took advantage of the opportunity to write a column and make a video about basic internet security. One of my aims in this talk is to make the audience paranoid, pointing out that sometimes the bad guys really are out to get you, and talking a bit about risk analysis and the trade-offs involved in writing down strong passwords (using a password manager is better, of course). And the door prizes for Langara students were quite impressive!

Thanks to Raymond for inviting me, and to Gail and Holly for organising everything. I put the slides up on SlideShare if you’re interested.

Teaching HTML5

For the XML Summer School this year, I’m teaching about HTML5, CSS3 and ePub in the Hands-on Web Publishing course. The basic premise of the course is to show what technologies are involved in taking a bunch of Word documents or XML files and turning them into a decent-looking website or ePub. The course includes lessons on relevant bits of XSLT transformation (since Word is XML under the covers, if you dig deeply enough), scripting in Ruby to automate as much as possible, and, of course, enough information about HTML and CSS that people can make a decent-looking website in class in the hands-on part.
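As a rough illustration of the "Word is XML under the covers" point, here is a minimal sketch of pulling paragraphs out of WordprocessingML and emitting HTML. It uses Python rather than the XSLT and Ruby the course itself teaches, and the sample document and the function name `word_xml_to_html` are made up for the example. A real .docx file is a zip archive with far more structure than this (styles, lists, tables), so treat it as a sketch only.

```python
import xml.etree.ElementTree as ET
from html import escape

# Namespace used by WordprocessingML, the XML inside a .docx file.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hypothetical, heavily simplified stand-in for word/document.xml.
SAMPLE = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>
    <w:p><w:r><w:t>Fish &amp; chips</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def word_xml_to_html(xml_text):
    """Turn each non-empty w:p paragraph into an HTML <p> element."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for p in root.iterfind(".//w:p", NS):
        # A paragraph's text is split across runs (w:r) holding w:t nodes.
        text = "".join(t.text or "" for t in p.iterfind(".//w:t", NS))
        if text:
            paragraphs.append(f"<p>{escape(text)}</p>")
    return "\n".join(paragraphs)


print(word_xml_to_html(SAMPLE))
```

The interesting design question in a real pipeline is how much of Word's styling to map onto HTML classes and how much to throw away.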

As a starting point for the exercises, we’ll use a generated template from HTML5 Boilerplate, since, if you pick the right options, it is relatively clean and simple to understand. Looking at the current common design practices used across a number of options (HTML5 Boilerplate, Bootstrap, and WordPress templates, for example), coupled with web components and the sheer size and number of HTML5-related specifications from WHATWG and the W3C, I’m wondering just how much more complicated it can all get before the pendulum starts swinging back again towards simplicity and separation of content from processing. Even a bare-bones template has a number of lines in it to deal with older versions of IE, or to load some JavaScript or (mostly) jQuery library. It’s no wonder we’re starting to see so many frameworks that try to cover up all of that complexity (Bootstrap again, or Ember, for example).

In the meantime, at least I have a reasonably constrained use case to help me decide which of the myriad possibilities are worth spending time teaching, and which are best left for the delegates to read up on after the class.

Web 2.0: Issues

There are some issues with Web 2.0, mostly in the areas of privacy, security, and copyright: all those things you’d rather you didn’t need to worry about. Take privacy, for example. On many social networking sites people sign up and then put in all their personal information simply because there’s a field there for it. Often those profiles are public by default, rather than private, and often they’re open to search engines as well. So people think their information is private and then discover it isn’t, and have to go searching through menus to find out how to turn on the privacy filters that are turned off by default. In many cases what’s good for the site owners isn’t necessarily good for the users. One big factor in Flickr’s early success was the fact that uploaded photos could be seen by the world unless specifically made private, and lots of users did (and still do) get confused by copyright issues (Creative Commons licenses don’t solve the issue of what “public domain” and similar terms actually mean).

Then there’s the persona issue. I might have a legal but slightly embarrassing hobby that I don’t want work knowing about. So I need to set up a separate online identity for that — people need to think about the implications of this in advance if they don’t want correlations of that hobby persona with their “real” one on the basis of an address or phone number or email.

Another problem with the plethora of new Web 2.0 social networking sites is that they often don’t understand what privacy and user consent mean. You sign up for something, they ask you to upload your address book to see whether other friends are already there, and the next thing you know they’ve done spam-a-friend and emailed everyone in your address book without your knowledge, let alone your consent. Or they ask you to give them your username and password to some other social networking site under the “trust us, we will do no evil” motto (whatever happened to “trust but verify”?).

There are some solutions to this: users have to be careful about the information they hand out (fake birthdates, anyone?) and start demanding that sites take care of their information. If I want to hand out information to the world, that’s my decision, but it shouldn’t be up to some web site to make that decision for me.

The last of a series on Web 2.0, taken from my talk at the CSW Summer School in July 2007. Here’s the series introduction.

Web 2.0: Process

The third aspect of Web 2.0, which is often under-appreciated, is the process aspect. This has changed people’s expectations of what software can do, and how it should be delivered. This category includes open source, continual beta and quick release cycles, and some new business models.

[Image: tag cloud of process terms]

Not all of the things that are important in Web 2.0 are new, of course. Open Source software has been around for a long time, but I would argue that it has never been as popular as now, where more people have the ability to contribute their time and talent to projects for which they’re not directly paid (unless they’re lucky enough to work for a company that supports such projects).

The concepts of continual beta and quick release cycles are new, though. It wasn’t that long ago that you could only buy consumer-level software in boxes with pretty pictures and printed manuals, either in stores or by calling companies. For expensive software that needed consulting services to install and configure, sales reps would visit if you worked for a large enough company. To take part in a beta program you needed to know someone who worked in the company and sign an NDA, and it was a small, tightly-controlled circle.

These days the Web 2.0 browser-based applications don’t need hand-holding to install and configure, so the server load is the big constraint on how many people can take part at once. There are several flavours of beta programs: invite some “thought leaders” and ask them to invite their friends in the hope they’ll blog a lot about it (Gmail did this: you got 6 invites, then 50, then you could invite 250 of your closest friends to take part, most of whom already had Gmail accounts); unlimited invites starting with a small circle; sign up on a waiting list; allow in anyone from certain companies (Dopplr does this, with the twist that the members can then invite anyone they like).

The “continual beta” bit comes from the fact that these applications are updated quickly; these updates are often tried out on some of the users before being rolled out to all. Flickr apparently had hundreds of incremental releases in 18 months from February 2004 to October 2005 (stated in O’Reilly’s Web 2.0 Principles and Best Practices; I couldn’t find an online reference other than that report). The line between a beta and a non-beta application seems to be a fine one; in many cases the only distinction the user can see is the word “beta” on the web site. Continual releases give users a reason to come back often, and new features can be tested and fixed quickly. Of course, this sort of system doesn’t really work for fundamental software such as operating systems, databases, browsers, identity providers, and directory services, where you want full-on security and regression testing, but it does work for the Web 2.0 applications that run on those bits of fundamental software.
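Trying an update out on some of the users before rolling it out to all is often done with a percentage-based switch. The following is a minimal sketch of one common approach (hashing the user ID into a stable bucket); the function name `in_rollout`, the feature name and the user IDs are all hypothetical, and this is not a description of how Flickr or any other site actually does it.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Return True if this user falls inside the first `percent` buckets.

    Hashing feature and user together gives each user a stable bucket
    from 0 to 99, so the same people stay in the test group as the
    percentage is raised from, say, 5 to 50 to 100.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


# Hypothetical usage: count how many of 1,000 users see the new feature.
users = [f"user{n}" for n in range(1000)]
early = [u for u in users if in_rollout(u, "new-uploader", 10)]
print(len(early), "of", len(users), "users get the new uploader")
```

Because the bucket depends only on the feature and the user, nobody flips in and out of the new version between page loads.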

And in keeping with the user-created tenets of Web 2.0, platforms such as Facebook that enable third-party developers to write applications to run on that platform also fulfill the function of continually adding features to the application without the owners needing to code anything, or pay people to add features. The users do it all for them — use the platform, add features to the platform, market their added features. The owners supply the hardware and the basic infrastructure (which needs to be stable and reliable) and the users do the rest. At least, that’s the theory and the hope.

Which brings us to the business models. How do people pay for the hardware, software, programmers, marketing? There are a number of ways in which Web 2.0 companies try to cover the bills for long enough to survive until they can be acquired by some bigger company. One is advertising. Google and its competitors have made it easy for even small web sites, such as blogs in the long tail, to make some money from ads. It’s more than enough to pay the bills for some sites, since it’s now cheap or free to build and launch a site. Some sites are free when you watch the ads, but you can pay for an ad-free version. Others are free for private use but cost something for commercial use. And then there’s the variant where a basic account is free, but you have to pay if you want more features, such as uploading files, or uploading more than a certain number of photos. A variant for open source software is that the software is free, but you need to pay for support or real help in configuring it, or to get new releases more quickly.

One of a series on Web 2.0, taken from my talk at the CSW Summer School in July 2007. Here’s the series introduction. Coming up next: some issues with Web 2.0

Web 2.0: Technical

The technical component of Web 2.0 includes XML, Ajax, APP (the Atom Publishing Protocol), various programming languages, plug-ins and widgets, and the REST architecture. All of these have a role to play in supporting the web sites that incorporate Web 2.0 features, although many predate the Web 2.0 phenomenon. There are far too many interesting technical features for me to talk about all of them in one post, of course, but this post should at least introduce you to some of the more interesting acronyms.

[Image: obligatory tag cloud, this one containing some technical terms]

Developing Web 2.0 applications is easier than developing large enterprise-style applications. The developer toolkits are a lot easier to use, and it’s much faster to create something. 37signals, who make Basecamp amongst other tools, say they put it up in four months with 2.5 developers using Rails, a development framework. For developers there’s now a range of language options, from PHP to C++ or Java EE, with newer platforms and languages like Ruby and Rails grabbing mindshare as well. People can program in the system they’re comfortable with, and although there’s a certain amount of snooty disparagement of each language from proponents of some other one, what matters in the end is using the right tool for the job. I’ve seen bad code written in Java and good code in PHP, and a system that does less but does it well is preferable to my mind to one that does a lot really badly.

Ajax is another important Web 2.0 technology. It’s really a shorthand to describe a bunch of technologies (HTML, CSS, DOM, JavaScript) that are tied together, using the browser to create a richer environment by tying in scripting and a way to request information from the server without forcing the entire page to be reloaded. It’s powerful and interactive and can be much faster than other methods of adding interactivity to web pages. There are lots of books on the subject, which is a reasonable indicator of the interest in it.

Since it combines a lot of different technologies, debugging can be a problem. Some basic rules that I’ve found useful are: first make sure your HTML/XHTML validates, then make sure your CSS validates, then use Firefox with the Firebug extension to debug the rest. Once you have that working, you can make the changes for other browsers as appropriate.

Poorly written Ajax does have some problems, such as not being able to bookmark results, or the back button not going back to the right place. The big problem is the non-standardized XMLHttpRequest object in JavaScript, the object that lets your page talk to the server and get the right information. The way it works varies between different browsers and different versions of the same browser (IE 6 to IE 7, for example). Although the W3C is starting to work on standardizing it, that will take some time. Another problem is the “A” in Ajax: it’s asynchronous, which means that internet latency can be an issue.

These problems can be solved. There are Ajax toolkits available which hide the XMLHttpRequest and other browser incompatibilities; some applications have figured out the back button and the bookmarking URL issues; and the asynchronous issues can be dealt with by breaking the applications up into small segments which take into account the fact that the other end may never respond. As a result of these toolkits and techniques, Ajax is now a major component of many websites, even those that aren’t for Web 2.0 startups.
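The "other end may never respond" point comes down to putting a time limit on each small request and having something sensible to show if the limit is hit. Here is a minimal sketch of that idea, using Python’s asyncio as a stand-in for browser JavaScript; `slow_server`, `fetch_segment` and the fallback text are invented for the example and the "server" is just a timed delay.

```python
import asyncio


async def slow_server(delay):
    """Pretend server: answers after `delay` seconds."""
    await asyncio.sleep(delay)
    return "fresh data"


async def fetch_segment(delay, timeout, fallback="cached data"):
    """Request one small segment, giving up after `timeout` seconds."""
    try:
        return await asyncio.wait_for(slow_server(delay), timeout)
    except asyncio.TimeoutError:
        # The other end did not answer in time; carry on with what we have.
        return fallback


def demo():
    quick = asyncio.run(fetch_segment(delay=0.01, timeout=1.0))
    stuck = asyncio.run(fetch_segment(delay=1.0, timeout=0.05))
    return quick, stuck


print(demo())
```

The same shape applies in the browser: the page keeps working with whatever it already has, and only the one segment that timed out is stale.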

REST is an architectural style that explains a lot of why the web is so successful. Roy Fielding’s PhD thesis was the first place where it was codified (and he coined the term). Basically the idea is that everything that you can reach on the web should be a resource with a web address (URI) that you can work with using the standard HTTP verbs, and that will have other URIs embedded in it. There’s more to REST, of course, and I’m sure the purists will take issue with my over-simplified description.
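To make that over-simplified description a little more concrete, here is a toy sketch in Python: an in-memory "server" where every resource lives at a URI, the only operations are HTTP-style verbs, and one resource points to another by URI. The class `ResourceStore`, the URIs and the data are all hypothetical, and the status codes are a simplification of what a real HTTP server would do.

```python
class ResourceStore:
    """Toy in-memory 'server': every resource lives at a URI."""

    def __init__(self):
        self.resources = {}

    def handle(self, verb, uri, body=None):
        """Apply one HTTP-style verb to one URI; return (status, body)."""
        if verb == "GET":
            if uri in self.resources:
                return 200, self.resources[uri]
            return 404, None
        if verb == "PUT":
            # 201 when the resource is created, 200 when it is replaced.
            status = 200 if uri in self.resources else 201
            self.resources[uri] = body
            return status, body
        if verb == "DELETE":
            if uri in self.resources:
                del self.resources[uri]
                return 204, None
            return 404, None
        return 405, None  # verb not supported by this toy server


store = ResourceStore()
store.handle("PUT", "/authors/7", {"name": "A. Writer"})
# The book's representation embeds the URI of another resource.
store.handle("PUT", "/books/1", {"title": "Example", "author": "/authors/7"})

status, book = store.handle("GET", "/books/1")
_, author = store.handle("GET", book["author"])  # follow the embedded link
print(status, book["title"], "by", author["name"])
```

The client never needs to know how authors are stored; it just follows the URI it was given.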

REST is widely used in what I call Ajax APIs, the APIs that various applications have that let people get access to the data. Mash-ups, where you take data from one service and combine it with another service, use these APIs all the time. The classic example of a mash-up was to take Craigslist rental data and mash it with Google mapping data onto a third web site (HousingMaps) without Craigslist or Google being involved to start with. There are now vast numbers of mash-ups and lots of toolkits to help you create them. One problem with mash-ups is that the people providing the data may not care to have you take it (for example, if they run ads on their sites); the Web 2.0 solution to that is that if you own the data, you need to add more value to it that can’t be mashed as easily. Amazon has book reviews on top of the basic book data, for example, so people use Amazon as a reference link.
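Stripped of the network calls, a mash-up is a join between two data sets that were never designed to go together. A minimal sketch, with made-up rental listings standing in for one service and a made-up address-to-coordinates table standing in for a mapping service (none of this reflects the real Craigslist or Google APIs):

```python
# Hypothetical data, as if already fetched from two separate services.
listings = [
    {"address": "123 Main St", "rent": 1200},
    {"address": "45 Oak Ave", "rent": 950},
    {"address": "9 Nowhere Rd", "rent": 700},
]
geocodes = {
    "123 Main St": (49.26, -123.10),
    "45 Oak Ave": (49.28, -123.12),
}


def mash_up(listings, geocodes):
    """Attach coordinates to each listing that the map service knows about."""
    merged = []
    for listing in listings:
        coords = geocodes.get(listing["address"])
        if coords is None:
            continue  # no location found, so nothing to put on the map
        lat, lon = coords
        merged.append({**listing, "lat": lat, "lon": lon})
    return merged


for pin in mash_up(listings, geocodes):
    print(pin)
```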

The concept of mash-ups goes further into platforms that support plug-ins and widgets. One of the appealing things about Facebook is the fact that application developers can write widgets to do various things (from the trivial to the heavy-weight) that use the information that Facebook provides (this has privacy implications, but more about that in a later post). In a sense, this is about sites (usually commercial sites) using the social aspect of Web 2.0 (user-created content) to provide more features to their users, and is tightly tied to the process implications of Web 2.0 (more about that in the next post).

The Atom Publishing Protocol is fairly recent. Atom is the cleaned-up version of RSS and gives you a feed of information, tagged with metadata such as author, published date, and title. There is now also a protocol to go with it, designed for editing and publishing web resources using HTTP. It can be used as a replacement for the various blog-based publishing APIs, which were used to allow people to post to their blogs from different editors, but it’s now obvious that it can be used to carry other information as well, and not just for blogs. Since it’s a REST-based API that uses basic HTTP, it can be used for more general client-server HTTP-based communication. A good overview is on the IBM developer site.
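For a feel of what that metadata looks like, here is a small sketch that reads an Atom feed with Python’s standard library. The feed itself is invented; the element names and the namespace are from the Atom format, and the `rel="edit"` link is the address a publishing client would use to update or delete the entry over HTTP. The function name `read_entries` is just for this example.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed with a single entry.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>First post</title>
    <author><name>A. Writer</name></author>
    <published>2007-07-25T10:00:00Z</published>
    <link rel="edit" href="http://example.org/entries/1"/>
  </entry>
</feed>"""


def read_entries(feed_xml):
    """Return title, author, date and edit URI for each entry in the feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        edit = entry.find("atom:link[@rel='edit']", ATOM_NS)
        entries.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "author": entry.findtext("atom:author/atom:name", default="", namespaces=ATOM_NS),
            "published": entry.findtext("atom:published", default="", namespaces=ATOM_NS),
            "edit_uri": edit.get("href") if edit is not None else None,
        })
    return entries


for item in read_entries(SAMPLE_FEED):
    print(item)
```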

One of a series on Web 2.0, taken from my talk at the CSW Summer School in July 2007. Here’s the series introduction. Coming up next: process aspects of Web 2.0

Web 2.0: Social and Collaboration

The social and collaboration part of Web 2.0 mostly revolves around the concepts of social networking, user-generated content, and the long tail.

[Image: tag cloud of social terms]

Social networking is the idea that people can meet and talk and organise their social lives using the Web instead of, or in addition to, more traditional methods such as talking face to face, or on the phone. It’s an extension of Usenet and bulletin boards that’s based on the web, with more features. Social networking sites tend to go through phases; everyone was into Orkut for a while, now it’s MySpace and Facebook, or Ravelry if you’re a knitter. Features and focus vary, but the idea of creating an online community remains the same.

User-generated content is the idea that non-professionals can contribute content. I don’t like the term much, so I’m going to use the variant user-created content to show that it’s a creative process, not just some machine generating content. The concept of user-created content isn’t new; the Web was first designed as a collaboration platform, the read/write web. In practical terms, however, it was difficult for those without lots of technical knowledge to publish on the web. All these things like blogging and commenting that are now relatively easy for people to do weren’t, just a few years ago. Previously only a few people could make their opinions widely known, in practice professionals with access. Don’t forget that one of the reasons Benjamin Franklin could make such a difference in the early years of the US was that he owned a printing press!

Now basically everyone with access to the internet who’s interested can publish their opinions, their photos, or their videos to their friends and the world. It’s easier to keep in touch with friends far away, or find out what life’s like in some far-off place, or contribute a snippet of knowledge to Wikipedia. Some of these publishers (bloggers, commenters, photo-uploaders) have a large audience; many have an audience that is large enough for them (which may mean just the family, or just themselves, or a few hundred strangers).

One of the downsides of this “democratization”, as it’s sometimes called, is that it can be hard to find the really good information or entertainment — you hear a lot about “cult of the amateur” and “90% of everything is crap”. Some of this is coming from those who are threatened by the availability of information from other sources: journalists and newspapers in particular are right to be scared, since they’re now going to have to work harder to convince the world that they add value. Whether the entertainment created by amateurs that’s available on the web is better than that created by the mass entertainment industry depends on your view of how good a job the latter does at finding and nurturing talent.

The long tail is another aspect of Web 2.0 that you hear about a lot. Booksellers are a good example of how the long tail works: whereas your average bookseller, even Waterstones or Blackwell’s, has maybe a few thousand or a few tens of thousands of books, an internet seller can have millions. The comparison is perhaps not fair, though, since an internet bookseller, just like your local bookseller, can order from the publisher and will usually count that as being part of the inventory for bragging reasons. And, of course, you can always go to Powell’s Books in Portland, which claims to have over a million books physically in their store. It’s big; they hand out maps at the entrance so you don’t get lost.

The long-tail aspect is this: it turns out that most of the revenue doesn’t come from selling the Harry Potter books, big sellers though those are; it comes from selling those books that aren’t individually big sellers. The total volume of sales in those niche areas is larger than that of the best-sellers. Other companies that make good use of this, of course, are eBay, where you can buy things that you can’t get downtown, uptown, or potentially anywhere in your town, and the video rental company Netflix, which rents out some 35,000 different titles among the one million videos it sends out each day.
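The arithmetic behind that claim can be sketched with a toy model. Assume, purely for illustration, that sales of the title at rank r are proportional to 1/r (a Zipf-style curve; this is an assumption, not data from any bookseller), and call the top 100 titles the best-sellers. The function name `tail_share` and the catalogue sizes are made up for the example.

```python
def tail_share(catalogue_size, head_size):
    """Fraction of total sales from titles ranked below the top `head_size`.

    Assumes sales of the title at rank r are proportional to 1/r,
    which is an illustrative model rather than real sales data.
    """
    weights = [1.0 / rank for rank in range(1, catalogue_size + 1)]
    head = sum(weights[:head_size])
    return 1.0 - head / sum(weights)


# A shop with a small catalogue versus a seller listing a million titles.
for size in (1_000, 1_000_000):
    share = tail_share(size, head_size=100)
    print(f"{size:>9,} titles: {share:.0%} of sales come from outside the top 100")
```

Under these assumptions the best-sellers dominate when the catalogue is small, and the niche titles together outweigh them once the catalogue runs to a million. Change the curve or the cut-off and the balance shifts, which is why the real figures matter.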

And, of course, the long tail applies to blogs and other online sites. In other words, no matter how specialised your blog is, someone out there in blog-reading land is likely to find it interesting. The big problem is how those potential readers find out about it.

One of a series on Web 2.0, taken from my talk at the CSW Summer School in July 2007. Here’s the series introduction. Coming up next: technical aspects of Web 2.0