Jun 03 2009
 

Many people outside the knitting world probably don't think about the fact that knitters have conferences too, where they register for classes taught by famous people at some venue. Recently a famous knitter (Stephanie Pearl-McPhee, aka Yarnharlot) organised such an event. I think she got some bad advice from her IT people, whoever they were, about what it would take to run the online registration system.

To be fair, the IT people thought the organisers were being optimistic about how many people would show up. I'm going to summarise the salient numbers; if you want more details, read the blog post. With 12,000 on the mailing list, they figured 5,000 people was the number to expect, competing for about 4,000 spots. The organisers "built a huge server and a pretty good system" for those expected 5,000 people. In the event, they had over 30,000 simultaneous connections, and the server couldn't handle it.

It seems to me that these requirements are precisely what cloud computing should be able to handle. For this particular event, it was possible that only 1,000 people would try to register at once, or that many more would. The load could have been spread over a couple of months if the conference seats sold slowly, or over an hour if they sold fast. In this case, buying a server big enough for the expected maximum left them with a server and system that were too small; it could just as easily have gone the other way, with money wasted on something far more powerful than was needed.
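
To put rough numbers on the sizing problem, here is a back-of-the-envelope sketch. The per-instance capacity is a made-up figure (the hypothetical `CONNECTIONS_PER_INSTANCE`), not anything the organisers published; the point is only the ratio between the planned load and the load that actually arrived.

```python
import math

# Hypothetical figure: connections one web/app instance can serve comfortably.
# Not from the post; substitute a number from your own load testing.
CONNECTIONS_PER_INSTANCE = 2500


def instances_needed(concurrent_connections: int,
                     per_instance: int = CONNECTIONS_PER_INSTANCE) -> int:
    """Smallest number of instances that covers the given concurrent load."""
    return math.ceil(concurrent_connections / per_instance)


if __name__ == "__main__":
    expected = 5_000    # the organisers' planning figure
    observed = 30_000   # what actually turned up
    print(instances_needed(expected))  # 2
    print(instances_needed(observed))  # 12
```

With a fixed server you either pay for the larger number all year or get caught out by it; with elastic capacity you would, in principle, run the small number most of the time and the large number only during the registration rush.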

What I'd like to know is how, in general terms, such a system should be architected. If you were using this as a case study on how to do cloud computing, what would you propose? Some more requirements: people can register for more than one class; class sizes are limited, and the size depends on the class; and the system has to include online payment.

I'm not looking for lots of details, just a broad-brush outline of a paragraph or two, like "put X on one virtual server that can scale up, and Y on another". My personal experience of "the cloud" so far has been for storage rather than these sorts of systems, and this use case has intrigued me.
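
One way to meet the seat-limit requirements above is to let the database enforce them: decrement a class's remaining seats with a conditional `UPDATE` and treat "no row changed" as "class full". The sketch below uses SQLite from Python's standard library purely as a stand-in for whatever database the real system would use. The table layout and the `make_db` and `register` functions are hypothetical, and payment is deliberately left out (a real system would probably hold the seats while payment is processed and release them if it fails).

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # isolation_level=None: no implicit transactions; we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript("""
        CREATE TABLE classes (
            id         INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            seats_left INTEGER NOT NULL CHECK (seats_left >= 0)
        );
        CREATE TABLE registrations (
            person   TEXT NOT NULL,
            class_id INTEGER NOT NULL REFERENCES classes(id),
            PRIMARY KEY (person, class_id)   -- one seat per person per class
        );
    """)
    return conn


def register(conn: sqlite3.Connection, person: str, class_ids: list) -> bool:
    """Reserve one seat in every requested class, or in none of them."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    ok = False
    try:
        for cid in class_ids:
            # Only succeeds if the class exists and still has a free seat.
            cur = conn.execute(
                "UPDATE classes SET seats_left = seats_left - 1 "
                "WHERE id = ? AND seats_left > 0",
                (cid,),
            )
            if cur.rowcount != 1:
                return False  # class full or unknown: roll back everything
            conn.execute("INSERT INTO registrations VALUES (?, ?)", (person, cid))
        ok = True
        return True
    except sqlite3.IntegrityError:
        return False  # e.g. the same person booking the same class twice
    finally:
        conn.execute("COMMIT" if ok else "ROLLBACK")


if __name__ == "__main__":
    conn = make_db()
    conn.executemany("INSERT INTO classes VALUES (?, ?, ?)",
                     [(1, "Sock heels", 2), (2, "Lace charts", 1)])
    print(register(conn, "alice", [1, 2]))  # True
    print(register(conn, "bob", [1, 2]))    # False: class 2 is full, so bob gets neither
    print(register(conn, "carol", [1]))     # True
```

Because each person's whole set of classes is reserved inside one transaction, someone asking for three classes never ends up with only two of them. The `CHECK (seats_left >= 0)` constraint is a second line of defence in case some other code path forgets the `seats_left > 0` condition.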

  3 Responses to “Cloudy Ideas”

  1. We have a similar problem year on year with the Melbourne Cup, because we have a very poor idea of how many people will visit the website on race day itself, not least because the field is finalised only a couple of days beforehand, and if there are international riders or horses we get a lot of interest from those nations and not just Australia.

    Last year we created a scalable cloud solution. Initially we were going to use the cloud purely for storage, as this is the classic model: it takes just enough pressure off the web servers that they serve only web pages and not heavy assets. In the end, though, we estimated how many front-line servers we needed and kept more in reserve, built from a standard image, that could be started and dropped into the array very quickly. It worked really well, and there was no performance degradation on the array for the entire period.

    Having done this, my approach is now very much the following:

    Stick all your "heavy" assets on a storage cloud that scales itself, e.g. S3, and use CloudFront if you need to get them closer to your end users.

    Make as few connections into your database as you can, as this is an obvious bottleneck. Create a cluster if you need to. Generally reads are going to be orders of magnitude more frequent than writes, so you can scale out here.

    If your pages are mostly static (i.e. managed through a CMS but not real-time data), publish them as flat files and put them on commodity hardware using something like EC2. You can simply throw more servers at the load here, so this tier is horizontally scalable.

    All that's left then is your applications. These need to be optimised, using techniques like data caching to improve performance. In the example above, you only need to update the session availability data when someone commits to a purchase, not when they view the available seats. Rebuilding the availability cache only when it has changed means fewer DB requests and better application performance. Your app servers then become horizontally scalable as well, because they are much more selective about when they hit your DB.

    Taking a model like this keeps people from harming the most fragile part of your service: the transaction. If you can stop other parts of your service causing collateral damage to this one area, chances are you'll have a successful outcome. A booking gateway should easily be able to book hundreds of people at once, but only if it isn't being affected by the thousands of other people on the site at the same time…

  2. Rather than S3, my storage preference would be OpenSolaris on EC2. You can set up a ZFS boot mirror (with Solaris, Amazon gives you your own box with two HDs), and as many ZFS pools as needed; adding or removing EBS units to/from ZFS pools is a cinch.

    I have my own little OpenSolaris distro based on b107, but you can't run your own kernel on most cloud offerings. I'm trying RedPlaid, which is VMware-based and allows any kernel that runs on VMware. I have an httpd zone and a mysql zone on the first instance. To scale, the mysql zone would be moved to its own instance, so the apps can then scale independently.

    If the system is being built from scratch, it would be better yet to use Google Base instead of MySQL. That's a service, not a cloud… So I guess my answer is "put A on cloud X, put B on cloud Y, and put C on service Z". Amazing, all the possibilities these days!

    Having just gotten started with "cloud", I'd like to point out another gotcha besides the fixed-kernel issue. Scaling is mostly something you do manually via the control panel, or through an API with your own algorithm. EC2 has just added "autoscaling", which finally automates the process. Hopefully EBS units will be able to autoscale as well in the future. But I'd rather see a proper REST API standardized, so I can write a curl script to handle autoscaling to my specs and have it work cross-cloud.

    The point is, even if you're on the cloud, gross underestimation of traffic will lead to the same problems the knitting conference had. It's just easier to react to that sort of disaster. I think that's the most perniciously overhyped aspect of cloud computing. Scaling up and down is easier, but the automation needs to be configured properly, if it's even offered by the cloud provider (or possible with their API).

  3. I'd think that for a registration system, at least during the initial burst of load, the amount of database writing is going to be much greater than typical. So I'd use the biggest database I could afford. (The database folks have years of experience building scalable products.)

    That brings out a concern I have about clouds: where do you keep the data? For many organizations, putting it in the cloud is fine. For many, many more, it's not. I'm sure that Expedia, etc., would like to use "pay as you need it" storage from some third party, but I'm also sure they want to keep their data to themselves. Same for Priceline. And that's not even getting into regulated industries like finance, health, some manufacturing (airplanes), and so on.

    Where the cloud can really help is by allowing you to bring up additional front-ends. I don't know of any cloud services that offer this right now, but there are certainly commercial products, often integrated with a company's J2EE server (e.g., IBM's WebSphere Virtual Enterprise), that do this kind of thing. Generalizing that and hooking it into a load balancer/gateway seems like a good thing for a cloud provider to offer.
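
The first commenter's advice to rebuild availability data only when someone commits to a purchase, rather than on every page view, amounts to an invalidate-on-write cache. A minimal in-process sketch follows; `AvailabilityCache` and `load_from_db` are hypothetical names, and with several app servers you would need a shared cache or some way to broadcast the invalidation, which this sketch does not attempt.

```python
import threading
from typing import Callable, Dict


class AvailabilityCache:
    """Serve seat availability from memory; hit the database only after a booking.

    `load_from_db` is a hypothetical callable standing in for the real query
    (for example SELECT id, seats_left FROM classes).
    """

    def __init__(self, load_from_db: Callable[[], Dict[int, int]]):
        self._load = load_from_db
        self._lock = threading.Lock()
        self._snapshot = None  # class_id -> seats_left, or None when stale
        self.db_reads = 0      # how many times we actually queried the database

    def get(self) -> Dict[int, int]:
        with self._lock:
            if self._snapshot is None:
                # Only one thread rebuilds; the others wait and reuse the result.
                self._snapshot = self._load()
                self.db_reads += 1
            return dict(self._snapshot)  # copy so callers can't alter the cache

    def invalidate(self) -> None:
        """Call after a purchase commits; the next reader rebuilds the snapshot."""
        with self._lock:
            self._snapshot = None


if __name__ == "__main__":
    seats = {1: 20, 2: 15}                       # stands in for the database
    cache = AvailabilityCache(lambda: dict(seats))
    for _ in range(1000):                        # lots of people browsing
        cache.get()
    print(cache.db_reads)                        # 1
    seats[1] -= 1                                # someone books class 1...
    cache.invalidate()                           # ...so the cache is marked stale
    print(cache.get()[1], cache.db_reads)        # 19 2
```

Holding the lock while reloading means a burst of readers arriving just after a booking triggers one database query rather than hundreds, at the cost of those readers waiting briefly.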

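The second commenter mentions driving scaling through a provider's API "with your own algorithm". The decision half of such an algorithm can be quite small; the sketch below computes how many front-end instances to ask for. Every number in the hypothetical `ScalingPolicy` is an illustrative assumption, and the call that actually starts or stops instances is left out because it differs from provider to provider.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Illustrative values only; tune them from your own load tests.
    target_per_instance: int = 2000  # connections each instance should carry
    min_instances: int = 2
    max_instances: int = 40          # a budget ceiling
    scale_in_step: int = 1           # remove at most this many per decision


def desired_instances(current: int, connections: int, policy: ScalingPolicy) -> int:
    """How many instances to request, given the current count and load."""
    wanted = math.ceil(connections / policy.target_per_instance)
    if wanted < current:
        # Shrink gradually so a short lull doesn't strip capacity mid-rush.
        wanted = max(wanted, current - policy.scale_in_step)
    # Grow straight to the target, but stay within the configured bounds.
    return max(policy.min_instances, min(policy.max_instances, wanted))


if __name__ == "__main__":
    p = ScalingPolicy()
    print(desired_instances(2, 30_000, p))   # 15: registration opens
    print(desired_instances(15, 1_000, p))   # 14: load drops, shrink by one
    print(desired_instances(3, 0, p))        # 2: never below the minimum
```

Growing immediately but shrinking one instance at a time reflects the asymmetry in the costs: being briefly over-provisioned wastes a little money, while being under-provisioned mid-rush is exactly the failure the knitting conference ran into.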