
December 31, 2003

Happy New Year

Wishing you all a very happy and prosperous 2004.

Optimization

I have very little to say about optimization. Some, but no more than necessary (hah).

"We've tested razors with any number of blades," [Occam] says, ignoring the five-bladed machete hanging overhead.
Paradoxically, working on optimisation is an incredibly slow and lengthy process. It really is endless: every time you're tempted to say it's done, there is much more to come. After several days' work and some good results, my code is still JavaScript, not C++; it's still very dumb about talking to the SQL database; and so on. The corollary of this is: don't be disheartened when performance gets worse. Sometimes a bright idea simply doesn't work out.

In no particular order, a few things I learned:

  • N-squared (and worse) behaviours tend to get noticed. My app had one, and of course it was just a stupid bug; I noticed it when moving the code from a development machine with twenty shared spaces onto a lab machine with 500.
  • Measure, if you can. If you don't think you can measure, then find a way. It's too hard to guess which pieces of a system are making it slow: time your operations; count them.
  • Focus on places where you're running slow processes many times. Don't worry about slow operations which only happen once, and don't spend much effort on fast operations which happen many times. In my bot code, one of the biggest gains came from the XML configuration file: I used to read and parse it from disk every time it was needed; now it's parsed only once per "work cycle".
  • Look at every place you do any network traffic (HTTP, SOAP, database work, etc), and ask why it's being done right there. Are you doing it more than once? Can it be done once, upfront, or deferred until later? Same for on-disk files and databases. In Groove, opening a telespace is much more expensive than opening a telespace descriptor; opening a record is much more expensive than finding a recordID.
  • Make your code lazy. Don't do unnecessary work. Especially, don't do dumb unnecessary work. If you know that there has been no change, or that a query will return nothing, or that an update will only replace identical data: just don't do it.
  • Cache. Instead of constructing an object (slowly) every time you need it, make it once, put it in a memory cache, and fetch it from there when needed. Caching can be dangerous, though, unless the cached objects have a well-defined lifetime (in my bot, many things could live for a whole work cycle, at the end of which all the caches are flushed).
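
The post doesn't say what the N-squared bug actually was, so this is only a hypothetical illustration of one common shape of accidental quadratic behaviour: searching an array inside a loop. The names (`findNewQuadratic`, `findNewLinear`, `records`, `knownIds`) are invented for this sketch.

```javascript
// Hypothetical example: which records have IDs we haven't seen before?

// Quadratic: Array.prototype.includes scans knownIds for every record,
// so the cost grows as records.length * knownIds.length.
function findNewQuadratic(records, knownIds) {
  return records.filter((r) => !knownIds.includes(r.id));
}

// Linear: build a Set once; each lookup is then (on average) constant time,
// so the cost grows as records.length + knownIds.length.
function findNewLinear(records, knownIds) {
  const known = new Set(knownIds);
  return records.filter((r) => !known.has(r.id));
}

const records = [{ id: 1 }, { id: 2 }, { id: 3 }];
console.log(findNewLinear(records, [1, 3])); // [ { id: 2 } ]
```

With twenty items both versions are instant. With hundreds of shared spaces and thousands of records, the first one is the kind of thing that gets noticed.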
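
A minimal sketch of "time your operations; count them", assuming nothing fancier than `Date.now()` is available. `instrument`, `stats`, and `report` are hypothetical names, not part of any Groove or bot API. Millisecond resolution is coarse, but call counts plus running totals are usually enough to show which piece deserves attention.

```javascript
// Hypothetical instrumentation helper: wraps a function so that each call
// is counted and its elapsed wall-clock time (in ms) is accumulated.
const stats = new Map();

function instrument(name, fn) {
  return function (...args) {
    const start = Date.now();
    try {
      return fn.apply(this, args);
    } finally {
      const s = stats.get(name) || { calls: 0, ms: 0 };
      s.calls += 1;
      s.ms += Date.now() - start;
      stats.set(name, s);
    }
  };
}

// Slowest operations first, so the report points at where the time goes.
function report() {
  return [...stats.entries()]
    .sort((a, b) => b[1].ms - a[1].ms)
    .map(([name, s]) => `${name}: ${s.calls} calls, ${s.ms} ms total`);
}

// Usage: wrap the suspects, run a realistic workload, then read the report.
const parseXml = instrument("parseXml", (text) => text.length); // stand-in
parseXml("<config/>");
parseXml("<config/>");
console.log(report());
```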
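
A sketch of the parse-once-per-work-cycle idea, combined with a cache whose lifetime is exactly one cycle. `WorkCycleCache`, `loadConfig`, and `runWorkCycle` are illustrative names: the real bot code isn't shown here, and `loadConfig` just stands in for reading and parsing the XML configuration file.

```javascript
// Hypothetical cache whose entries live for exactly one "work cycle".
class WorkCycleCache {
  constructor() {
    this.entries = new Map();
    this.misses = 0; // how many times the slow factory actually ran
  }

  // Return the cached value for key, building it with factory() on first use.
  get(key, factory) {
    if (!this.entries.has(key)) {
      this.misses += 1;
      this.entries.set(key, factory());
    }
    return this.entries.get(key);
  }

  // Called at the end of each cycle, so nothing stale survives into the next.
  flush() {
    this.entries.clear();
  }
}

// Stand-in for the slow step, e.g. reading and parsing the XML config file.
let parseCount = 0;
function loadConfig() {
  parseCount += 1;
  return { pollSeconds: 60 }; // hypothetical parsed settings
}

function runWorkCycle(cache, spaces) {
  for (const space of spaces) {
    const config = cache.get("config", loadConfig);
    // ... synchronize `space` here, using `config` ...
  }
  cache.flush();
}

const cache = new WorkCycleCache();
runWorkCycle(cache, ["space-a", "space-b", "space-c"]);
console.log(parseCount); // 1: parsed once for three spaces
```

Flushing in one well-known place is what keeps the cache safe: nothing can outlive the cycle it was built for.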
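
The "don't replace identical data" rule can be as simple as remembering what was last written. A minimal sketch, assuming flat records and an in-memory `Map` as the memory of previous writes; `syncRecord`, `sameFields`, and `writeFn` are hypothetical. In practice that remembered copy needs the same well-defined lifetime as any other cache.

```javascript
// Hypothetical "lazy" write: skip the update when the stored copy is identical.
// `store` is a Map holding the last values written, keyed by record ID;
// `writeFn` stands in for the expensive call (database UPDATE, SOAP, ...).
function syncRecord(store, id, fields, writeFn) {
  const current = store.get(id);
  if (current !== undefined && sameFields(current, fields)) {
    return false; // nothing changed: don't do the dumb unnecessary work
  }
  writeFn(id, fields);
  store.set(id, { ...fields }); // remember what we wrote
  return true;
}

// Shallow comparison of two flat records.
function sameFields(a, b) {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return (
    keysA.length === keysB.length &&
    keysA.every((k) => Object.prototype.hasOwnProperty.call(b, k) && a[k] === b[k])
  );
}
```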

Finally, specialize. Where class methods have lots of code like if(this.m_Flags & FlagOne){...something...} else {...} and the flags are only set in your constructor, why not make some specialized classes to handle the different activities? Then the test only happens once. This sort of specialization actually saves so little processor time as to be completely useless, but it does make me feel good.
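
Here is a small before-and-after sketch of that specialization. It borrows the `m_Flags` naming from the text, but `FLAG_UPPER`, the formatter classes, and `makeFormatter` are invented for illustration. As noted, the processor time saved is negligible. The benefit, if any, is that each class does only one thing.

```javascript
const FLAG_UPPER = 1; // hypothetical flag, set only in the constructor

// Before: every call re-tests a flag that can never change after construction.
class FlaggedFormatter {
  constructor(flags) {
    this.m_Flags = flags;
  }
  format(s) {
    if (this.m_Flags & FLAG_UPPER) {
      return s.toUpperCase();
    }
    return s;
  }
}

// After: the test happens once, when choosing which specialized class to build.
class PlainFormatter {
  format(s) {
    return s;
  }
}

class UpperFormatter {
  format(s) {
    return s.toUpperCase();
  }
}

function makeFormatter(flags) {
  return (flags & FLAG_UPPER) ? new UpperFormatter() : new PlainFormatter();
}

console.log(makeFormatter(FLAG_UPPER).format("groove")); // GROOVE
```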

December 30, 2003

Scale and Performance

Although there's a good argument that software performance matters per se, and other arguments that (in the long view) code performance really isn't worth worrying about... recently I've been working on speeding things up, because it will help my applications scale. Performance is one of several gating factors (processor, memory, disk, network...) in the server-based application I've spent a lot of time with recently.

Server-based? Yes, why. A Groove Server, natch.

The Groove EIS is all the Groove peer-synchronization infrastructure pieces (comms, storage, crypto, awareness, dynamics, etc.), with minimal UI and some quite aggressive "passivation". The current release of EIS (2.5i) has some fairly low limits on the number of active shared spaces, and several people (the EIS development team, QA, a few very demanding customers, and some folks like myself kicking in from the sidelines) have worked hard to eliminate the big static limitations for the 2.5j release. So far, we're very pleased with the results. The process has shown me a few things about scale, optimisation and performance. I'm still not really sure how to quantify things, though.

Scale

Several things prevent an application from scaling. In the case of EIS, the first barrier has been memory: we've seen EIS hit Windows' 2GB address-space limit (with around a thousand simultaneously-open Groove spaces), and it's not pretty: there just ain't no more memory. Adding RAM ($500 for a couple of gig extra seems cheap enough)? Makes no difference at all: a 32-bit app simply can't address it. It's possible to tweak the OS some, but the only way to escape the 32-bit address limits would appear to be some major esoteric low-level reconstruction, or a complete move to a 64-bit environment, and I'm not expecting that to happen overnight. Meanwhile, this is a wake-up call: even my work laptop is close to being "memmed-out". (Oh no! Another few years of thunks!)

Fortunately, memory is only a problem when you use lots at once, and we've learned not to do that with our "bots".

After memory, CPU. My code (shuffling data between Groove shared space tools and an Oracle or SQLServer database) was eating processor cycles, even on a chunky Xeon box. While this wasn't an immediate showstopper, it did limit the number of shared spaces we could synchronize in a few-hour daily window (to the low thousands), and that in turn made it difficult to schedule various different things to happen at the right time. The integration code in this case is JavaScript, but I still squeezed out somewhat more than a fivefold performance gain in the test lab, and nailed at least one O(N^2) problem. (That was going to be the topic of this entry. The gory details are interesting, I promise, but they'll wait.)

The kicker was to put the new code into production, and see... zip. Nada. Approximately zero (well, maybe 25%). Turns out, the lab environment has a gigabit network to a very-lightly-loaded and massively overspecified SQLServer. The production network to the production database is a little different, and all the code in the world won't make it much faster.
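
One way to read those numbers is Amdahl's law. This assumes the fivefold gain ($s = 5$) applied only to the fraction $p$ of production time spent in the integration code itself, and that "maybe 25%" means an overall speedup $S$ of about 1.25. Both are rough readings of the figures above, not measurements:

```latex
\begin{aligned}
S &= \frac{1}{(1 - p) + p/s} \\
\frac{1}{1.25} = 0.8 &= (1 - p) + \frac{p}{5} = 1 - 0.8\,p \\
p &= 0.25
\end{aligned}
```

On that reading, perhaps a quarter of the production run was the code's own work. The other three quarters were network and database, which no amount of JavaScript tuning will touch.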

Is it worth it?

The apparently-trivial performance gain does mean a very significant gain in scalability. After all, the application was CPU-bound; now it's externality-bound, and some simple expedients (careful indexing, for example) can make a big difference on the database side.

Since we need several distinct servers for this customer's environment, there's also a case for considering a virtual machine architecture. VMware ESX, for example, which I've also been tinkering with: fascinating. That network-bound app? Just run two or three virtual servers on the one piece of hardware. The 32-bit address space limit on a box with slots for 16GB? Run multiple virtual machines on one.

Quantifying

Quantifying (setting guidelines for scalability) suddenly got a lot harder as a result of all this work. Previously, we could safely say: you'll have problems running more than 1500 shared spaces on a single device. Now, it's a multivariate problem. How much workload? What sorts of activity in those spaces? Are the users on the same LAN? What external systems are you talking to? What's the lifetime of your spaces?

These are interesting questions, though.

December 27, 2003

AoUP

esr's The Art of Unix Programming has much good stuff. I want to write a little about optimisation later, having spent most of the last couple work weeks on performance work.

The chapter on complexity is good, but begins to annoy me with its Unix, Unix, Unix. Sure, there's a uniquely-Unix culture and history, and many idioms which spring from that. But jazz doesn't have a monopoly on syncopation; punk doesn't own the three-chord refrain; opera isn't the only place to hear a story; and Unix certainly doesn't guarantee, nor have exclusive rights to, elegance in the software arts.

The section on minilanguages is nice, and relevant to my day job recently. I'm reminded that minilanguages (or something from the same well) are perhaps the major source of Microsoft's application dominance today. When the Office programming model became COM, this turned the idea of app-specific minilanguages inside out; the results were, and still are, a uniquely flexible aspect of Windows.

December 26, 2003

Year-end approaching

Time for the quizzical retrospectives, I suppose. This from AlterNet is a good one.

CasualSpace

John Perry Barlow:

People rarely think of phone calls as being so casually cheap that one would simply leave the connection open for ambient telepresence and occasional conversation. To create shared spaces that span the planet, and to do so whenever you feel like it, and to leave them unpurposefully in place for hours, is not something people have done very often before.

The next step is to make those shared spaces larger, so that multiple people can inhabit the same auditory zone, entering and leaving it as though it were a coffee house. This will change the way people live.

This isn't just about free international telephony, either. Shared spaces gain some of their identity and ambience from their furnishings.

Tear down the walls

Groove.net Weblog: "Our purpose is not to divide, but to connect".

December 23, 2003

First Computers

Patrick, have you been contacted by this man?

Talking about first computers: I learned some BASIC on the Imperial College mainframe (some IBM thing, I think); spent many, many hours of teletype access on the ICL and Prime machines at Guildford (I was SCH008); and, around the same time, used an RML 380Z, a ZX-80, and an ageing Mael 4000 (an interesting machine, with two 8" floppy disks, tape drive, golfball printer, assembler control panel, and 4K core memory).

December 22, 2003

Delicious

http://del.icio.us/ "is a social bookmarks manager". Small, simple, cute: "unplanned taxonomy space", too. My linklog.

Royal icing

It says here:

  • 4 egg whites and some lemon juice
  • 2lbs icing sugar
Two pounds of sugar for a 9-inch cake? Even for two seven-inch cakes, that sounds like a lot. So there's only about a pound and a half on these.

A cake with almond paste and icing.

OK, OK. You want neat icing, hire a plasterer :-)

December 18, 2003

Stack languages

Ned wrote recently about PostScript's stack-based language model (like Forth, too). I was fascinated by PostScript a long while back, when we had a LaserWriter in the Dublin office; it's a great little language, but "difficult".

In a related vein, here's a sanitised version of a JavaScript stack calculator language I've been working on recently. This is deliberately not a real programming language; it's an extension for the configuration file syntax I talked about a while back. Funny how these things grow, huh? Anyway, the principles are simple enough to make readable code, and yet nicely powerful. The syntax will be familiar enough: @functions. Of course the Notes @function compute engine uses a full-scale parse tree these days, not just a teeny little stack.

To be a proper language, what more would I need? Just @def. (Oh, and maybe some datatypes. Or just one).
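
The original code isn't reproduced here, so this is only a guess at the general shape of such a thing. It is a minimal sketch that assumes whitespace-separated tokens and a handful of made-up @functions (`@add`, `@sub`, `@mul`, `@dup`, `@swap`, `@upper`). It is neither the author's language nor the Notes @function engine, just the teeny-little-stack idea.

```javascript
// Hypothetical minimal stack calculator. Tokens are separated by whitespace;
// numbers are pushed as numbers, @words pop operands and push results, and
// any other token is pushed as a literal string.
function pop(stack) {
  if (stack.length === 0) {
    throw new Error("stack underflow");
  }
  return stack.pop();
}

const WORDS = new Map([
  ["@add", (s) => { const b = pop(s); const a = pop(s); s.push(a + b); }],
  ["@sub", (s) => { const b = pop(s); const a = pop(s); s.push(a - b); }],
  ["@mul", (s) => { const b = pop(s); const a = pop(s); s.push(a * b); }],
  ["@dup", (s) => { const a = pop(s); s.push(a, a); }],
  ["@swap", (s) => { const b = pop(s); const a = pop(s); s.push(b, a); }],
  ["@upper", (s) => { s.push(String(pop(s)).toUpperCase()); }],
]);

function evaluate(source) {
  const stack = [];
  for (const token of source.split(/\s+/).filter(Boolean)) {
    if (WORDS.has(token)) {
      WORDS.get(token)(stack);
    } else if (/^-?\d+(\.\d+)?$/.test(token)) {
      stack.push(Number(token));
    } else if (token.startsWith("@")) {
      throw new Error(`unknown function ${token}`);
    } else {
      stack.push(token);
    }
  }
  return stack;
}

console.log(evaluate("2 3 @add 4 @mul")); // [ 20 ]
console.log(evaluate("groove @upper")); // [ 'GROOVE' ]
```

In this sketch, something like @def would amount to letting a program add new entries to the `WORDS` map at run time.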

December 13, 2003

Groove weblog

Groove company weblog, written by Richard Eckel. Let's hope this will be as good a read as the internalgroove.net weblog he also edits.

December 12, 2003

This year's Christmas cake recipe

I don't often bake a Christmas cake; I think this is only the second time. The cake seems to have cooked fine, and it's sitting in a tin waiting to be marzipanned and iced. Only after putting the cake in the oven did I realise my first mistake, and only while writing this did I notice the second, seriously major, problem. More of which later.

Perhaps being stranded in Massachusetts snow has made me hanker after the traditional trimmings of the season: cake, Christmas pudding, mince pies (all of which are slight variations on the same recipe!). Also it seems that fruitcakes are slightly frowned upon here, for some reason... let's see whether this one, if edible at all, has enough alcohol to keep the recipients merry.

So, the recipe. No, not this one, although it's not far different. If you're actually planning to make this you should have started a while ago - just after Thanksgiving, say (or in Britain, when you see the Xmas decorations in shop windows. Oh wait, maybe September is too early).

The original recipe I have is from Dublin's Evening Press: "Northside Shopping - Christmas '92", by Eileen Davis. No doubt it was handed down for generations by tradition, and it's been in our fading cuttings folder for years; now you can pass it around by URL. So much more convenient, right?

Some adjustments for American ingredients; I couldn't find sultanas (substituted dried cranberries), nor candied peel (substituted some mixed fruit plus a finely chopped lemon peel). If you know where I can find organic marzipan without paying a fortune, please let me know!


  • 8oz raisins; 8oz sultanas; 8oz currants; 8oz peel; 2oz cherries
  • half teaspoon mixed spice (I just used nutmeg, cinnamon)
  • 2oz ground almonds; 2oz whole almonds
  • whiskey
"Mix the fruit and pour in half a glass of whiskey", it says. Um. Some Glenmorangie, some brandy, some Cointreau -- I'm not sure how much altogether, but plenty. Throw in a cinnamon stick and some spice. Then I left this to steep in an airtight bowl for a week, stirring occasionally. By last weekend all the alcohol had been soaked up, and it smelled good. Then add the almonds and soak a little longer.

  • 8oz sugar (white or brown)
  • 8oz butter
  • 6 eggs
  • 12oz flour
  • quarter teaspoon baking powder
Beat the sugar and butter together until creamy. Right. That seems so improbable as you start, but it really happens: eventually, after much beating, it really does get light and even fluffy. Do keep going; this is the important bit. I'd spent so long thinking about the alcohol content that the actual baking was done in too much of a hurry. Only skimp on the beating, as I belatedly remembered from the last attempt, if you don't mind a cake in which all the fruit sinks to the bottom and the cakeyness floats forlornly on top.

Beat the eggs separately, then add to the creamed butter and sugar. Sieve the flour with the baking powder, and fold into the mixture. Finally stir in the mixed fruit (but not the cinnamon stick).

Line a 9-inch tin with greaseproof paper rising about 3 inches above the rim. (Some pictures will help!) I used two 7-inch round tins instead, which hold about the same total volume but are slightly more manageable and probably cook faster. If you want to make one 7-inch cake, just halve the quantities.

Gently fold the mixture into the tin. Smooth out the top with the back of a spoon, making a slight dip in the centre so it'll have a level top after cooking.

Bake in a slow pre-heated oven at 140 degrees or Gas Mark 1 for at least five hours. Don't open the oven door at all for at least the first three hours, then cover the top of the cake with a double layer of parchment to keep it from burning. After four or five hours, check the cake with a skewer to see if it comes out dry. And it did, looking just fine. But wait: the recipe does say 140 degrees Celsius, whereas I cooked my cake at 150 Fahrenheit (which is probably, like, tepid bathwater temperature: Gas Mark zero-and-a-bit). 140C = 284F! Oh no!
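
For the record, here is how the conversion works out, using the standard Fahrenheit/Celsius formula:

```latex
\begin{aligned}
T_F &= \tfrac{9}{5}\,T_C + 32 \\
140\,^\circ\mathrm{C} &\mapsto \tfrac{9}{5}\times 140 + 32 = 252 + 32 = 284\,^\circ\mathrm{F} \\
150\,^\circ\mathrm{F} &\mapsto \tfrac{5}{9}\times(150 - 32) = \tfrac{5}{9}\times 118 \approx 65.6\,^\circ\mathrm{C}
\end{aligned}
```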


Next: put the cake in a tin; it should improve with keeping. Or, in my case, I should put it back in the oven for a while. Finally, a week or so before Christmas, cover with marzipan then with royal icing. Yum!


(Update: I cooked it some more, and it seems OK, though it's too early to tell. Here are some pictures, for Google visitors!)

December 08, 2003

Swarmbusiness

FT:

Swarming, a technique pioneered by the US army, is emerging as a peer-to-peer (P2P) networking technique in the civilian world, helping organisations reduce the time needed to react to new business opportunities.
(PDF)

December 04, 2003

Sales teams

Several years ago, Michele tried to persuade me into sales. Sales teams are a bit different from development teams:

The art of the cold call is one of the most respected in sales, it takes big brass ones and thick skin to be able to get on a phone and call a list of people who don't know you or care about you. People who think they are too good to cold call just aren't good enough to be in the same room with people who do.

December 02, 2003

Craftsmanship

Joel Spolsky on software development:

It comes down to an attribute of software that most people think of as craftsmanship. When software is built by a true craftsman, all the screws line up. When you do something rare, the application behaves intelligently. More effort went into getting rare cases exactly right than getting the main code working. Even if it took an extra 500% effort to handle 1% of the cases.

Craftsmanship is, of course, incredibly expensive. The only way you can afford it is when you are developing software for a mass audience. Sorry, but internal HR applications developed at insurance companies are never going to reach this level of craftsmanship because there simply aren't enough users to spread the extra cost out. For a shrinkwrapped software company, though, this level of craftsmanship is precisely what delights users and provides longstanding competitive advantage.

He's dead right, but I don't completely agree. Commercial ("shrinkwrap", if that's an appropriate term) software development happens under a set of constraints which make pure attention to detail very difficult; that's also true of "bespoke" software, it just works differently. There's intense pressure to ship to deadlines; the feature set is decided with a different set of priorities than developers or users would choose; that 500% effort usually won't be cost effective. The software industry is driven by novelty (and the elephant under the table, Moore's law).

Craftsmanship is perhaps a state of mind, unenforceable. But I do know how to encourage it: practice.

In the crafts, the creation of this bowl could only be possible with years of practice, experimentation, concentration, and an understanding of the product's behaviour in context (how it looks, how it feels, how it works). While the basic materials and techniques might develop in leaps and bounds, refinement takes a while.

Software is the same. Not "measure twice, cut once" but a constant cycle of build, tear down, rebuild, refine. Do it again and again. Show users your prototypes and failures, if you can. Learn from experience.