
April 30, 2005

File under 'R' for redundant

I love how over the past few years, there's been a lot of activity in the distributed filesystem arena. Corporations are realizing how expensive it is to recreate data versus just keeping it indefinitely in the first place. And storage requirements keep growing higher and higher.

There are a few distributed filesystem implementations that I like, namely AFS, GFS (Google's implementation, not the original GFS), and Xsan. Each is designed for very different usage patterns, but all are worth a look for different reasons.

First, a quick recap of this trio. All three let you build SANs (storage area networks), but with varying degrees of coupling between your storage hosts. Additionally, AFS will let you create SWANs, which are like SANs, but connected over a WAN. From reading the GFS paper, it can theoretically support SWANs, but I don't know if Google is employing it in this fashion.

AFS

AFS is the Andrew File System, which was developed at Carnegie Mellon University, and was subsequently worked on by Transarc and IBM. Somewhere along the way, Andrew was dropped from the name and AFS now just means AFS. OpenAFS is the branch most commonly used. AFS uses a cell-based architecture, where each cell corresponds to a geographic cluster of storage hosts. For example, if your company has three offices in New York, Paris, and Tokyo, each office would constitute a single cell.

AFS Volumes

Each cell subscribes to a set of volumes, where each volume holds a category of files and enforces its space allocation, replication counts, user ACLs, etc. The physical data for the volume is only stored at the cell where it is located, but it is possible to do read-only replication of the volume data to other cells for redundancy purposes.

Let's pretend I have three cells, /afs/home, /afs/work, and /afs/hawaii, corresponding to three locations I have AFS installed at, and that I have these volumes:

By default, each volume is only stored on the cell where it is defined. The physical data for /afs/hawaii/photos only resides in my hut in Oahu. Other cells (home, work) can grab images from my hut by using the full cell path (/afs/hawaii/photos/hut.jpg), but they don't have a local copy of the data in my photos volume.

When a cell wants to edit a file located in another cell's volume, it grabs a copy of the file from the remote cell and stores a cached copy locally. All file operations happen on the local copy, until the file is closed. Once the file is closed, the updates are sent to the remote cell, and the new changes show up globally, since remote cells are grabbing a newly cached copy of the file whenever they read/write to it.
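Here's a rough sketch of that open/edit/close behavior as a toy in-memory model. It is not the OpenAFS client; the `RemoteCell` and `CachedFile` classes and the `notes.txt` file are hypothetical names made up for illustration.

```python
class RemoteCell:
    """Toy stand-in for the cell that owns a volume (hypothetical, in-memory only)."""

    def __init__(self):
        self.files = {}  # path -> bytes

    def fetch(self, path):
        return self.files.get(path, b"")

    def store(self, path, data):
        self.files[path] = data


class CachedFile:
    """Work on a local copy; push changes back only when the file is closed."""

    def __init__(self, cell, path):
        self.cell = cell
        self.path = path
        self.local = bytearray(cell.fetch(path))  # copy fetched at open time

    def write(self, data):
        self.local.extend(data)  # touches only the local cached copy

    def close(self):
        self.cell.store(self.path, bytes(self.local))  # write-back on close


hawaii = RemoteCell()
hawaii.store("/afs/hawaii/photos/notes.txt", b"hut")

f = CachedFile(hawaii, "/afs/hawaii/photos/notes.txt")
f.write(b" at sunset")
before_close = hawaii.fetch("/afs/hawaii/photos/notes.txt")  # still the old contents
f.close()
after_close = hawaii.fetch("/afs/hawaii/photos/notes.txt")   # now updated
```

The point of the sketch is just the timing: nothing the writer does is visible to the owning cell until `close()` runs.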

AFS Replication

Now, it doesn't always make sense to grab files from AFS volumes remotely. What if your files are heavy? What if you want to access more data than what can be streamed over the network between your cells? AFS addresses that with read-only replication.

The basic idea is that volumes will always have a master cell where they are stored (/afs/hawaii/photos), but that other cells can be configured to keep read-only local copies of those volumes. This places a burden on the master cell to determine when the replicas should be updated.

For example, let's say I have /afs/home/dvds, which holds a couple terabytes of ripped dvds (which I legally own, thank you very much). If I want to watch a film from Hawaii and don't have replication enabled, I have to depend on the network link between Seattle and Oahu to be reliable enough to stream my VOB file. That dog won't hunt, monsignor.

Now, if I set up my hawaii cell to have a read-only replica of /afs/home/dvds, then I can watch my dvds in sunny Hawaii and not worry about network latency. Whenever I rip a new dvd at home, I can issue a volume update and make the remote read-only replicas update their copies of my dvd volume. As you can imagine, AFS replication works best when you have data that isn't constantly being updated.
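The replica-update step can be modeled in a few lines. This is a toy simulation under my own assumptions, not AFS code: the `Volume` class, its `release()` method, and the `film.vob` entry are hypothetical.

```python
class Volume:
    """Toy read/write volume with read-only replicas (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.files = {}     # master read/write copy
        self.replicas = {}  # cell name -> read-only snapshot

    def add_replica(self, cell):
        self.replicas[cell] = {}

    def release(self):
        # Push a snapshot of the master copy to every replica site.
        for cell in self.replicas:
            self.replicas[cell] = dict(self.files)

    def read(self, cell, path):
        # Served from the cell's local snapshot; None if not released yet.
        return self.replicas[cell].get(path)


dvds = Volume("/afs/home/dvds")
dvds.add_replica("hawaii")

dvds.files["film.vob"] = b"ripped at home"
stale = dvds.read("hawaii", "film.vob")  # replica hasn't seen the new rip yet
dvds.release()
fresh = dvds.read("hawaii", "film.vob")  # replica now has it
```

In OpenAFS, the explicit update step is the `vos release` command; the sketch only mimics its effect, which is why replicas lag the master until someone triggers a release.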

Commercial AFS Installations

AFS should be more famous than it is; it's the leader when it comes to distributed storage across WANs. A former colleague of mine used to work as a system engineer for one of the large brokerage firms on Wall Street, and he explained to me how AFS was great for their worldwide storage needs, since it even worked with their branch offices in SE Asia that didn't have the same bandwidth as their North American and European offices. The slower offices just subscribed to a smaller number of volumes that were vital, and picked up updates as they came along. The company had dozens and dozens of datacenters around the world with terabytes and terabytes of storage, and it all just worked (TM).

It's hard to find a public list of large companies that use AFS, probably because they see it as a secret weapon of sorts. It's not perfect; you have to construct your volumes carefully in order to manage update frequencies, and you'll probably need a dedicated AFS admin for large installs, but it's used by all sorts of industry leaders. The key is to go to AFS conferences and look at everyone's nametags. You'll see employees of major financial institutions, auto manufacturers, e-commerce companies, retailers, government contractors, etc, etc.

AFS Downsides

AFS isn't perfect. It depends on Kerberos for user authentication, which is a headache all of its own. You have to carefully manage your volumes, especially when replication comes into play. While you can resize the quota of different volumes easily, there is still a bit of a tightrope act required in order to balance data across cells and volumes properly. (There's even a mailing list dedicated to balancing AFS.)

AFS is best suited for storing large filesets that don't have high update frequencies. It's great for distributing binaries across organizations, storing and replicating videos, mp3s, etc. If you have inflexible replication requirements (i.e. changes have to show up in read-only volume copies immediately after being committed to the master read/write volume), then you can't support high volume change velocities. That, or you're stuck with creating tons of very small volumes in order to gate your updates per-volume per-timeslice, which increases the admin overhead of your AFS deployment.

All that said, I rather like AFS, and have been considering deploying it for the home storage network I've been working on. I've been ripping all of my media to disk, and AFS complements rips nicely. They aren't changed very often, but they are heavy and I want to make sure that my media is replicated offsite. This is where read-only replication works wonders.

GFS (Google File System)

GFS is Google's distributed file system (read the whitepaper here). Its design is Google-centric (naturally), so it assumes that it will run on commodity hardware with non-RAID drives. It optimizes for bulk reads/writes over random reads/writes, and it relaxes some of the concurrency requirements of normal distributed file systems in favor of letting application logic detect anomalies.

GFS isn't a traditional file system, though. GFS clients use a custom library to read/write files, and GFS doesn't use the normal kernel-level vnode hooks that other file systems use. The server side also runs completely in userland.

Basic GFS Design

GFS breaks files into 64MB chunks. Chunks are stored on hosts called chunk servers. Each chunk is replicated onto three chunk servers, which optimally are in different physical racks. Chunks are placed in servers which have the most free space available, and over time, their placement is fairly randomized.
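A minimal sketch of that placement idea, assuming a simple greedy rule: sort servers by free space and take at most one per rack. The `ChunkServer` record, the `place_chunk` function, and the server names are hypothetical, and the real GFS policy weighs more factors than this.

```python
from dataclasses import dataclass

CHUNK_SIZE = 64 * 1024 * 1024  # 64MB chunks
REPLICAS = 3
GB = 1024 ** 3


@dataclass
class ChunkServer:
    name: str
    rack: str
    free_bytes: int


def place_chunk(servers, replicas=REPLICAS):
    """Pick the servers with the most free space, at most one per rack."""
    chosen, used_racks = [], set()
    for server in sorted(servers, key=lambda s: s.free_bytes, reverse=True):
        if server.rack in used_racks or server.free_bytes < CHUNK_SIZE:
            continue
        chosen.append(server)
        used_racks.add(server.rack)
        if len(chosen) == replicas:
            break
    if len(chosen) < replicas:
        raise RuntimeError("not enough racks with free space")
    for server in chosen:
        server.free_bytes -= CHUNK_SIZE  # account for the new replica
    return chosen


servers = [
    ChunkServer("cs1", "rackA", 500 * GB),
    ChunkServer("cs2", "rackA", 400 * GB),
    ChunkServer("cs3", "rackB", 300 * GB),
    ChunkServer("cs4", "rackC", 200 * GB),
]
placed = place_chunk(servers)
placed_names = [s.name for s in placed]
```

Here `cs2` gets skipped even though it has plenty of room, because `cs1` already covers its rack.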

Here is a diagram of a GFS cluster, taken from Google's whitepaper on GFS that I linked to above:

A master server stores a directory of which chunks are available and which chunks reside on each chunk server. The client API keeps track of a cluster's chunk size, and translates the read/write requests of the user into chunk offsets for a given file.

GFS Read Example

For example, let's say I grabbed a 150MB apache log and stored it in my GFS cluster as httpd.log two weeks ago, and now I want to read it back and build a unique list of IPs that hit my web server...

I would ask the GFS client API to read the file httpd.log, starting at byte offset 0. Since that is within the first 64MB of the file, it would ask the master server for a list of chunk locations for httpd.log's first 64MB chunk. The master server then returns a global chunk id for that file/chunk offset combo, plus a list of chunk servers that hold the chunk mapped by that chunk id. The client library looks at the list of chunk servers and picks the closest one, grabbing the chunk data and sending the bytes back to my read call.

These steps are repeated for chunks 2 and 3 of httpd.log (64-128MB, 128-150MB). The third chunk isn't 64MB, but only 22MB. Chunks are allocated as their size fills up, so it's easy for the client library to know it's reached the end of the file.
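The offset-to-chunk arithmetic for the 150MB log works out like this. The function names are mine, not the GFS client API's.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64MB
MB = 1024 * 1024


def chunk_index(offset):
    """Which chunk of a file a given byte offset falls in (0-based)."""
    return offset // CHUNK_SIZE


def chunk_ranges(file_size):
    """Return (index, start, end) byte ranges covering the file; end is exclusive."""
    ranges = []
    start = 0
    while start < file_size:
        end = min(start + CHUNK_SIZE, file_size)
        ranges.append((chunk_index(start), start, end))
        start = end
    return ranges


ranges = chunk_ranges(150 * MB)
sizes_mb = [(end - start) // MB for _, start, end in ranges]
# sizes_mb is [64, 64, 22]: two full chunks plus a 22MB tail
```

A read at byte offset 0 lands in chunk 0, and anything past the 128MB mark lands in chunk 2, the short one.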

The chunks are streamed to the client library in proper chunk order, and the user doesn't have to know what's happening under the covers. The large chunk size means that read bandwidth stays pretty high.

GFS also has some optimizations; multiple chunk offsets are requested from the master server at one time to cut down on back-and-forth traffic between the client and the master, and chunk/chunk server metadata (but not the actual payload) is cached on the client side.

The key to the system is that while the master server holds the directory service that maps abstract files to chunk id lists and the servers that hold the chunks, the actual streaming of the data for a chunk is a direct communication between the client and one of the three chunk servers that holds the chunk.
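To make the split between metadata and payload concrete, here's a toy in-memory version of the read path. Everything in it (the `Master` and `ChunkServer` classes, the chunk ids, picking the first listed server instead of the closest) is a simplifying assumption for illustration.

```python
class ChunkServer:
    """Holds the actual chunk payloads."""

    def __init__(self):
        self.chunks = {}  # chunk id -> bytes

    def read(self, chunk_id):
        return self.chunks[chunk_id]


class Master:
    """Holds metadata only: which chunks make up a file, and who has them."""

    def __init__(self):
        self.file_chunks = {}  # filename -> ordered list of chunk ids
        self.locations = {}    # chunk id -> list of chunk server names

    def chunk_count(self, filename):
        return len(self.file_chunks[filename])

    def lookup(self, filename, index):
        chunk_id = self.file_chunks[filename][index]
        return chunk_id, self.locations[chunk_id]


def read_file(master, servers, filename):
    data = b""
    for index in range(master.chunk_count(filename)):
        chunk_id, holders = master.lookup(filename, index)  # metadata from the master
        data += servers[holders[0]].read(chunk_id)          # payload from a chunk server
    return data


servers = {name: ChunkServer() for name in ("cs1", "cs2", "cs3")}
master = Master()
master.file_chunks["httpd.log"] = ["c-001", "c-002"]
for chunk_id, payload in (("c-001", b"GET / "), ("c-002", b"GET /about")):
    master.locations[chunk_id] = ["cs1", "cs2", "cs3"]
    for server in servers.values():
        server.chunks[chunk_id] = payload

log = read_file(master, servers, "httpd.log")
```

Notice the master never touches a payload byte; it only answers "which chunk, and where".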

GFS Writes

Similar to how the client talks to chunk servers directly for reads, it also talks to them directly for writes. If you're writing to a file, the master server is queried for the chunks that will need to be mutated. For each chunk, the master server picks one of the three chunk servers that hold a replica of that chunk, and makes it the master replica server.

All the mutations for that chunk are applied on the master replica server, and then it turns around and tells the other two chunk servers to update their replicas of the chunk to match its new copy. Once all three chunk servers have confirmed that they have written the new chunk to stable storage, the master replica returns a success code to the client, which moves to write the next chunk.

GFS also has optimizations that make it possible to write several chunks at once, and to have multiple simultaneous writers on a chunk (as long as they are appending and not writing at random offsets in the chunk, which isn't a common case for writes anyways). The interesting thing is that while GFS guarantees that multiple writes can occur atomically, it doesn't guarantee that a given write happens only once. This means that the records appended to chunks have to contain some sort of header or sequence id inside them, otherwise the application that's reading the data might accidentally process a record multiple times.
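On the reading side, skipping duplicate appends can be as simple as remembering which ids you've seen. A minimal sketch, assuming each record carries a unique id chosen by the writer; the `(record_id, payload)` tuple layout is hypothetical, not a GFS record format.

```python
def unique_records(records):
    """Yield each payload once, skipping records whose id was already seen."""
    seen = set()
    for record_id, payload in records:
        if record_id in seen:
            continue  # a retried append that landed twice
        seen.add(record_id)
        yield payload


# Record 2 was appended twice because the writer retried.
appended = [(1, "10.0.0.1"), (2, "10.0.0.2"), (2, "10.0.0.2"), (3, "10.0.0.3")]
processed = list(unique_records(appended))
```

The set grows with the number of distinct records, so a real reader would probably bound it somehow (per-writer sequence numbers, for instance).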

The nice thing is that consumers of the client library aren't issuing low-level chunk read/write calls, they're just using the api to write to a file.

Interesting GFS Architecture Items

GFS has other features, but you should really just read the whitepaper to get a better idea for what it can do. Because Google wrote it specifically to address their internal needs for a filesystem, it is pretty specialized to Google's business domain.

For example, it uses a non-standard client library for access to files, and is much more focused on streaming high-bandwidth reads and writes versus supporting random file access. Concurrency on writes is supported, but you don't get the guarantee of the write happening only one time. Latency isn't emphasized much, but bandwidth sure is.

One interesting thing is that Google was seeing i/o corruption with early deployments of GFS, so they built in a checksum system that sits at the chunk server level (not the master server level). Chunks are checksummed in 64KB blocks, so each 64MB chunk has 1024 checksum entries. The checksums are checked on both reads and writes, and stored in memory on the chunk servers. A common theme for GFS is aggressive storage of metadata in memory, with checkpoint flushing to disk for critical things.
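Here's a small sketch of per-block checksumming. I'm using CRC32 from Python's `zlib` as a stand-in checksum, which is my assumption rather than something stated above, and the demo chunk is only four blocks long to keep it cheap.

```python
import zlib

BLOCK_SIZE = 64 * 1024         # 64KB checksum blocks
CHUNK_SIZE = 64 * 1024 * 1024  # 64MB chunks

entries_per_full_chunk = CHUNK_SIZE // BLOCK_SIZE  # 1024


def block_checksums(chunk):
    """One checksum per 64KB block of the chunk."""
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk), BLOCK_SIZE)]


def find_corrupt_blocks(chunk, expected):
    """Indexes of blocks whose current checksum differs from the stored one."""
    actual = block_checksums(chunk)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


chunk = bytearray(b"x" * (4 * BLOCK_SIZE))
stored = block_checksums(bytes(chunk))  # kept in memory alongside the chunk

chunk[2 * BLOCK_SIZE + 5] ^= 0xFF       # simulate a flipped byte in block 2
corrupt = find_corrupt_blocks(bytes(chunk), stored)
```

Checksumming in small blocks means a read only has to verify the blocks it actually touches, not the whole 64MB.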

I also like that chunk servers are the authority for which chunks they hold, the master server doesn't have a persistent authoritative store of which chunks are in which servers. Instead, when a chunk server starts up, it contacts the master server and tells it which chunk ids it can serve up. Once a chunk server is running, it also periodically sends a heartbeat back to the master server, making sure it has an up-to-date list of all the chunks it holds. Google realized that it would have been more complex to make the central server be authoritative, and who better to know what a chunk server holds than the chunk server itself? I like that.
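The "chunk server is the authority" idea boils down to the master overwriting its own view with whatever each server reports. A toy sketch, with a hypothetical `Master` class and made-up server and chunk names:

```python
class Master:
    """Keeps chunk locations in memory only, rebuilt from server reports."""

    def __init__(self):
        self.reported = {}  # server name -> set of chunk ids

    def heartbeat(self, server, chunk_ids):
        # Replace whatever the master believed with the server's own report.
        self.reported[server] = set(chunk_ids)

    def holders(self, chunk_id):
        return sorted(s for s, ids in self.reported.items() if chunk_id in ids)


master = Master()
master.heartbeat("cs1", ["c1", "c2"])  # startup report
master.heartbeat("cs2", ["c1"])
master.heartbeat("cs1", ["c2"])        # later heartbeat: cs1 no longer has c1

c1_holders = master.holders("c1")
c2_holders = master.holders("c2")
```

There's no reconciliation step to get wrong: if a disk dies and a chunk vanishes, the next report simply stops mentioning it.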

One small thing I'm not a fan of is calling the central directory server the master server; I think it can be confusing at times. I don't know what I would call it, maybe something like chunk directory or chunk router. Heh, or maybe... Chunk matchmaker. :-)

If you read the GFS paper, I highly recommend reading these sections closely:

And sections 6 & 7 (benchmarks and real-world GFS experiences) are a fun read too. GFS fills its niche pretty well!

Xsan

Xsan is Apple's entry into the SAN market. It runs on top of Xserve RAID boxes and supports storage clusters of up to 16TB. Xsan uses fibre channel for its fabric, so it's tuned for short-range, high-throughput storage. Xsan's technology overview paper goes into good detail on how it works.

Xsan is an unapologetic, super-high bandwidth storage solution. Here are just a few features it has that are uncommon for a NAS/SAN system:

And you get a host of gui tools to help you configure and manage your different volumes, set up access groups, measure SAN throughput, etc, etc. Oh, if only I had a budget for HPC clusters, I would get a rack of Xserves and a few Xserve RAIDs, and put Xsan to use. When you combine the throughput of Xsan on Xserve RAID's already beefy hardware, and then attach machines with wide and deep buses like the Xserves, well, it seems like you're getting as close as you can to supercomputer territory without paying a few million bucks. I keep hearing good things about Apple's server market, and I hope they keep fighting this fight; their server solutions are excellent and relatively affordable when you compare them to the competition. Casual PowerMac owners might freak out when hearing what these solutions cost, but people who work with storage and servers for a living realize that for what you get, it's a bargain.

Why talk about these filesystems?

You might be wondering why I spent all this time discussing this trio of filesystems. Well, I talked about AFS because I've been thinking about using it for helping me store and manage all of my media and email archives. Offsite replication is pretty straightforward, and I get POSIX-y access to my files without having to change much application logic. It's mature, stable, and is a much better alternative than NFS for what I want to do with it.

As far as GFS goes, well, I guess I'm just a fan of GFS's principles, I like how Google was unapologetic about its design and tweaked it as much as possible for their problem domain. It's not a general purpose distributed filesystem for tech companies, but it doesn't have to be. That said, it has a lot of interesting ideas, and I like how its design isn't overly complex with lots of distributed locks and transaction managers.

And last but not least, I like Xsan because Xserve RAID is damn sexy, and Xsan exploits its performance well. GFS/AFS use ethernet, but Xsan uses fibre channel links (for smokin' speed) and has good concurrency AND throughput. I don't have the money to afford Xserve RAID, but if I did, I would use Xsan for my home storage needs. Xserve RAID + Xsan is relatively cheap compared to the SAN solutions offered by other companies in the storage market. You wouldn't believe how much NetApp and friends charge for their storage cabinets...

I should have my storage cluster set up six months from now. I'm going to have a heavy node at home that stores all my media, and an offsite node that stores vital things like email, config files, source code, etc. It will be interesting to see where my storage is at a few years down the road once HD dvds are more common. I'm thinking that 5TB won't seem like a lot by then...

Posted by djb at 01:25 AM | Comments (0)

April 27, 2005

Feel the 1U love...

So, I'm in the market for a 1U server. My only real choice is a Xserve G5 or an Opteron-based solution. Intel doesn't factor in for me, they might hold the bulk of the CPU market, but as Microsoft goes to show you, being the market leader doesn't necessarily mean you have the best product in the market. I'm not a huge fan of the Intel server offerings out there.

Tyan makes a cool dual-socket 1U Opteron case which I can find online for ~$900. It's certainly not as sexy as a Xserve (and doesn't have all the neat monitoring/admin tools bundled), but it has four sata trays, which would let me hold 1.6TB in 1U of height. The cooling on it looks adequate enough to run four drives.

I'm pretty excited about the new Opteron dual-core cpus that are starting to trickle out. The 270 seems to be perfect for what I'm looking for, a dual-socket box would give 4 x 2.0GHz. The prices are pretty high on the new Opterons though, it looks like I'd be paying $1k per cpu for the 270, which is the dual-core 2.0GHz model. Hopefully the price would go down over the next few months, but who knows. I'd certainly take 4 x 1.8GHz versus 2 x 2.6GHz (2x265 vs 2x252, they're roughly $850 per socket now).

That's $2600 for a 4 x 1.8 Opteron 1U case, and then after factoring in 4x400GB sata drives ($1000) and N GB of 1GB DDR400 ECC ram ($150 per gig), you come out at $4200 for a 4GB system, and $4800 for an 8GB system, with 7.2 GHz of cpu and 1.6TB of raw storage. Oh, and tack on another $125 for an internal 250GB operating system drive. So basically, $5k.

And I can dial that back by putting in fewer drives up front, and getting less ram. $2600 for the base 1U case and the four cpus is pretty good, though.
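For anyone who wants to play with the configuration, here is the same arithmetic as a small script. The prices are the ones quoted above (roughly $900 for the case, $850 per socket, $250 per 400GB drive, $150 per GB of ram, $125 for the OS drive); the `system_cost` function is just my own wrapper around them.

```python
CASE = 900        # Tyan dual-socket 1U case (approximate)
CPU = 850         # per dual-core 1.8GHz Opteron
DRIVE = 250       # per 400GB sata drive ($1000 for four)
RAM_PER_GB = 150  # DDR400 ECC
OS_DRIVE = 125    # internal 250GB operating system drive


def system_cost(ram_gb, data_drives=4, sockets=2, os_drive=False):
    """Total price in dollars for one 1U Opteron box."""
    total = CASE + sockets * CPU + data_drives * DRIVE + ram_gb * RAM_PER_GB
    return total + (OS_DRIVE if os_drive else 0)


base = system_cost(0, data_drives=0)          # case plus two cpus
four_gb = system_cost(4)
eight_gb = system_cost(8)
eight_gb_with_os = system_cost(8, os_drive=True)

total_ghz = 2 * 2 * 1.8        # sockets x cores x clock
raw_storage_tb = 4 * 400 / 1000
```

The fully loaded 8GB box with the OS drive comes to $4925, which is where the "basically, $5k" comes from.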

On the Xserve front, things are a little bit more expensive. $4k for the base 2x2.3 model, and then I have to put in ram and drives at the same cost for the Opteron solution ($250 per 400GB of sata drive with a three drive limit, $125 per 1GB of ECC ram). The Xserve has better monitoring and system information (there are multiple temperature and fan sensors you can monitor), and many more options for multimedia programming. It's easier to get support for, and it's pretty much plug-and-play with Xserve RAID. I'm also guaranteed that things like the onboard gigabit ethernet will just work without any tweaking or driver machinations, and that the system in general will be pretty rock solid stable. From what I've learned, the amd64 variants of Linux and FreeBSD are still shaking the bugs out and aren't quite as stable as their i386 cousins.

So in the short term, I'll probably end up purchasing a Xserve and explore using it for video and image transcoding, and then will eventually purchase a dual-core Opteron for doing low-level http cache work, or perhaps put it into service as a database host running PostgreSQL.

The nice thing about 1U is that it's not just a standard for racking computer hardware, it's a standard for audio hardware as well. And there are tons of relatively cheap 8U and 12U gig cases with hardened shells that would make a great portable rack. They have latches on the front and back, so you can lock up the whole shebang and put it in your car and go somewhere, and then take the front and back panels off the case when you've plugged it in. Throw in a 1U power conditioner and a 1U gigabit switch, and you have a human-portable micro datacenter.

That's the appeal of 1U servers to me, I can fit 25-50GHz in a large suitcase and take it anywhere (well, it would weigh 35 lbs x the number of servers, so maybe I'd need something with wheels and a handle, but you get the idea).

I'll leave the business case for having that much power in a small portable case versus in a rack in a HVAC-enabled datacenter as an exercise for the reader...

Posted by djb at 02:38 PM | Comments (0)

April 26, 2005

Java games and field trips

I've been taking CSC 143, which is the second Java course in the CS foundation courses required for transfer to UW. It is pretty fun, our first two assignments covered writing Tetris, mostly from scratch. We were given some scaffolding and swing stubs for drawing the squares and grid, and then we had to implement the shapes, the row removal and the rotation.

It was pretty enlightening, since the rules of the assignment weren't exactly how Tetris itself played back in the day. I must have eliminated a thousand rows while play testing the solution, and after a while, my brain adjusted and it was as if Tetris had always behaved that way. Two of the big differences were the initial starting orientations of the pieces when they dropped through the top of the screen, and the logic requirements for determining when it was safe to rotate a piece.

I'm glad there were some differences, though, since it made the project more challenging. The newest project is a historical stock graph that shows daily prices (with high/low/closing prices marked) and quarterly earnings and their trends over time. This is pretty similar to the Cocoa code I wrote for graphing audio samples, except there are never any negative values. I've written the model and controller classes and have implemented most of the text view, so now I have a week to write the graphical view that does the pretty graphs. Here is an example of what we're shooting for, although our graphs won't be quite as complex.

This is the third of eight projects for the quarter, so I'm excited to see what's coming down the road. Last quarter, we did an adjustable LED clock, implemented a PhotoShop-like app with convolution filters for different image manipulations, and created a pinball game, among other things.

The image app was my favorite, which probably isn't a surprise to people who know me, since I used to work on image software quite a bit.

I've been pretty busy with school and interviews, I've got an interesting possibility in the pipeline right now, but I'm not going to talk about it until I know it's a sure thing. It's been a long time since I was on the receiving side of an interview, so that was a real eye opener. I went to the Bay Area last week, and on the flight back, I saw several people I knew from Amazon. A couple of DBAs who had since left the company and were down there for a MySQL conference, and two people from the tech leadership of A9.com, Amazon's search spinoff that's located near Palo Alto. And a couple other people on the flight looked familiar, but I wasn't sure if I knew them or not. Turns out the Thursday night flight to Seattle is always packed, since a lot of people are flying home for the weekend. There wasn't an empty seat on the plane.

It actually ended up being cooler in the Bay Area than it was up in Seattle, it's been roasting up here lately. Summer in Seattle can be hit or miss. If it gets really hot, people aren't really prepared mentally for it, and they start doing crazy things. It's the only time of year that Seattle drivers forget their normal, sedate, friendly driving routine and start driving like the rest of the world.

It has yet to be determined whether this is good or bad.

Posted by djb at 12:47 PM | Comments (0)

April 15, 2005

kqueue and libevent

I've been spending a lot of time this past week looking at kqueue and libevent examples out there, specifically benchmarking my G5 to see what kind of http performance I can expect from it.

A simple non-keepalive select based daemon is serving 9k requests per second (RPS) per cpu. I've been hacking on PLB (pure load balancer, a free software load balancer) and have made some changes so I can keep the core libevent code for listening on sockets and watching for read/write conditions, but I tore out the proxy code so I could make it behave like a plain httpd. I'm still tuning and profiling it, but I'm getting obscene request rates with that codebase. I think that once I'm finished, I'll be able to do 30k-50k RPS per cpu. I've found some excellent papers on analyzing system call bottlenecks, so it's possible I could go even higher than my projections, but I think that 30k is a safe bet.

PLB uses libevent, which is a cross-platform library that finds the best event notification library on your OS (be it epoll, kqueue, or /dev/poll), and wraps it into a nice api. If you use libevent, you can create server code that supports tens of thousands of concurrent connections, without taking a huge hit. This logarithmic graph shows how well it scales.
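To give a feel for the pattern without pulling in libevent itself, here's a sketch using Python's standard `selectors` module, which plays a similar role: `DefaultSelector` picks the most efficient mechanism available on the platform (epoll, kqueue, /dev/poll, or plain select). The `on_readable` callback and the use of a socket pair in place of a real client connection are my own simplifications.

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # best available mechanism on this OS

# A connected socket pair stands in for a client/server connection.
client, server = socket.socketpair()
server.setblocking(False)

received = []


def on_readable(conn):
    """Callback run by the loop when conn has data waiting."""
    received.append(conn.recv(1024))


# Register interest in read readiness and attach the callback as user data.
sel.register(server, selectors.EVENT_READ, on_readable)

client.sendall(b"GET / HTTP/1.0\r\n\r\n")

# One pass of the event loop: dispatch every ready socket to its callback.
for key, _events in sel.select(timeout=1.0):
    key.data(key.fileobj)

sel.unregister(server)
sel.close()
client.close()
server.close()
```

A real server would wrap the `select()` call in a `while True` loop and register thousands of sockets; the dispatch logic stays the same.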

BSD-based OS's use kqueue and kevent, which originally came from FreeBSD. I like the kqueue framework because it allows you to set up observers on all sorts of system state changes. For example, I can open a fd to a critical config file on my system, and if the file is unlinked, renamed, written to, extended, hard linked to, or has its attributes changed, I am notified as soon as these changes occur and can take appropriate action. It doesn't block the actions from happening, but it gives me a cheap and guaranteed way to know they did. If I was writing a deployment nanny, I could register all the config and binary files for my deployment (which won't be more than my per-process fd limit) and alarm if any of the files are changed while the process is running. How do I know it's still running? Because there's a kqueue watcher for processes as well, and it follows them across fork or exec calls. :-)

Basically, kqueue lets you take a random fd (a network socket, a file on disk, a pipe, or a fifo) and register observers so that you are notified when it's ready for reading or writing. It also has observers for vnodes (the kernel objects that model files inside filesystems; they're at a higher level than inodes), and signals. Darwin doesn't have support for two FreeBSD kqueue features, which are async I/O notifications (it doesn't have async I/O at all), and generic timer notifications.
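The config-file watcher idea looks roughly like this through Python's `select.kqueue` binding. It only does anything on BSD-family systems (including OS X); elsewhere the function returns None. The `watch_for_changes` and `append_line` helpers and the temp file are hypothetical scaffolding for the demo.

```python
import os
import select
import tempfile


def watch_for_changes(path, change):
    """Return the vnode event flags seen after change(path) runs,
    or None when this platform has no kqueue (e.g. Linux)."""
    if not hasattr(select, "kqueue"):
        return None
    fd = os.open(path, os.O_RDONLY)
    kq = select.kqueue()
    try:
        event = select.kevent(
            fd,
            filter=select.KQ_FILTER_VNODE,
            flags=select.KQ_EV_ADD | select.KQ_EV_CLEAR,
            fflags=(select.KQ_NOTE_WRITE | select.KQ_NOTE_EXTEND |
                    select.KQ_NOTE_DELETE | select.KQ_NOTE_RENAME |
                    select.KQ_NOTE_LINK | select.KQ_NOTE_ATTRIB),
        )
        kq.control([event], 0, 0)       # register the watcher, collect nothing
        change(path)                    # something modifies the file
        fired = kq.control(None, 1, 1)  # wait up to 1 second for one event
        return fired[0].fflags if fired else 0
    finally:
        kq.close()
        os.close(fd)


def append_line(path):
    with open(path, "a") as f:
        f.write("changed\n")


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    config_path = tmp.name

flags = watch_for_changes(config_path, append_line)
os.unlink(config_path)
```

On a kqueue system the returned flags should include `KQ_NOTE_WRITE` after the append. A deployment nanny would keep the kqueue open and loop on `control()` instead of returning after one event.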

Anyways, kqueue has beefier features than Linux's epoll, and lets you do some pretty freaky stuff. All the BSD cousins have sysctl, so you can bump your per-process fd limits pretty high; Darwin goes all the way up to 64K.

I've also been reading up on Tux, which is Linux's replacement for the first generation khttpd, a http daemon that ran entirely in the kernel. Tux is pretty darn fast, and will end up being faster than userland httpds, even if they use low-cost event frameworks like epoll or kqueue, simply because it doesn't have to context switch. Tux is an upper bound on performance with Linux httpds, so it's a good goal to shoot for.

I think a kqueue-enabled BSD httpd could get pretty close to Tux's speeds, and I think a kernel-level FreeBSD httpd could even beat Tux's speeds. I'm still learning about the newer FreeBSD features like netgraph, zero-copy sockets and zero-copy apis, but it sounds like I could get some pretty high rates if I employed those with this code, but running on a FreeBSD box.

Hmmm, and I just noticed that RedHat has their RH Content Accelerator, which is based on the Tux code base, but has all sorts of zero-copy and performance enhancements made to it. So that seems to be the one to aim for performance-wise.

Now, I know that FreeBSD can hit some pretty high speeds, many http cache and load balancer companies use it as their OS of choice for their products. They make a lot of changes (and I bet a lot of their stuff runs in the kernel instead of userland), so it's not vanilla FreeBSD, but I know it will scale up pretty well.

I've been thinking a lot about FreeBSD and Darwin lately. I love OS X and Mac hardware. I think the proper server niche for both is to use FreeBSD for load balancers and http caches, and to use OS X for dynamic content serving (like audio, images or video). FreeBSD doesn't have rich multimedia apis, but it makes up for that with super-fast network performance and lots of heavy-lifting apis for shuffling bytes around. OS X isn't as fast as FreeBSD (though it still runs at a sizeable fraction of FreeBSD's performance), but you can employ technologies like Altivec, Core Audio, Core Image, and Core Video.

The two OS's combined make a really nice platform target, since OS X is already using large parts of FreeBSD for its userland base, and has ported a lot of FreeBSD kernel features too. I'm waiting for netgraph to make its appearance in OS X, but maybe that will never happen. Still, you have to look at your workload. If you're scaling images all day long, then even if your FreeBSD-tuned httpd is 2x as fast as the one built for OS X, the OS X one is probably going to spit out the scaled image with a lower overall wall clock time because it has access to much more dsp power in the form of Altivec and Core Image. So as long as you don't blindly stick to one or the other, you can have your cake, and eat it too.

Now I need to go and read up on FreeBSD in-kernel httpds, there have been a couple of those posted over the years to the lists...

BTW, I picked up the latest edition of Stevens's Unix Network Programming book today. I loaned out my second edition to someone at Amazon, but silly me, forgot to write down who it was. I've been flying without a copy of UNP when doing my recent server work, which makes me feel a little naked. It's nice to have it back, and an upgraded edition to boot!

Posted by djb at 03:21 PM | Comments (0)

April 07, 2005

First major set of embedded RubyCocoa changes complete

I figured out how to set all the config switches for my changes inside the existing install infrastructure instead of grafting on some clumsy OOB scripts. You enable the embedded behavior by doing 'ruby install.rb config --embed' instead of 'ruby install.rb config', and everything else is automatic.

I sent the current set of changes to seattle.rb to see if anyone wanted to review it, and I'll be contacting Kimura-san to see what he thinks about the changes. He expressed interest in bundling these changes into RubyCocoa proper, so if that happens, I'll probably update my slides and flesh them out some more, since I've learned more about RubyCocoa during the course of this mini project.

I couldn't help myself though, for the past week, I've also been working on a PLB-based httpd core that will eventually be tuned for OS X. And I might finally have an idea for a Dashboard app, I've been wanting to write a Dashboard app for a while, but I didn't have any interesting ideas until now. I'll talk about it more later if something fruitful comes from it.

I've also restarted classes and have a heavier schedule than last quarter, so I've spent a lot of this week studying. Hooray for calculus.

Posted by djb at 08:11 PM | Comments (0)