April 15, 2005

kqueue and libevent

I've been spending a lot of time this past week looking at kqueue and libevent examples out there, specifically benchmarking my G5 to see what kind of http performance I can expect from it.

A simple non-keepalive, select-based daemon is serving 9k requests per second (RPS) per CPU. I've been hacking on PLB (Pure Load Balancer, a free software load balancer): I kept the core libevent code that listens on sockets and watches for read/write conditions, but tore out the proxy code so it behaves like a plain httpd. I'm still tuning and profiling it, but I'm already getting obscene request rates with that codebase. I think that once I'm finished, I'll be able to do 30k-50k RPS per CPU. I've found some excellent papers on analyzing system call bottlenecks, so it's possible I could go even higher than my projections, but I think that 30k is a safe bet.
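For reference, a non-keepalive, select-based daemon of the kind described above can be sketched in a few lines. This is a Python illustration, not the daemon that was benchmarked: `make_listener`, `serve_once` and the canned `RESPONSE` are hypothetical names, and the sketch assumes every response is small enough to be written in one call.

```python
import select
import socket

# Canned reply; "Connection: close" makes the one-request-per-connection policy explicit.
RESPONSE = (b"HTTP/1.0 200 OK\r\n"
            b"Content-Type: text/plain\r\n"
            b"Content-Length: 3\r\n"
            b"Connection: close\r\n\r\n"
            b"ok\n")


def make_listener(host="127.0.0.1", port=0):
    """Create a non-blocking listening socket; port 0 lets the OS pick one."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(128)
    srv.setblocking(False)
    return srv


def serve_once(srv, clients, timeout=0.1):
    """Run one select() pass; return how many requests were answered."""
    readable, _, _ = select.select([srv] + list(clients), [], [], timeout)
    handled = 0
    for sock in readable:
        if sock is srv:
            conn, _addr = srv.accept()
            conn.setblocking(False)
            clients[conn] = b""          # per-connection request buffer
            continue
        try:
            data = sock.recv(4096)
        except BlockingIOError:
            continue
        except ConnectionError:
            data = b""                   # treat a reset like a close
        clients[sock] += data
        if b"\r\n\r\n" in clients[sock]:
            sock.sendall(RESPONSE)       # assumption: tiny reply fits the send buffer
            handled += 1
        elif data:
            continue                     # headers incomplete, wait for more
        del clients[sock]
        sock.close()                     # non-keepalive: always close afterwards
    return handled
```

Calling `serve_once(srv, clients)` in a loop with `clients = {}` gives the whole server. The cost that select() pays on every pass, rebuilding and scanning the full fd list, is what the event mechanisms discussed next are meant to avoid.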

PLB uses libevent, a cross-platform library that finds the best event notification mechanism on your OS (be it epoll, kqueue, or /dev/poll) and wraps it in a nice API. If you use libevent, you can write server code that supports tens of thousands of concurrent connections without taking a huge performance hit. This logarithmic graph shows how well it scales.
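The same "pick the best backend, then dispatch callbacks" idea can be illustrated without libevent itself. The sketch below uses Python's standard `selectors` module as a stand-in, which is an assumption made for illustration and not libevent's API; `echo` and `dispatch` are hypothetical helper names.

```python
import selectors
import socket


def echo(sel, conn):
    """Read callback: echo the data back, or stop watching the socket on EOF."""
    data = conn.recv(4096)
    if data:
        conn.sendall(data)
    else:
        sel.unregister(conn)
        conn.close()


def dispatch(sel, timeout=1.0):
    """One loop iteration: call the callback stored with each ready socket."""
    ready = sel.select(timeout)
    for key, _mask in ready:
        key.data(sel, key.fileobj)   # key.data holds the registered callback
    return len(ready)


if __name__ == "__main__":
    # DefaultSelector resolves to the most efficient mechanism available:
    # epoll on Linux, kqueue on BSD/macOS, /dev/poll on Solaris, else poll/select.
    sel = selectors.DefaultSelector()
    print("backend:", type(sel).__name__)
    a, b = socket.socketpair()
    sel.register(a, selectors.EVENT_READ, echo)
    b.sendall(b"ping")
    dispatch(sel)
    print(b.recv(4096))
```

The server code only ever talks to `sel`, so the same program runs unchanged on whichever kernel mechanism the platform provides.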

BSD-based OSes use kqueue and kevent, which originally came from FreeBSD. I like the kqueue framework because it lets you set up observers on all sorts of system state changes. For example, I can open an fd to a critical config file on my system, and if the file is unlinked, renamed, written to, extended, hard linked to, or has its attributes changed, I am notified as soon as the change occurs and can take appropriate action. It doesn't block the actions from happening, but it gives me a cheap and guaranteed way to know they did. If I were writing a deployment nanny, I could register all the config and binary files for my deployment (as long as there are no more of them than my per-process fd limit allows) and raise an alarm if any of the files change while the process is running. How do I know it's still running? Because there's a kqueue watcher for processes as well, and it can follow them across fork or exec calls. :-)
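A file watcher along these lines might look like the sketch below. It uses Python's `select.kqueue` binding, which only exists on the BSDs and macOS, so the hypothetical `watch_file` helper returns `None` elsewhere. The `trigger` parameter is an assumption added purely so the example can provoke a change on its own.

```python
import os
import select

HAVE_KQUEUE = hasattr(select, "kqueue")  # True on the BSDs and macOS only


def watch_file(path, trigger=None, timeout=2.0):
    """Return the names of vnode events seen on `path`, or None without kqueue.

    `trigger` is an optional callable run right after registration.
    """
    if not HAVE_KQUEUE:
        return None
    notes = {
        "delete": select.KQ_NOTE_DELETE,   # unlinked
        "write": select.KQ_NOTE_WRITE,
        "extend": select.KQ_NOTE_EXTEND,
        "attrib": select.KQ_NOTE_ATTRIB,
        "link": select.KQ_NOTE_LINK,       # hard link count changed
        "rename": select.KQ_NOTE_RENAME,
    }
    mask = 0
    for bit in notes.values():
        mask |= bit
    fd = os.open(path, os.O_RDONLY)
    kq = select.kqueue()
    try:
        change = select.kevent(fd, filter=select.KQ_FILTER_VNODE,
                               flags=select.KQ_EV_ADD | select.KQ_EV_CLEAR,
                               fflags=mask)
        kq.control([change], 0)                # register only, collect nothing
        if trigger is not None:
            trigger()
        events = kq.control(None, 1, timeout)  # wait for one event or time out
        seen = events[0].fflags if events else 0
        return sorted(name for name, bit in notes.items() if seen & bit)
    finally:
        kq.close()
        os.close(fd)
```

A nanny process would register one such kevent per file on a single kqueue and block in `kq.control()`. Watching a process works the same way with `KQ_FILTER_PROC` and the pid as the ident.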

Basically, kqueue lets you take an arbitrary fd (a network socket, a file on disk, a pipe, or a fifo) and register observers so that you are notified when it's ready for reading or writing. It also has observers for signals and for vnodes (the kernel's VFS objects that model files inside filesystems; they sit at a higher level than inodes). Darwin lacks two FreeBSD kqueue features: async I/O notifications (it doesn't have async I/O at all) and generic timer notifications.

Anyways, kqueue has beefier features than Linux's epoll, and lets you do some pretty freaky stuff. All the BSD cousins have sysctl, so you can bump your per-process fd limits pretty high; Darwin goes all the way up to 64K.
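From inside a process, the fd limit is visible as `RLIMIT_NOFILE`. As a rough sketch (the `raise_fd_limit` helper and its 65536 default are illustrative assumptions, not something the sysctl interface prescribes), a server could try to lift its own soft limit at startup:

```python
import resource


def raise_fd_limit(target=65536):
    """Try to lift the soft RLIMIT_NOFILE toward `target`; return the soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and wanted > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        except (ValueError, OSError):
            # Some kernels cap the value below the hard limit (for example via a
            # sysctl such as kern.maxfilesperproc); keep the old limit in that case.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft fd limit is now", raise_fd_limit())
```

The helper never lowers the limit and never touches the hard limit, so it is safe to call unprivileged. Raising the system-wide ceiling still has to be done by an administrator through sysctl.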

I've also been reading up on Tux, an HTTP daemon that runs entirely in the Linux kernel and replaces the first-generation khttpd. Tux is pretty darn fast, and will end up being faster than userland httpds, even if they use low-cost event frameworks like epoll or kqueue, simply because it doesn't have to context switch between userland and the kernel. Tux is an upper bound on performance for Linux httpds, so it's a good goal to shoot for.

I think a kqueue-enabled BSD httpd could get pretty close to Tux's speeds, and I think a kernel-level FreeBSD httpd could even beat them. I'm still learning about the newer FreeBSD features like netgraph, zero-copy sockets and zero-copy APIs, but it sounds like I could get some pretty high rates if I employed those with this code running on a FreeBSD box.
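To give a flavor of what a zero-copy API buys, the sketch below sends a file with `sendfile`, which lets the kernel move file data to a socket without copying it through userland buffers. Treating `sendfile` as the API in question is an assumption, since the post does not name one, and `send_file_zero_copy` is a hypothetical helper.

```python
import os
import socket
import tempfile


def send_file_zero_copy(sock, path):
    """Send a whole file over a blocking stream socket; return bytes sent."""
    with open(path, "rb") as f:
        # socket.sendfile() uses os.sendfile() where the platform provides it
        # and falls back to an ordinary read/send loop otherwise.
        return sock.sendfile(f)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.write(fd, b"static file body\n")
    os.close(fd)
    a, b = socket.socketpair()
    print(send_file_zero_copy(a, path), "bytes sent")
    print(b.recv(4096))
    os.unlink(path)
```

For an httpd serving static files, this replaces the usual read-into-buffer, write-to-socket loop with a single call per file.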

Hmmm, and I just noticed that Red Hat has their RH Content Accelerator, which is based on the Tux code base but has all sorts of zero-copy and performance enhancements made to it. So that seems to be the one to aim for performance-wise.

Now, I know that FreeBSD can hit some pretty high speeds; many HTTP cache and load balancer companies use it as the OS of choice for their products. They make a lot of changes (and I bet a lot of their stuff runs in the kernel instead of userland), so it's not vanilla FreeBSD, but I know it will scale up pretty well.

I've been thinking a lot about FreeBSD and Darwin lately. I love OS X and Mac hardware. I think the proper server niche for both is to use FreeBSD for load balancers and HTTP caches, and to use OS X for dynamic content serving (like audio, images or video). FreeBSD doesn't have rich multimedia APIs, but it makes up for that with super-fast network performance and lots of heavy-lifting APIs for shuffling bytes around. OS X isn't as fast as FreeBSD (though it still runs at a sizeable fraction of FreeBSD's performance), but you can employ technologies like Altivec, Core Audio, Core Image, and Core Video.

The two OSes combined make a really nice platform target, since OS X already uses large parts of FreeBSD for its userland base and has ported a lot of FreeBSD kernel features too. I'm waiting for netgraph to make its appearance in OS X, but maybe that will never happen. Still, you have to look at your workload. If you're scaling images all day long, then even if your FreeBSD-tuned httpd is 2x as fast as the one built for OS X, the OS X one is probably going to spit out the scaled image in less overall wall clock time because it has access to much more DSP power in the form of Altivec and Core Image. So as long as you don't blindly stick to one or the other, you can have your cake and eat it too.

Now I need to go and read up on FreeBSD in-kernel httpds; there have been a couple of those posted to the lists over the years...

BTW, I picked up the latest edition of Stevens's Unix Network Programming book today. I loaned out my second edition to someone at Amazon, but silly me, forgot to write down who it was. I've been flying without a copy of UNP when doing my recent server work, which makes me feel a little naked. It's nice to have it back, and an upgraded edition to boot!

Posted by djb at April 15, 2005 03:21 PM
