March 22, 2005
Ruby GC and copy-on-write
I've spent most of my career writing services, so I'm used to taking advantage of the copy-on-write (COW) VM pages supported by modern Unix operating systems. If you're not familiar with COW, it's an optimization that lets the children forked from a parent process reuse as many of the parent's memory pages as possible. After a fork, the shared pages are marked read-only, and as long as a child doesn't modify memory inherited from the parent, it keeps using the parent's pages, resulting in huge memory savings when you're forking lots of children. When the child tries to write to a page inherited from the parent, a page fault occurs, and the OS gives the child its own writable copy of that page so the modification can go ahead.
I'm used to loading libraries in the parent process and forking kids off of it; this technique works well for most languages on Unix. I recently found out that Ruby doesn't take advantage of COW, though.
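Here's a minimal sketch of that preload-then-fork pattern, assuming a Unix platform where `fork` is available. `SHARED_TABLE` and `spawn_workers` are hypothetical names, and the table is just a stand-in for whatever libraries or data the parent would really load.

```ruby
# Build the expensive data once in the parent so every child inherits it.
# SHARED_TABLE is a hypothetical stand-in for preloaded libraries or data.
SHARED_TABLE = (1..10_000).map { |i| "entry-#{i}" }.freeze

# Fork `count` children that only read the inherited data, and collect
# one line of output from each through a pipe.
def spawn_workers(count)
  workers = Array.new(count) do |n|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      # The child reads the parent's data without rebuilding it.
      writer.puts "worker #{n} sees #{SHARED_TABLE.size} entries"
      writer.close
      exit!(0) # leave immediately, skipping the parent's at_exit handlers
    end
    writer.close
    [pid, reader]
  end

  workers.map do |pid, reader|
    line = reader.gets
    reader.close
    Process.wait(pid)
    line.chomp
  end
end

RESULTS = spawn_workers(3)
puts RESULTS
```

The children never rebuild the table. Whether they actually keep sharing the parent's pages is up to the runtime, which is where the rest of this post comes in.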
Ruby uses a mark-and-sweep garbage collector, which means that when a GC run occurs, Ruby walks all the code and data nodes used inside its address space (the chunk of memory being used by that Ruby process) and makes a note of all the nodes that are still in active use. Once all the nodes have been walked, any nodes not marked can be freed and their memory reused. It's elegant and simple.
Most of the mark-and-sweep implementations I've been able to find online keep a separate list for annotating which nodes are in use. Ruby's GC actually holds the mark flag inside the nodes themselves. This means that on the first GC run inside a child process, all of the live nodes will be written to, causing COW to happen for almost all the pages inherited from the parent (except for the Ruby core interpreter itself, which is written in C and is typically stored in shared memory to be used by all running Ruby processes on the machine).
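To make the problem concrete, here is a toy mark-and-sweep in Ruby with the flag stored inside each node. It is only an illustration of the technique, not Ruby's actual C implementation; `Node`, `mark`, and `sweep` are hypothetical names. The thing to watch is the count of writes: every reachable node gets written to, even if the program itself never changes it.

```ruby
# Toy heap node. The mark flag lives inside the object itself.
class Node
  attr_reader :children
  attr_accessor :marked

  def initialize(children = [])
    @children = children
    @marked = false
  end
end

# Mark phase: walk from the roots and flag every reachable node.
# Returns how many nodes were written to.
def mark(roots)
  writes = 0
  stack = roots.dup
  until stack.empty?
    node = stack.pop
    next if node.marked
    node.marked = true # a write into the node's own memory
    writes += 1
    stack.concat(node.children)
  end
  writes
end

# Sweep phase: split the heap into live and dead nodes, then clear the
# flags on the survivors (one more write per live node).
def sweep(heap)
  live, dead = heap.partition(&:marked)
  live.each { |node| node.marked = false }
  [live, dead]
end

leaf_a = Node.new
leaf_b = Node.new
root = Node.new([leaf_a, leaf_b])
garbage = Node.new
HEAP = [root, leaf_a, leaf_b, garbage]

WRITES = mark([root])
LIVE, DEAD = sweep(HEAP)
puts "nodes written during mark: #{WRITES}"
puts "live: #{LIVE.size}, dead: #{DEAD.size}"
```

In a forked child, each of those writes lands on a page inherited from the parent, so the page has to be copied.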
So if your parent is using 21MB of memory, each child will end up using about 20MB of its own (the core interpreter takes around 1MB of RAM), instead of the 1-2MB you'd expect given normal COW semantics.
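The arithmetic adds up quickly once you fork more than a couple of children. A quick back-of-the-envelope sketch, using the figures above and a hypothetical pool of 10 children (the pool size and the `total_mb` helper are my own assumptions for illustration):

```ruby
PARENT_MB      = 21 # parent footprint, figure from above
INTERPRETER_MB = 1  # shared core interpreter, figure from above
IDEAL_CHILD_MB = 2  # upper end of the 1-2MB estimate for working COW
CHILDREN       = 10 # hypothetical pool size

# Total footprint: the parent plus whatever each child ends up owning.
def total_mb(children, per_child_mb)
  PARENT_MB + children * per_child_mb
end

COW_DEFEATED = total_mb(CHILDREN, PARENT_MB - INTERPRETER_MB)
COW_WORKING  = total_mb(CHILDREN, IDEAL_CHILD_MB)

puts "COW defeated by GC: #{COW_DEFEATED}MB" # 21 + 10 * 20 = 221
puts "COW working:        #{COW_WORKING}MB"  # 21 + 10 * 2  = 41
```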
Marking the nodes themselves is certainly easier than keeping a separate list, so I assume that this was done in the interest of implementation speed. I notice that the YARV project (Yet Another Ruby VM) uses Ruby's existing GC code. At first I figured that changing the existing GC wouldn't help too much since YARV is coming down the road soon, but after learning that, my interest is piqued.
Two of the seattle.rb members (Ryan Davis and Eric Hodel) have been working hard on two projects, RubyToC and MetaRuby. Their goal is to get a translator engine that generates C code from a subset of Ruby (RubyToC), and to implement the Ruby core itself in Ruby instead of C (MetaRuby), allowing more people to work on extending Ruby. There's precedent for this kind of project: the Squeak Smalltalk group wrote their Smalltalk VM in Smalltalk, and while it was slow at first, it ended up being quite fast. MetaRuby by itself would probably be slower than the traditional core, but when you combine it with RubyToC, it could end up being faster. I don't know if their speed will match YARV, which is pretty promising, but it's good to have two different camps exploring the future of Ruby's core engine.
Since YARV is using the existing GC, I'm going to look into how difficult it would be to move the mark flags from the nodes to a separate list for the mark-and-sweep runs. That would reduce VM thrash and make Ruby more memory friendly.
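As a rough idea of what a separate list could look like, here is a toy version in Ruby where the marks live in a side table and the nodes are never written to. This is only a sketch of the general technique, not a proposal for the actual C code; `FrozenNode`, `mark_into_table`, and `sweep_with_table` are hypothetical names. The nodes are frozen to show that the collector doesn't need to modify them.

```ruby
# Toy node with no mark flag at all. Instances are frozen below, so any
# attempt by the collector to write into one would raise an error.
FrozenNode = Struct.new(:children)

# Mark phase: record reachable nodes in a side table keyed by object
# identity, leaving the nodes themselves untouched.
def mark_into_table(roots)
  marks = {}.compare_by_identity
  stack = roots.dup
  until stack.empty?
    node = stack.pop
    next if marks.key?(node)
    marks[node] = true # the write goes to the table, not the node
    stack.concat(node.children)
  end
  marks
end

# Sweep phase: anything missing from the table is garbage. There are no
# flags to clear; the table is simply thrown away afterwards.
def sweep_with_table(heap, marks)
  heap.partition { |node| marks.key?(node) }
end

leaf_a  = FrozenNode.new([].freeze).freeze
leaf_b  = FrozenNode.new([].freeze).freeze
root    = FrozenNode.new([leaf_a, leaf_b].freeze).freeze
garbage = FrozenNode.new([].freeze).freeze
TABLE_HEAP = [root, leaf_a, leaf_b, garbage]

MARKS = mark_into_table([root])
SURVIVORS, SWEPT = sweep_with_table(TABLE_HEAP, MARKS)
puts "marked: #{MARKS.size}, survivors: #{SURVIVORS.size}, swept: #{SWEPT.size}"
```

In a forked child, only the pages holding the table would be dirtied by a GC run, so the inherited nodes could stay shared with the parent.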
A note re: forking... Ruby does support green threads, and several frameworks use threads inside a single Ruby process, so in those cases this work wouldn't matter. There are still situations where you want to fork multiple children for the listeners, though (the biggest one being Ruby on Rails running under FCGI).
p.s. My last final is tomorrow morning, so I'll be posting more often again. Yay!
Posted by djb at March 22, 2005 05:25 PM