March 29, 2005
RubyCocoa bundling
There was a new release of RubyCocoa a few days ago, and it now builds fine on newer builds of OS X. They also turned on the relative Frameworks path inside RubyCocoa.framework that lets developers bundle RubyCocoa into their apps. If you don't mind depending on the system ruby 1.6.7, then you can use the dmg that the RubyCocoa developers provide, and ship RubyCocoa apps with an embedded RubyCocoa framework (which is only about 1MB total).
In the long run, developers are going to want to control what version of ruby they depend on, and make sure that all the third-party libraries they require are bundled with the app. I've mentioned before that I'm working on some changes to RubyCocoa to allow a full version of ruby to be bundled inside it... Well, I got in touch with W. Kimura, who made most of the changes for the recent release, and he's interested in incorporating my changes.
The work is a little difficult, since you want to embed an entire ruby distribution inside RubyCocoa, and make sure that the system ruby doesn't interfere in any way. It also needs to be fairly seamless: developers need to be able to build and integrate third-party ruby modules into their app (even ones that use C extensions) and make sure that the ABI and path angles are covered.
So, the list of features I'm aiming for is:
- Add support for a Resources/ruby-core path inside RubyCocoa.framework that holds the binary bits of a given ruby install.
- Build script to import an existing ruby install and bundle it into Resources/ruby-core, making sure that rbconfig.rb/Config and other path configs are adjusted properly.
- Take ruby tarball and bundle it into Resources/ruby-core to get a fresh build.
- Make sure RubyCocoa's RBRuntime class knows where to find the embedded framework's standard classes, and any third-party modules that have been added. (This isn't too hard since RBRuntime is bundled into RubyCocoa's lib, and you can look at the bundle's path at runtime to figure out where the libs are hiding.)
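Here's a rough sketch of the path lookup described in the last item, written in Python purely to show the logic (RubyCocoa itself is ObjC and Ruby). Only the Resources/ruby-core directory comes from the list above; the lib/ruby/<version> layout, the version string, the arch name, and the function name are assumptions modeled on a conventional ruby install.

```python
from pathlib import PurePosixPath


def embedded_load_paths(framework_dir, ruby_version="1.8", arch="powerpc-darwin"):
    """Build the library search path for a ruby embedded in the framework.

    framework_dir is the framework bundle's location as discovered at
    runtime. Everything below Resources/ruby-core is an assumed layout.
    """
    core = PurePosixPath(framework_dir) / "Resources" / "ruby-core"
    lib = core / "lib" / "ruby"
    return [
        # third-party modules first, so apps can add to the standard library
        str(lib / "site_ruby" / ruby_version),
        str(lib / "site_ruby" / ruby_version / arch),
        # then the standard library and its compiled extensions
        str(lib / ruby_version),
        str(lib / ruby_version / arch),
    ]


# Hypothetical app bundle location, for illustration only.
paths = embedded_load_paths(
    "/Applications/MyApp.app/Contents/Frameworks/RubyCocoa.framework"
)
for p in paths:
    print(p)
```

Because every entry is derived from the bundle's own location, nothing in the list points at the system ruby, which is the whole point of the exercise.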
I've got a basic hacked-up version of this working, but some things are hardcoded and I don't support the importing of an existing ruby install yet. Hopefully I can get that finished this week, since classes start up again on the 4th.
Posted by djb at 11:34 AM | Comments (0)
March 27, 2005
Rails vs Java
There was a pretty long thread on theserverside.com about Rails and Java.
Reading the comments in the thread, it's obvious that some of the Rails and Java advocates have gotten pretty wrapped up in the One True Framework (TM). I think the Rails folks could be a little less incendiary in their claims, and that the Java folks could be a little more open to suggestions and be willing to learn from people who don't necessarily agree with the Java approach to web frameworks.
I'm really enjoying Rod Johnson's lightweight J2EE book, there's lots of great advice in it. After reading the above thread, this quote jumped out at me:
"All good developers are intellectually curious and excited about the technologies they use. Exceptional developers know to subordinate these drives to the real needs of the projects they work on."
I think advocacy is a good thing, but no matter what project you're looking at, there are always a few young turks who take things a little too far, poking out one too many eyes along the way. I've seen enough idioms come and go that I get a little detached when looking at new technologies. It's just like reading the news: you have to look at it with a critical eye and filter out bias to figure out what the real costs and benefits are.
Developers are a funny breed; we can get all worked up about tools and methodologies and have huge flame wars over mechanisms, when in the end, it's good design, documentation, and project management that determine whether or not a project is going to be successful. I've said embarrassing things in the past when advocating or defending the framework or methodology of the day, and it's kind of fun to go over my sent mail folder and chuckle at the crazy things I used to say.
Now, don't get me wrong, I still say crazy things, but I tend to qualify my ramblings now. Or stick them on my blog. :-)
Posted by djb at 12:10 PM | Comments (0)
March 23, 2005
REST update
I had lunch today with my friend mentioned in my previous REST/SOAP post. He's been developing REST services using Tomcat, and is pretty pleased so far. He's got five service resources implemented so far, with only a few days of effort expended. We talked a bit about the scaffolding he's been building and we went back and forth discussing implementations and general J2EE stuff. One of the topics we spent a lot of time discussing was general J2EE development speed when combined with a good IDE like Eclipse. We used to work together on C++ services, and the develop/build/link/debug cycle took a bunch of time.
Just building a service took 5-10 minutes, and there were often runtime linking issues we'd have to debug and figure out. It didn't help that the C++ service frameworks we were consuming were changing pretty often, sometimes in non-backwards compatible ways. We worked together on a project that was using uncommon features of the frameworks, and banged our collective heads against the wall for quite a while, since there were all sorts of quantum bugs that would come and go as library versions changed, or even things as simple as the user environment. It was common to develop and debug only 50 lines of code a day, which drove us up the friggin' wall.
With J2EE and Eclipse, he's been writing as much code in an afternoon as we used to write in a week. A big part of that is using Java instead of C++, since there's no real link phase (and much shorter build times) and memory management is easier (as long as you are smart with variable scope and problem objects like strings).
There are two other productivity boosts, namely Eclipse and the rich J2EE apis. Having pop-up menus that show a list of potential methods for an object (along with their javadoc) makes things move along so much more quickly. We used to use vi/emacs for our editing, and while they can use ctags, Eclipse's code and method navigation is much higher level and more mature. It's not limited to Eclipse, any modern IDE would give good results, but Eclipse is the one that feels most natural to us.
The J2EE apis are pretty complex, and often maligned for that complexity, but I don't think their learning curve is as steep as some people make it out to be. It would suck trying to do J2EE with a 70s-style editor like vi or emacs, but modern IDEs like Eclipse make it pretty straightforward. Eclipse even has wizards that help you generate boilerplate for different service patterns and best practices (as do the other Java IDEs), so if you get really lost, you can get some help from the IDE. You also get all sorts of patterns in the J2EE apis themselves, so things like caching, service directories, snmp, etc are available for use or can be adapted to your needs readily. There's no real equivalent on the C++ side of things.
I've been reading Rod Johnson's recent book, J2EE Development without EJB, and I like it a lot. Most of my J2EE work has been with servlets and lightweight services, and I wanted to broaden my J2EE toolkit while saving the EJB work for a later date, so the book has been fun.
On a lunch note, we went to Jack's Fish Spot in Pike Place Market and got the fish and chips special; he got the cod, I got the halibut. Jack's is a great place for lunch, since it is part of one of the big fish stands in the market, and the fish is as fresh as you can get. They fry the fish perfectly, it was the best fish and chips I'd had in quite a while.
Posted by djb at 07:14 PM | Comments (0)
March 22, 2005
Ruby GC and copy-on-write
I've spent most of my career writing services, so I'm used to taking advantage of the copy-on-write VM pages supported by modern Unix operating systems. If you're not familiar with COW, it's an optimization that lets the children forked from a parent process reuse as many pages of memory from the parent as possible. With COW, the pages shared with the child process are marked read-only, and as long as the child doesn't modify memory inherited from the parent, it can use the parent's pages, resulting in huge memory savings when you're forking lots of children. When the child tries to change a page inherited from the parent, a fault occurs, and the OS makes a private copy of that page for the child and makes it writable so the modification can occur.
I'm used to loading libraries in the parent process and forking kids off of it, this technique works well for most languages on Unix. I recently found out that Ruby doesn't take advantage of COW, though.
Ruby uses a mark-and-sweep garbage collector, which means that when a GC run occurs, Ruby walks all the code and data nodes used inside its address space (the chunk of memory being used by that Ruby process) and makes a note of all the nodes that are still in active use. Once all the nodes have been walked, any nodes not marked can be freed and given back to the OS. It's elegant and simple.
Most of the mark-and-sweep implementations I've been able to find online keep a separate list for annotating which nodes are in use. Ruby's GC actually holds the mark/sweep flag inside the nodes themselves. This means that on the first GC run inside a child process, all of the nodes will be touched, causing COW to happen for almost all the pages inherited from the parent (except for the Ruby core interpreter itself, which is written in C and is typically stored in shared memory to be used by all running Ruby processes on the machine).
This means that if your parent is using 21MB of memory, each child will use 20MB of memory (the core interpreter takes around 1MB of ram), instead of the 1-2MB you'd expect given normal COW semantics.
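To make the difference concrete, here's a small simulation that counts how many VM pages a single mark phase writes to when the mark bit lives inside each node versus in a separate bitmap. This is only a model, not Ruby's actual GC: the page size, node size, heap size, and the "every third node is live" pattern are all made-up numbers for illustration.

```python
PAGE_SIZE = 4096   # bytes per VM page (assumed)
NODE_SIZE = 20     # bytes per heap node (assumed, illustrative only)


def pages_dirtied_inline(live_nodes):
    """Mark bit stored in the node: marking writes to the node's own page."""
    return len({(n * NODE_SIZE) // PAGE_SIZE for n in live_nodes})


def pages_dirtied_bitmap(live_nodes):
    """Mark bits stored in a side bitmap (1 bit per node): only the
    bitmap's pages are written, the nodes themselves stay untouched."""
    return len({(n // 8) // PAGE_SIZE for n in live_nodes})


heap_nodes = 1_000_000
live = range(0, heap_nodes, 3)   # pretend every third node is reachable

inline_pages = pages_dirtied_inline(live)
bitmap_pages = pages_dirtied_bitmap(live)

print("heap pages:           ", -(-heap_nodes * NODE_SIZE // PAGE_SIZE))
print("dirtied, inline marks:", inline_pages)
print("dirtied, bitmap marks:", bitmap_pages)
```

With live nodes scattered across the heap, inline marking writes to essentially every heap page, so the child ends up with its own copy of all of them. The bitmap version only dirties the handful of pages that hold the bitmap.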
Marking the nodes themselves is certainly easier than keeping a separate list, so I assume that this was done in the interest of implementation speed. I notice that the YARV project (Yet Another Ruby VM) uses Ruby's existing GC code. At first I figured that changing the existing GC wouldn't help too much since YARV is coming down the road soon, but after learning that, my interest is piqued.
Two of the seattle.rb members (Ryan Davis and Eric Hodel) have been working hard on two projects, RubyToC and MetaRuby. Their goal is to get a translator engine that generates C code from a subset of Ruby (RubyToC), and to implement the Ruby core itself in Ruby instead of C (MetaRuby), allowing more people to work on extending Ruby. There's precedent for this kind of project, the Squeak smalltalk group wrote their smalltalk VM in smalltalk, and while it was slow at first, it ended up being quite fast. Just MetaRuby by itself would probably be slower than the traditional core, but when you combine RubyToC, it could end up being faster. I don't know if their speed will match YARV, which is pretty promising, but it's good to have two different camps exploring the future of Ruby's core engine.
Since YARV is using the existing GC, I'm going to look into how difficult it would be to move the mark flags out of the nodes and into a separate list for the mark-and-sweep runs. That would reduce VM thrash and make Ruby more memory friendly.
A note re: forking... Ruby does support green threads, and several frameworks use threads inside a single Ruby process, so in those cases, this work wouldn't matter, but there are still situations where you want to fork multiple children for the listeners (the biggest one being Ruby-on-Rails running under FCGI).
p.s. My last final is tomorrow morning, so I'll be posting more often again. Yay!
Posted by djb at 05:25 PM | Comments (0)
March 14, 2005
Finals
I've got finals this week and next, so I'm going to be pretty busy studying and won't be doing a lot of software work.
On a math-related note, I'm thinking of buying a copy of Mathematica to help me visualize concepts from the math courses I'm taking. I like the instant gratification of being able to change a few variables and watch the graph of an equation change, it helps me connect behavior with theory and match equations to their effects more accurately. I took a trig refresher course this quarter, and have had to memorize dozens of equations, many of which look quite similar. Having some good graphing and equation solving software will help me a lot with my math comprehension.
Posted by djb at 09:41 AM | Comments (1)
March 10, 2005
Apple Performance Seminar
I participated in an online Apple seminar this morning on the topic of maximizing the performance of OS X applications. I took lots of notes, it was very informative. I had a good handle on the concepts of performance measurement and analysis, but it was great to watch one of Apple's performance architects discuss Apple's performance tools and what situations they are useful for. There was also a lot of good info on optimizing performance for the G5. Two points that really stuck with me were reducing automatic type conversions, and walking data structures in contiguous fashion (i.e. if you're walking a 2d array, make sure to access the elements in contiguous memory order so you get better locality). Because the G5's pipeline is so speculative, you can take heavier speed hits for silly code versus the same code running on a G4.
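The contiguous-access point is easy to see by writing down the memory offsets each loop order visits. The sketch below assumes a 2d array stored row by row (C layout) and uses hypothetical helper names; it only illustrates the access pattern, it doesn't measure anything.

```python
def flat_offsets(rows, cols, row_major_walk=True):
    """Element offsets visited when walking a rows x cols array that is
    stored row by row (C layout)."""
    if row_major_walk:
        # inner loop over columns: neighbours in memory are visited in order
        return [r * cols + c for r in range(rows) for c in range(cols)]
    # inner loop over rows: each step jumps a whole row ahead
    return [r * cols + c for c in range(cols) for r in range(rows)]


def strides(offsets):
    """Distance between consecutive accesses, in elements."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


good = flat_offsets(3, 4, row_major_walk=True)
bad = flat_offsets(3, 4, row_major_walk=False)

print("row-major walk:   ", good, "strides", strides(good))
print("column-major walk:", bad, "strides", strides(bad))
```

The first walk moves one element at a time, so each cache line gets used fully before the next one is fetched. The second jumps a full row on every access, which on a large array means touching a new cache line almost every time.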
Another interesting tidbit was that Saturn is a traditional gprof-like profiler. I had looked at it before and was trying to figure out the difference between it and Shark, and there you go. Shark takes samples of the running app, so it's not as intrusive.
There was also an overview of Accelerate.framework's features. I've got some experience with the vDSP piece of Accelerate, since I'm using it to do FFT work with audio, but it was interesting to hear about the other pieces. I wonder how useful vImage will be with Tiger now that we've got CoreImage.
The session ended with about 20 minutes of Q&A, so it was interesting to hear what sort of questions my fellow developers had. I asked a question about measuring Altivec utilization, but they ran out of time and didn't answer it. This has been something I've been thinking about for a while, I'd love to get MenuMeters to break out cpu usage statistics and show me utilization for the "normal" cpu and show usage statistics for the Altivec separately. I'm always wondering how much headroom my Altivec units have, but there doesn't seem to be an easy way to find that out. I learned about Amber, an instruction tracer that OS X provides, but that seems to only give me post facto statistics.
Overall, it was a good presentation, and only took 90 minutes from start to finish.
Posted by djb at 02:26 PM | Comments (0)
March 08, 2005
Projects update
So, I did some thinking last week and decided on projects I'm going to be working on over the next quarter.
I'm going to be spending time working on:
- Adding features to RubyCocoa in order to make prototyping easier, plus generation of ObjC skeleton code from RC prototype projects.
- Determine best HTTP server model for integrating ImageUnits for fast image transforms (choosing between single proc with libevent multiplexing between active sockets, a libevent multiplexer that talks over FCGI-like protocol to rendering engines in a pool of separate processes, and a traditional apache install that hooks in ImageUnits as an apache module).
- Framework to enable quick generation of political issue ads (thank you CoreVideo!).
I picked three projects, one involving open source, one involving commercial development, and one involving political/nonprofit work. I've been keeping in touch lately with a lot of ex-Amazon folks who have been working in the political/nonprofit sector, and there's a lot of good stuff going on right now. Hopefully, I'll have more news on the political front in coming weeks, if some of these projects pan out.
Posted by djb at 09:56 AM | Comments (0)
Apple article on new CoreImage tools
Apple has posted an article on their developer site discussing the new CoreImage features of Tiger, including a preview of the two CI tools I've been playing with a lot but couldn't discuss before.
First of all, read the article.
Quartz Composer is the one I've been using the most; you can see a screenshot of it at the bottom of the article. You hook up component pipelines with a GUI and then save them as compositions, which can be loaded by applications or turned into screen savers. This makes the creation of motion graphics pretty simple, and it's a good way to prototype advanced image/video features in an app without having to write all the code up front. I'm using it to create video transitions to be used for a video markup app. I had wanted to do this with Panther, but there were too many manual things to keep track of, and it wasn't possible to get something working quickly before Tiger came along. There's going to be a huge bloom of video/graphics apps once Tiger is available to the entire Mac community.
Like the article says, CoreVideo is a bridge between Quicktime and Quartz. There's a nice CoreVideo pipeline where you can apply sub-pipelines of ImageUnits to your video, and then the frames are rendered onto the screen using OpenGL. It's much faster than the QT6 method. Drawing text on top of a video frame in realtime with QT6 made my powerbook yelp in pain, it didn't have a lot of cpu time left over after that. With CoreVideo, I've got plenty of juice left. Apple is still being vague about CoreVideo and QT7 AFAICT, so I won't say much more about them, but there's some other really great features that make adding video to Cocoa apps much easier than before.
Like I said last week, as new OS releases come out, Apple keeps peeling back layers of the OS and giving developers high-level APIs that give us access to lower-level features. They've done an excellent job with the design of their classes, you can tell how much thought went into it. Whenever I speculate about what sort of features OS X.5 and X.6 will have, I get a happiness stroke. It is very difficult to not sound like a fanboy when working with their software, it is the best of all worlds for me. My background is enterprise services and storage systems, so Unix is a dear old friend of mine. I love having a Unix-based OS with a commercial force behind it, there's no way Linux can catch up. The Linux fanboys trash Apple for having expensive hardware, but I gladly pay the premium prices in order to get a true multimedia Unix OS plus functional and beautiful hardware. I understand the principle behind their stance, but will let them use their antiquated APIs and commodity hardware while I develop video/audio apps and live out here on the bleeding edge of technology. :-)
Posted by djb at 09:26 AM | Comments (0)
March 05, 2005
REST vs SOAP
I've been having an interesting discussion with a friend of mine who is trying to figure out what service protocol to use for his company's service layer. We've been talking about all sorts of things, including JGroups, Spread, CORBA, REST, ARREST, SOAP, and a few commercial middleware layers. Messaging middleware is a fun area, since there are so many pros and cons to consider.
Amazon uses a lot of multicast messaging (search google for '+amazon +tibco +multicast'), but multicast is hard to get right. You have to closely monitor the network and multicast storms are a real possibility. I view multicast messaging the same way I view writing in assembler; it's high-bandwidth, but fragile and suitable mostly for limited problem domains. I think multicast makes good sense for low velocity OOB messages (like machine configuration, metrics broadcasting, host discovery, etc), but using it as a messaging backbone seems like overkill for most folks. I've heard anecdotally that one of the largest multicast installs in the world runs the network that passes trade orders on wall street between the trade floor and the brokerage firms. That would be an interesting domain to work in, to say the least.
My friend has basically narrowed down his choices to SOAP or REST. First of all, I think that using HTTP as a middleware transport is a smart choice. You get drop-in load balancing and caching of service calls with several choices of implementation (free and commercial). Scaling becomes a pretty simple exercise, and you can use tools for HTTP introspection to debug your transport layer if you run into problems. Prototyping is easy too, most languages have SOAP/REST bindings.
SOAP is nice because WSDL lets you advertise supported service calls, but there's a definite overhead when using SOAP envelopes (both on the send and receive side), and while it uses HTTP as its transport, it is harder to debug than REST. With REST, you just paste the url into your web browser and look at the results. Because the REST calls are simple urls with args encoded in the uri string, you can build all sorts of abstractions on top of your service. The most popular calls can be exposed on your developer intranet as a RSS feed and marked as candidates for optimization. You can add a debugging field to the calls (append /debug to them, perhaps), and make your service print out detailed query tracing for SQL calls. Append /pretty to indent the returned xml and display it as beautified html. It's much easier to cache REST than it is to cache SOAP, many cache implementations out there don't even cache POST calls.
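Here's a minimal sketch of how a service could peel the /debug and /pretty suffixes off a REST url before dispatching the call. The hostname, the /orders resource, and the function name are invented for illustration; only the two suffixes come from the paragraph above.

```python
from urllib.parse import urlsplit, parse_qs

MODIFIERS = {"debug", "pretty"}   # the two suffixes suggested above


def parse_service_url(url):
    """Split a REST call into resource path, query args, and modifier flags.

    Trailing path segments that name a modifier are removed from the
    resource path and returned separately as flags.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    flags = set()
    while segments and segments[-1] in MODIFIERS:
        flags.add(segments.pop())
    return {
        "resource": "/" + "/".join(segments),
        "args": parse_qs(parts.query),
        "flags": flags,
    }


# Hypothetical call: order 1234, pretty-printed, with query tracing turned on.
req = parse_service_url(
    "http://services.example.com/orders/1234/pretty/debug?fields=total"
)
print(req["resource"], req["args"], sorted(req["flags"]))
```

The handler behind /orders/1234 never has to know about the modifiers; the wrapper can check the flags and switch on SQL tracing or reformat the xml on the way out.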
Another optimization that REST enables is easier routing of service requests. Typically, developers deploy web services and place the boxes behind a load balancer or a cache, which means that the requests get spread evenly across the servers and everyone's cache is fairly lukewarm. Modern load balancers and caches let you horizontally partition requests and route them to different servers based on the contents of the uri, which lets you partition your servers using your primary key namespace from your request. Each server host works on a slice of the total working set and its caches (buffer, memory, and any software-based caching) all get scorching hit rates. Routing based on the contents of a SOAP envelope from a POST to the server requires more overhead.
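As a rough illustration of uri-based partitioning, the sketch below picks a backend by hashing the primary key found in the request path. The server names and the /<resource>/<key> url layout are assumptions, and a real load balancer would do this in its own configuration language rather than in Python.

```python
import zlib

SERVERS = ["svc-a", "svc-b", "svc-c", "svc-d"]   # hypothetical backend pool


def route(path, servers=SERVERS):
    """Pick a backend from the primary key embedded in the request path.

    Assumes paths shaped like /<resource>/<key>/...; the same key always
    lands on the same server, so that server's caches stay hot for it.
    """
    segments = path.strip("/").split("/")
    key = segments[1] if len(segments) > 1 else segments[0]
    # crc32 is stable across processes and runs, unlike Python's hash()
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


for p in ["/orders/1234", "/orders/1234/pretty", "/orders/9876"]:
    print(p, "->", route(p))
```

Simple modulo hashing reshuffles most keys whenever the pool size changes, so a production setup would more likely use fixed key ranges or consistent hashing. The point here is just that the routing decision needs nothing but the url.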
The big downside of REST is that advertisement of service APIs is not built into the system. This is a really nice feature of SOAP/WSDL, but if you're implementing an internal service layer, then the benefits of WSDL (broadcast of APIs and programmatic versioning) aren't a hard requirement. You can afford to place a thin wrapper abstraction on top of your service layer that serves as a contract between internal service clients and the services themselves. And while the arguments are encoded in the url, there's nothing saying you can't use a DTD or schema for the results coming back and verify the document is well-formed.
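The thin client-side wrapper could be as small as the sketch below, which checks that a response is well-formed xml and has the root element the client expects. The "order" document shape is invented for illustration, and this only checks well-formedness; validating against a DTD or schema would need a separate validating parser.

```python
import xml.etree.ElementTree as ET


def parse_response(body, expected_root="order"):
    """Parse a service response, failing loudly if it is not well-formed
    xml or if the root element is not the one the client expects."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError as err:
        raise ValueError(f"response is not well-formed XML: {err}") from err
    if root.tag != expected_root:
        raise ValueError(f"expected <{expected_root}>, got <{root.tag}>")
    return root


# Hypothetical response body for an order lookup.
doc = parse_response("<order><id>1234</id><total>9.99</total></order>")
print(doc.find("total").text)
```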
I'm a big believer in using HTTP as a generic transport. It works for most types of request/response paradigms, and if you want to get fancy, the ARREST framework gives you true async messaging with nice performance. There is a per-request overhead you get with HTTP versus rolling your own server layer and doing some sort of binary encoding (XDR, etc), but I think that extra per-request cost is more than made up for by the easy debugging, scaling and development of HTTP-based services. To me, in most cases, the choice is really REST for internal services, and SOAP/WSDL for customer-facing ones. REST is quicker to develop with, although you do need a little bit of OOB structure in order to manage the supported calls and versioning of your service layer. I think in most cases though, the benefits of REST over SOAP/WSDL outweigh the drawbacks.
One last reason I dig HTTP for service layers is that you get trivial wrapping with ssl if you want to make the transport layer opaque. And ssl accelerators negate the need to waste server host cycles on crypto routines.
Posted by djb at 09:50 AM | Comments (1)