May 13, 2005
Disks in the ether
I was browsing the new Linux Journal and saw the article on ATA-over-Ethernet, where raw disk blocks are transmitted over ethernet and used to build cheap SANs, instead of the traditional fibre channel route.
FC gives you raw access to drives, so this is taking the same approach but with ethernet. I don't quite know what I think about it yet. But it sounds promising.
Upon further research, I also found iSCSI, which implements SCSI over ethernet. Originally, iSCSI just ran on gigabit networks, but with 10GbE becoming common in datacenters, it's becoming very competitive with FC, which normally goes 2Gbps and can do up to 4Gbps. And there are new iSCSI enhancements that let you do error correction and route fixing, similar to how Xsan detects breaks in the FC fabric and can pass blocks through one of N controllers.
I need to read more about iSCSI, but it sounds very interesting. I had never heard of ethernet-based solutions for reading/writing raw disk blocks before now, but they have several upsides over traditional FC SANs, the main ones being higher speed and unlimited range. Well, unlimited range in the case where you're streaming versus doing random access. If you're reading a 50MB range of contiguous blocks from a remote disk, then the N ms latency isn't a big deal since you're going to take an initial pause of N ms and then get all your data in one nice stream. But if you're doing random seeks on a disk that's 50ms away, then you're gonna cry, because every single seek pays that round trip.
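To put rough numbers on the streaming-versus-seeking point, here's a back-of-the-envelope sketch in Python. The function name, the 1Gbps link, and the 4KB block size are my own assumptions for illustration, and the model is deliberately naive: it assumes only one request is in flight at a time, so each request waits a full round trip before its data streams at line rate. Real initiators queue multiple requests, so treat this as a worst case.

```python
def transfer_time_s(total_bytes, request_bytes, rtt_s, bandwidth_bps):
    """Estimate seconds to read total_bytes from a remote disk.

    Naive model (an assumption, not how any particular iSCSI stack works):
    requests are issued one at a time, each pays one full round trip,
    and the data itself moves at the full link bandwidth.
    """
    requests = -(-total_bytes // request_bytes)  # ceiling division
    latency_cost = requests * rtt_s              # one round trip per request
    wire_cost = total_bytes * 8 / bandwidth_bps  # bytes -> bits on the wire
    return latency_cost + wire_cost


FIFTY_MB = 50_000_000
RTT = 0.050      # the 50ms-away disk
LINK = 1e9       # assumed 1Gbps link

# One big contiguous read: a single round trip, then a stream.
streaming = transfer_time_s(FIFTY_MB, FIFTY_MB, RTT, LINK)

# The same 50MB fetched as random 4KB blocks, one at a time.
random_access = transfer_time_s(FIFTY_MB, 4096, RTT, LINK)

print(f"streaming:     {streaming:.2f} s")
print(f"random access: {random_access:.2f} s")
```

Under those assumptions the contiguous read finishes in well under a second, while the random 4KB reads take around ten minutes, almost all of it spent waiting on round trips rather than moving data.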
I'm thinking the WAN disk solution probably isn't used as much as the LAN/SAN one, so latency probably isn't a big deal for most installs. But since you're making the disks available as a raw device, I wonder how disk fragmentation comes into play and whether or not that starts to limit your throughput. I'm guessing FC installs have similar issues, but since their range is so limited, other sources of error/slowness show up first.
Using ethernet instead of FC makes sense from the perspective that so much more brainpower is going into making ethernet-based switches, routers and computers use the network as efficiently as possible. There might be 1000 engineers worldwide working on FC enhancements, but you could imagine there are 100,000 working on ethernet hardware and protocols. FC does do things that TCP/IP can't, so it's not like FC is going to die in the storage market. I just think that for many sites, its benefits won't justify its cost, and iSCSI or an alternative implementation will be used instead.
And that doesn't mean that userland SAN (like Google FS) is going away anytime soon either. It's certainly the cheapest option out there for people who want to build storage clusters, and putting abstractions in front of the raw disk blocks and serving up data via HTTP or similar protocols gives you capabilities you can't get with iSCSI.
For example, while some companies use SAN systems for storing databases or high update velocity filesets, a lot of folks use SANs to store filesets that are fairly static. Think Netflix storing raw VOB images of all the DVDs they rent, or Amazon's archive of product images. Once you store a copy of something, you rarely update it, if at all.
And if your objects aren't being updated very often, then that screams out for caching. Using HTTP or similar protocols to transport data lets you plug in caching and load balancing pretty easily, but raw blocks over the network with solutions like iSCSI aren't exactly easy to cache. I'd imagine they're pretty hard to load balance as well.
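Here's a minimal sketch of why object-level access is so easy to cache, assuming objects are addressed by a key (a URL path, say) and never change once stored. The `ReadThroughCache` class and the `fetch` callable are hypothetical names of mine, standing in for whatever actually pulls an object from the storage cluster. It's a plain LRU in Python, not anybody's production design.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Tiny LRU read-through cache for write-once objects (illustrative only)."""

    def __init__(self, fetch, capacity):
        self._fetch = fetch          # hypothetical backend call, e.g. an HTTP GET
        self._capacity = capacity    # max number of objects to keep
        self._items = OrderedDict()  # key -> object, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self._fetch(key)             # go to the storage backend
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used
        return value


# Usage with a fake backend that records how often it gets hit.
backend_calls = []

def fake_fetch(key):
    backend_calls.append(key)
    return f"bytes-of-{key}"

cache = ReadThroughCache(fake_fetch, capacity=2)
for key in ["/img/a.jpg", "/img/a.jpg", "/img/b.jpg", "/img/a.jpg"]:
    cache.get(key)

print(cache.hits, cache.misses, len(backend_calls))
```

The whole thing works because the key names a complete object that doesn't change. With raw blocks, the cache only sees block numbers and has no idea which ones belong together or when a write elsewhere has made them stale.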
But, I admit that my brain is fairly addled with the userland SAN point-of-view, so I'm going to do some research on blocks-over-the-network technologies and think about them some more. The major storage players are deploying/selling iSCSI systems now, and they're not dummies, so I want to read up more on the technologies.
As an aside, it seems that the more interesting technologies I run into, the more my home datacenter (and its corresponding budget) expands. Some people buy boats, some buy cabins at ski resorts, but I spend my extra money on servers, disk, and switches. As it is now, I can't run my hairdryer in the bathroom, since my office is on the same circuit, and the hairdryer combined with my machines trips the circuit breaker. If I end up staying in my place, I'm going to get an electrician to come over and install two isolated circuits in my office so I can run all the servers and blinky-light boxes my heart desires.
Posted by djb at May 13, 2005 01:21 PM