Friday, April 3, 2009

Extents are better

More and more, modern filesystems are moving away from block-based layouts and toward extent-based ones. What the heck does that mean? Well, on a very high level, it means data is stored on disk with less overhead, leading to better performance and more efficient use of disk space. Technically speaking, though, let's break it down...

Block-based storage

Block-based filesystem layouts are traditional, well tested, and old. Really old. The theory behind their operation is very simple: a small chunk of data is used to describe where a larger chunk of file data is stored. Information such as location, permissions, creation / modification / access times, and where on the disk the actual file data is stored is kept in the filesystem block, telling your computer where and how to access file data. The file's actual contents are stored elsewhere on disk, typically in chunks of 4KB. Each of these 4KB chunks of file data requires a filesystem block, so for a 100MB file you need 25,600 filesystem blocks! Each of these filesystem blocks needs to be read to tell the computer how to read from one end of the file to the other. The more your hard disk needs to search around for the location of filesystem and data blocks, the longer this whole process takes. Usually, this all happens very quickly, but there certainly are cases where it can take a very long time.
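If you want to check the 25,600 figure yourself, here is a minimal sketch of the arithmetic. The function name `blocks_needed` is made up for this example, and it assumes 1MB = 1,024KB and a fixed 4KB block size as in the text.

```python
BLOCK_SIZE = 4 * 1024  # 4KB data blocks, as in the example above


def blocks_needed(file_size_bytes, block_size=BLOCK_SIZE):
    """Number of fixed-size blocks needed to hold a file, rounding up."""
    # Integer ceiling division: a partly filled last block still costs a whole block.
    return (file_size_bytes + block_size - 1) // block_size


file_size = 100 * 1024 * 1024  # a 100MB file
print(blocks_needed(file_size))  # 25600
```

Note the rounding: even a 1-byte file takes up a whole block in this model.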

These filesystem and file data blocks also lead to a phenomenon known as file fragmentation. Simply put, fragmentation is caused by files being modified after they were initially created, or files being created on heavily fragmented disks. Fragmentation itself is simply the separation of file data blocks with regard to each other on disk. To best envision this, imagine going on a scavenger hunt across your town, collecting pages of a book before you could read it. On that scale, it could take you months to re-assemble something like Moby Dick! Don't worry, though. There's a better way!

 

Extent-based storage

A filesystem extent is much the same as a filesystem block, except that it describes a run of data bytes of any length instead of a strictly sized block. In other words, an extent describes a section of a file. All of the same filesystem data is contained in an extent: disk location, file name, etc. The biggest difference is that it also contains the size of the segment of the file's data it describes. So, theoretically, if there is a section of your disk drive that contains 100MB of contiguous free space, a single extent could be used to describe a 100MB file! This is so much more efficient than block-based storage that very few filesystems today don't use extents!
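To make the idea concrete, here is a rough sketch of how a list of disk block numbers collapses into extents. The `Extent` class and `blocks_to_extents` function are invented for illustration and don't mirror any real filesystem's on-disk format; an extent here is just a starting block plus a length.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    start: int   # first disk block of the run
    length: int  # number of contiguous blocks in the run


def blocks_to_extents(block_numbers):
    """Collapse a file's ordered list of disk block numbers into extents."""
    extents = []
    for block in block_numbers:
        last = extents[-1] if extents else None
        if last is not None and block == last.start + last.length:
            # This block sits right after the current run, so extend the run.
            last.length += 1
        else:
            # A gap on disk: start a new extent.
            extents.append(Extent(start=block, length=1))
    return extents


# A 100MB file laid out contiguously: 25,600 blocks, one extent.
contiguous = list(range(1000, 1000 + 25600))
print(len(blocks_to_extents(contiguous)))  # 1

# A small fragmented file: six blocks in three separate runs.
print(blocks_to_extents([10, 11, 12, 50, 51, 90]))
```

The fragmented example still needs three extents, so extents don't make fragmentation go away. They just stop charging you per block when the data is contiguous.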

 

Visualizing it

So, some of you may not be able to envision all of this in your head. I'll be the first to admit the fact that it's weird that I can. For you, I've made graphical representations. In the image below, imagine the green blocks are filesystem blocks- the data that describes your file's contents. The red blocks are the actual data of your file.

Block-based file layout

Here we can see that there are several chunks of data used to describe your file's content, which has also been split up into multiple chunks. Remember that the disk needs to read each green block to know where and how to find a red block.

 

Now we look at extent-based storage. Again, the green square represents the filesystem data that describes your file's content, which is found in the red squares.

Extent-based file layout

Instantly, you see the stark contrast. There is less data wasted describing your file's contents, which are laid out in a more contiguous manner. Since disks read contiguous data faster than data that is scattered around a disk, your benefit is twofold. You have to read fewer descriptor blocks, and you have to search around the disk fewer times for actual file data.
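For a rough feel of the "search around the disk" cost, here is a toy model that counts how many times the disk would have to jump while reading a file front to back. It assumes a jump happens whenever the next block isn't right after the previous one, which ignores caching, read-ahead, and everything else a real disk does. The function name `count_seeks` and the block numbers are made up.

```python
def count_seeks(block_numbers):
    """Count how often the next block is not right after the previous one."""
    seeks = 0
    for prev, cur in zip(block_numbers, block_numbers[1:]):
        if cur != prev + 1:
            seeks += 1
    return seeks


contiguous = [100, 101, 102, 103, 104, 105]
scattered = [100, 340, 101, 977, 12, 341]

print(count_seeks(contiguous))  # 0
print(count_seeks(scattered))   # 5
```

Same six blocks of data either way, but the scattered layout makes the disk jump between every single read.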

 

Wrapping it up

So, what does it all mean to you? Well, it means you can store larger and larger files on your disk drives with less and less overhead. It also means that data can be retrieved in a much quicker manner, and finally it means someone out there cares about how you spend the milliseconds in your life. After all, it's nice to know someone cares, right?
