Some code I've written


Chunkfs

Chunkfs is a set of techniques for compartmentalizing file system metadata so that small pieces can be checked and repaired independently. A prototype is available as a concrete example of what this looks like.

e2fsck IO parallelization

I implemented parallel readahead of metadata in e2fsck using a pretty hysterical combination of fadvise(), read(), and blindfolded buffer cache manipulation. It gets about a 50% improvement in elapsed time on a RAID5 array.

Current e2fsck readahead patch
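For readers who have not used fadvise() this way, here is a minimal sketch of the prefetch hint on its own. It is not taken from the patch: the md_range struct and prefetch_ranges() are hypothetical names, and it assumes the metadata locations are already known as byte ranges on the device.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical descriptor for one metadata region (not from e2fsck). */
struct md_range {
    off_t offset;  /* byte offset on the device or image */
    off_t length;  /* length in bytes */
};

/*
 * Ask the kernel to start reading every range in the background.
 * Returns 0 on success, or the first error number that
 * posix_fadvise() reports.
 */
int prefetch_ranges(int fd, const struct md_range *r, size_t n)
{
    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int err = posix_fadvise(fd, r[i].offset, r[i].length,
                                POSIX_FADV_WILLNEED);
        if (err != 0)
            return err;
    }
    return 0;
}
```

Once the hints are issued, ordinary read() calls on the same ranges should find more of the data already cached, because the kernel is free to overlap the I/O across the disks in the array.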

Relative atime

My friend Akkana followed my advice to use the noatime mount option to cut down on writes to her file system, but then mutt couldn't tell which mailboxes had been read since they were last modified. So I wrote the relative atime patch, which updates the atime only if the old atime is older than the ctime or mtime. It turns out this is terribly useful for distributed/clustered file systems as well, since metadata writes there are generally very expensive and the alternatives are either no atime at all or some kind of annoying tunable atime quantum. It is currently in -mm, rewritten by Andrew Morton to be slightly less turgidly dense. There is now an accompanying patch for the mount utility (although you can still just use the test program if you don't want to download and recompile util-linux).
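The rule itself fits in a few lines. The following is a sketch of the decision only, not the kernel patch: the inode_times struct and relatime_needs_update() are hypothetical names, timestamps are plain seconds, and treating an atime equal to the mtime or ctime as stale is an assumption made here for whole-second timestamps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the three inode timestamps, in seconds. */
struct inode_times {
    long long atime;  /* last access */
    long long mtime;  /* last data modification */
    long long ctime;  /* last inode change */
};

/*
 * Return true when a read should update atime under the relative rule:
 * only when the stored atime is not newer than mtime or ctime.
 * Assumption: a tie counts as stale.
 */
bool relatime_needs_update(const struct inode_times *t)
{
    assert(t != NULL);
    return t->atime <= t->mtime || t->atime <= t->ctime;
}
```

With this rule a mailbox that was modified after its last read gets its atime bumped on the next read, so a "modified since last read" check like the one mutt needs keeps working, while repeated reads of an unchanged file cause no further atime writes.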

Relative atime kernel patch against 2.6.18-rc4

Relative atime userspace patch against util-linux 2.13-pre7

Test script to test relatime

Bare bones mount program to remount with either relatime or lazyatime (depending on kernel patch)

Older lazy atime patch, which updates the atime in memory but doesn't bother writing it to disk. The major problem with this is that atime can go backwards if an inode is evicted from memory.

Lazy atime patch against 2.6.18-rc3

malloc() tuning

The default malloc() threshold for allocating using mmap() versus brk() was designed for 32-bit systems with 2001-era memory capacities, and winds up doing too many allocations using mmap(). This is especially harmful when the application is doing an alloc/write/free cycle, as the memory has to be cleared unnecessarily between each free and the next alloc. Arjan van de Ven and I wrote a patch to glibc that dynamically adjusts the threshold based on the sizes of the blocks passed to free().
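As a rough illustration of what "dynamically adjusts" can mean, here is a toy model of such a policy. It is a sketch under simplifying assumptions, not the glibc code: the mmap_policy struct, uses_mmap(), and note_free() are hypothetical names, and block sizes are compared directly without allocator overhead or page rounding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the allocator's tunable state. */
struct mmap_policy {
    size_t threshold;      /* requests larger than this use mmap() */
    size_t max_threshold;  /* the threshold never grows past this */
};

/* Would a request of this size be served by mmap() in the model? */
bool uses_mmap(const struct mmap_policy *p, size_t size)
{
    assert(p != NULL);
    return size > p->threshold;
}

/*
 * Record that a block of `size` bytes was freed. If it had been
 * mmap()ed and is within the cap, raise the threshold to its size so
 * the next allocation of that size is served from the heap instead.
 */
void note_free(struct mmap_policy *p, size_t size, bool was_mmapped)
{
    assert(p != NULL);
    if (was_mmapped && size > p->threshold && size <= p->max_threshold)
        p->threshold = size;
}
```

In this model, an application that repeatedly allocates, writes, and frees a 1 MB buffer pays for mmap() once; after the first free the threshold has grown and later allocations of that size no longer go through mmap(). For a fixed rather than adaptive setting, glibc also accepts mallopt(M_MMAP_THRESHOLD, bytes).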

Update: Merged as of August 24, 2006 (with some additional tweaks).

Dynamic mmap threshold patch

ebizzy workload generator

I needed a workload that represented a class of applications fairly common in web application servers (think search and indexing services). Basically, a lot of threads are all doing the same kind of thing to a big (many GB) region of memory: pick some chunk of it, make a copy, look up some record in the copy using binary search, and free the copy again. Microbenchmarks don't cut it, because the performance issues only kick in when everything is combined. ebizzy tries to replicate this for system tuning purposes. Licensed under the GPL.
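One iteration of that pattern might look like the following single-threaded sketch. It is not ebizzy source: lookup_once() and cmp_uint() are hypothetical names, records are assumed to be sorted unsigned integers, and the real workload runs many such iterations concurrently over much larger chunks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for bsearch() over unsigned ints. */
static int cmp_uint(const void *a, const void *b)
{
    unsigned x = *(const unsigned *)a;
    unsigned y = *(const unsigned *)b;
    return (x > y) - (x < y);
}

/*
 * One iteration: copy a sorted chunk of records, binary-search the
 * copy for `key`, then free the copy.
 * Returns 1 if found, 0 if not found, -1 if malloc() failed.
 */
int lookup_once(const unsigned *chunk, size_t nrec, unsigned key)
{
    assert(chunk != NULL && nrec > 0);
    unsigned *copy = malloc(nrec * sizeof *copy);
    if (copy == NULL)
        return -1;
    memcpy(copy, chunk, nrec * sizeof *copy);
    int found = bsearch(&key, copy, nrec, sizeof *copy, cmp_uint) != NULL;
    free(copy);
    return found;
}
```

Each step is cheap on its own; the interesting behavior shows up when many threads run this loop at once and the allocator, memory copies, and cache all contend.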

The source is now hosted on Sourceforge.

OLD 0.1 release

Ext2 fsck reduction project

These patches are intended to reduce the average time spent on fscking ext2 file systems. For more info, see the paper (draft).

Latest patch for fswide dirty bit

Port of ext3 reservations to ext2. Martin Bligh and Andrew Morton cleaned this up and it's floating around in one of Andrew's trees somewhere; ask for it.