
Fri, Jun. 3rd, 2011, 12:31 pm
RAnsrID: Galois Field / Reed-Solomon code rewrite pending

While starting to implement error resilience (as opposed to erasure resilience, which already works nicely) in RAnsrID, my RAID-lookalike multi-disk block device, I finally had to admit that I hadn't understood one aspect of Galois field mathematics: the choice of Reed-Solomon representation heavily influences how errors (read: corrupted data) can be dealt with.

So far, I only understood the canonical matrix-style representation (basically a linear combination over the data disks for each redundancy disk). It turns out that the polynomial representation allows much better (read: faster) algorithms for error correction. According to my analysis, both erasure and error recovery are O(N²) there, compared to O(N³) for erasure correction in the linear combination case and an unknown cost (presumably O(N³⋅M²)) in the error case (N: total number of disks, M: number of redundancy disks).
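To make the linear combination view concrete, here is a minimal sketch in Python. It is not RAnsrID's code: the field polynomial 0x11d, the Vandermonde-style coefficients alpha^(row*col), and all function names are assumptions chosen for illustration. Each redundancy value is a GF(2^8) linear combination of the data values, and a single erased data value is recovered by solving one such equation.

```python
# GF(2^8) tables, assuming the primitive polynomial 0x11d (illustrative choice).
PRIM = 0x11D
EXP = [0] * 510   # EXP[i] = alpha^i, doubled in length to avoid a modulo in gf_mul
LOG = [0] * 256   # LOG[alpha^i] = i; LOG[0] is unused
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= PRIM
for i in range(255, 510):
    EXP[i] = EXP[i - 255]

def gf_mul(a, b):
    """Multiply two field elements via log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def gf_div(a, b):
    """Divide a by b in GF(2^8)."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]

def coeff(row, col):
    """Hypothetical coefficient for redundancy disk `row`, data disk `col`."""
    return EXP[(row * col) % 255]

def redundancy(data, m):
    """One byte per data disk in, one byte per redundancy disk out."""
    out = []
    for row in range(m):
        acc = 0
        for col, d in enumerate(data):
            acc ^= gf_mul(coeff(row, col), d)   # addition in GF(2^8) is XOR
        out.append(acc)
    return out

def recover_one(data, lost, row, red_value):
    """Recover data[lost] (an erasure: position known) from one redundancy row."""
    acc = red_value
    for col, d in enumerate(data):
        if col != lost:
            acc ^= gf_mul(coeff(row, col), d)
    return gf_div(acc, coeff(row, lost))

data = [0x12, 0x34, 0x56, 0x78]
red = redundancy(data, 2)
damaged = [0x12, 0x34, 0x00, 0x78]           # disk 2 is known to be missing
print(hex(recover_one(damaged, 2, 1, red[1])))
```

With several erased disks, the single division becomes a linear system over the field. Solving it by Gaussian elimination is cubic in the number of unknowns, which is one way a cubic cost arises in this representation.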

Thus I will rewrite the redundancy routines based on Phil Karn's Reed-Solomon implementation, the best implementation with an open license I could find. Most implementations (like the one in par2) don't bother with error correction; they use block-level checksums to detect errors, after which erasure recovery can be used. Needless to say, this is not an option for my block device (where, and why, should I store the checksums?).
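The checksum approach can be sketched as follows. This is only an illustration of the idea, not how par2 or any other tool is actually implemented: CRC32 stands in for whatever checksum a real tool uses, a single XOR parity block stands in for the Reed-Solomon redundancy, and all function names are made up. A failed checksum turns an error (position unknown) into an erasure (position known), which is then repaired.

```python
import zlib

def xor_blocks(blocks):
    """XOR equally sized byte blocks together (simple stand-in for redundancy)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def find_erasures(blocks, checksums):
    """Indices of blocks whose stored checksum no longer matches."""
    return [i for i, (block, crc) in enumerate(zip(blocks, checksums))
            if zlib.crc32(block) != crc]

def repair(blocks, checksums, parity):
    """Detect a corrupted block via its checksum, then rebuild it as an erasure."""
    bad = find_erasures(blocks, checksums)
    if len(bad) > 1:
        raise ValueError("one parity block can only repair one erasure")
    repaired = list(blocks)
    for idx in bad:
        others = [b for i, b in enumerate(blocks) if i != idx]
        repaired[idx] = xor_blocks(others + [parity])
    return repaired

original = [b"aaaa", b"bbbb", b"cccc"]
checksums = [zlib.crc32(b) for b in original]   # must be stored somewhere
parity = xor_blocks(original)

corrupted = [b"aaaa", b"bXbb", b"cccc"]         # silent corruption in block 1
print(repair(corrupted, checksums, parity))
```

The `checksums` list is exactly the extra metadata that a plain block device has no natural place for.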

Needless to say, this delays the delivery of RAnsrID further, especially as I'd like to incorporate the change in a backwards-compatible way that at least preserves existing data.

Also, Hackweek 6 wasn't as productive as I had hoped; I only managed to get the test suite up and running. Oh well.

Wed, Mar. 9th, 2011, 11:56 am
RAnsrID continued

Our group is now in HackWeek 6, quite a few weeks later than all the other groups at SuSE. I will use the time to (finally!) continue work on RAnsrID - see also my initial blog entry. The project source is hosted on gitorious.

The basic redundancy routines are all working already; next is a usable test suite, then run-time configuration management (live adding and removing of disks, live reconstruction w/o repair in the read error case).

I doubt I will reach a final version 1.0 that I can recommend using, but it will hopefully be close.

Tue, Jun. 8th, 2010, 06:46 pm
RAnsrID - git repository published, demo at LinuxTag 2010

I have just published my RAnsrID git repository on gitorious.org. From now on, I will stay backward compatible with old versions of the journal and disk meta structure blocks. Get the git repo with
    git clone git://gitorious.org/ransrid/ransrid.git

Unfortunately, there is little (read: no) documentation available yet; that will change after LinuxTag. Until then, the only documentation is the heavily commented source code. Grab it, study it, enhance it, send a patch - that's the open source way.

For LinuxTag I have another goodie - I will be traveling with four USB disks and will give a short live demo of what the system is already capable of. Live adding and removal of disks isn't working yet, but reading, writing, validation, and rebuilding are.

Note that nbd used to freeze machines during writes if client and server were running on the same machine. Kernel 2.6.26 and later include a patch that ought to fix this issue, but there were some (inconclusive?) discussions about this patch beforehand. Using Xen for the client seems to work around the issue as well, though.

Thu, May. 27th, 2010, 01:53 am
RAnsrID - Redundant Array of Non-Striped Really Independent Disks

In my spare time I've been working on a RAID-lookalike system for storing large amounts of data with multiple redundancies - and with significantly lower power consumption and disk spinning time than standard RAID if you only access single large files in a typical session.
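A rough illustration of why a non-striped layout can keep disks idle: the sketch below compares which disks a contiguous range of blocks touches under classic round-robin striping and under a simple linear (concatenated) layout. Both mappings and all names are assumptions made for this example; it does not describe RAnsrID's actual on-disk layout.

```python
def disks_touched_striped(first_block, count, n_disks):
    """Round-robin striping: consecutive blocks go to consecutive disks."""
    return {block % n_disks for block in range(first_block, first_block + count)}

def disks_touched_linear(first_block, count, blocks_per_disk):
    """Hypothetical non-striped layout: each disk holds one contiguous block range."""
    return {block // blocks_per_disk for block in range(first_block, first_block + count)}

# Reading one "large file" of 500 consecutive blocks starting at block 1000:
print(sorted(disks_touched_striped(1000, 500, n_disks=4)))           # all four disks spin
print(sorted(disks_touched_linear(1000, 500, blocks_per_disk=10000)))  # a single disk spins
```

Under these assumptions, the striped array needs every disk for a single large file, while the linear layout needs only the disk (or the few disks) that the file's blocks fall on.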

The whole thing is implemented as a network block device (nbd), and will be presented (in an early, but at least already partially working state) at LinuxTag 2010 in Berlin.

Note that this is not a direct competitor to a standard RAID solution - in fact, I propose using a RAID 1 for the journal it needs (e.g. use the system disk - you're already using a RAID there, right?). For a comparison table check the project page.

Source will be available soon; I've not decided which git hoster to use yet. I don't think it's reasonable to put this on freedesktop, because its relation to freedesktop is close to nothing. I might change my mind, though.