You are viewing emmes

Fri, Jun. 3rd, 2011, 12:31 pm
RAnsrID: Galois Field / Reed-Solomon code rewrite pending

While starting to implement error resilience (not erasure resilience, which is already working nicely) in my RAID-lookalike multi-disk block device RAnsrID, I finally had to notice that I didn't understand one aspect of Galois Field mathematics - the Reed-Solomon representation type has heavy influence on how to deal with errors (read: corrupted data).

So far, I only understood the canonical matrix style representation (basically a linear combination over the data disks for each redundancy disk). Turns out that with polynomial representation you can create way better (read: faster) algorithms for error correction - according to my analysis, erasure and error recovery is O(N²), compared to O(N³) for erasure correction in the linear combination case, and unknown (presumably O(N³⋅M²)) in the error case (N: total number of disks, M: number of redundancy disks).

Thus I will rewrite the redundancy routines based on Phil Karn's Reed-Solomon implementation - the best implementation with an open license I could find. Most implementations (like in par2) don't bother with error correction, and use block-level checksums to detect errors. That done, erasure recovery can be used. Needless to say, this is no option for my block device (where and why should I store the checksums).

No need to say that this delays delivery timeframe of RAnsrID further; especially as I'd like to incorporate the change in an at least data preserving backwards compatible way.

Also Hackweek 6 wasn't as productive as I hoped; I only managed to get the test suite up and running. Oh well.