[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Reed-Solomon optimised

From: Vladimir 'φ-coder/phcoder' Serbinenko
Subject: Reed-Solomon optimised
Date: Sun, 13 Nov 2011 23:41:53 +0100
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20111010 Iceowl/1.0b2 Icedove/3.1.15

Hello, all. When I was implementing raidz, I thought of a more
convenient, compact and faster representation of GF(256). Using it I
could make raid6_recover more compact and Reed-Solomon more fast. Later
is very important since we have to run Reed-Solomon check (but not
necessarily much slower recovery) on every boot. I expected 8-fold
speedup but got 16-18-fold one. The idea was the following:
- In standard representation of GF(256) it's fast to add 2 numbers: it's
just a XOR.
- We also know that every non-zero number in GF(256) is represented as a
power of X: a=X^s
So if we know that a=X^s and b=X^t then
Since there are only 255 such elements for each one of them we can
precompute s s.t. X^s=a and we can also precompute powers of X. By
choosing s in interval 0...254 we ensure that 0<=s+t<=508, so we need to
store only 508 elements and then we can multiply any 2 elements by first
checking that both are non-zero (and output 0 if any of them is), one
addition and 3 table lookups (which are probably in cache given their
tiny size) which is much faster than the loop with shifts.
Same idea is usable for GF(2^n) up to n=16. Currently in the code we
have GF(256), GF(2^32), GF(2^64) and GF(2^128).
GF(2^32) and GF(2^64) are used for CRC-32/CRC-64 where they are implicit
with all 256 values precomputed.
GF(2^128) is used in cryptodisk and ZFS-crypto. In cryptodisk it's used
in LRW and XTS. In LRW thanks to the trick from LRW we need only 2
multiplications per sector. XTS does only multiplications with X which
are fast (although the version in GRUB could be optimised by shifting
uint64 and not uint8 at time but it needs care to avoid endianness
ZFS-crypto in GCM on the other hand probably has the problem of slow
GF(2^128) multiplication. It's possible to optimise it somehow using
some precomputation but it hasn't been done.
Also if you notice any other ways to speed up the boot or make critical
modules or kernel more compact, please share your ideas.

Vladimir 'φ-coder/phcoder' Serbinenko

Attachment: signature.asc
Description: OpenPGP digital signature

reply via email to

[Prev in Thread] Current Thread [Next in Thread]