grub-devel
[Top][All Lists]

Reed-Solomon optimised

 From: Vladimir 'φ-coder/phcoder' Serbinenko Subject: Reed-Solomon optimised Date: Sun, 13 Nov 2011 23:41:53 +0100 User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.23) Gecko/20111010 Iceowl/1.0b2 Icedove/3.1.15

```Hello, all. When I was implementing raidz, I thought of a more
convenient, compact and faster representation of GF(256). Using it I
could make raid6_recover more compact and Reed-Solomon more fast. Later
is very important since we have to run Reed-Solomon check (but not
necessarily much slower recovery) on every boot. I expected 8-fold
speedup but got 16-18-fold one. The idea was the following:
- In standard representation of GF(256) it's fast to add 2 numbers: it's
just a XOR.
- We also know that every non-zero number in GF(256) is represented as a
power of X: a=X^s
So if we know that a=X^s and b=X^t then
c=ab=X^sX^t=X^(s+t)
Since there are only 255 such elements for each one of them we can
precompute s s.t. X^s=a and we can also precompute powers of X. By
choosing s in interval 0...254 we ensure that 0<=s+t<=508, so we need to
store only 508 elements and then we can multiply any 2 elements by first
checking that both are non-zero (and output 0 if any of them is), one
addition and 3 table lookups (which are probably in cache given their
tiny size) which is much faster than the loop with shifts.
Same idea is usable for GF(2^n) up to n=16. Currently in the code we
have GF(256), GF(2^32), GF(2^64) and GF(2^128).
GF(2^32) and GF(2^64) are used for CRC-32/CRC-64 where they are implicit
with all 256 values precomputed.
GF(2^128) is used in cryptodisk and ZFS-crypto. In cryptodisk it's used
in LRW and XTS. In LRW thanks to the trick from LRW we need only 2
multiplications per sector. XTS does only multiplications with X which
are fast (although the version in GRUB could be optimised by shifting
uint64 and not uint8 at time but it needs care to avoid endianness
problems).
ZFS-crypto in GCM on the other hand probably has the problem of slow
GF(2^128) multiplication. It's possible to optimise it somehow using
some precomputation but it hasn't been done.
Also if you notice any other ways to speed up the boot or make critical

--
Regards