[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Bug fix for $((x**y)) algorithm on 64+ bits machines.
From: |
Nicolas ARGYROU |
Subject: |
Re: Bug fix for $((x**y)) algorithm on 64+ bits machines. |
Date: |
Mon, 19 Sep 2011 13:48:01 -0700 (PDT) |
Hello,
I noticed other shells have the same bug, and perhaps you would like to use
this code in other GNU projects (like making a library call or an executable).
The best, I think, is to transfer the copyright to bash maintainers. You can
now copyright
and license it the way you want:
template<typename T>
inline T ipow(register /* unsigned */ T x, register /* unsigned */ T y)
{
if (x == 0 && y != 0) return 0;
// 1: ipow(x,y) = x ** y = Product [i=0; i<=log2(y)] (x **
(((y>>i)&1)*2**i))
// 2: x**(2**i) = x**(2**(i-1)) * x**(2**(i-1))
register T xy = 1;
for(; y; y>>=1, x *= x)
if (y & 1)
xy *= x;
return xy;
}
If you don't wish to use this algoritm, then there is another way of getting
around the bug:
for exemple, on 64 bit systems, when performing x ** y, if y > 64 then the
result
will be too large anyway to stand in 64 bits. It is true when you suppose x = 2,
and (even more) when x > 2. In this case you may then throw an overflow error.
In other cases, y < 64 is a manageable number for the loop algorithm.
But, I still prefer the O(log(y)) algorithm, which always computes the exact
value (modulus
2**64) in less numbers of steps on average.
This is, in detail, how the algorithm works:
When, for example, computing 3 ** 3 = 3 * 3 * 3 = 27,
you can write it also with a binary power: 3 ** (1*2**1 + 1*2**0),
and because, x ** (a+b) = x**a * x**b,
you may develop it: 3 ** (2**1) * 3 ** (2**0)
This gives the first equation in the comments of the code:
// 1: ipow(x,y) = x ** y = Product [i=0; i<=log2(y)] (x **
(((y>>i)&1)*2**i))
The second thing to notice is that we need to compute: 3**2**i,
but as 2**i = 2*2**(i-1) = 2**(i-1) + 2**(i-1),
then 3**2**i = 3**(2**(i-1) + 2**(i-1)),
since, x ** (a+b) = x**a * x**b,
then 3**2**i = 3**(2**(i-1) + 2**(i-1)) = 3**(2**(i-1)) * 3**(2**(i-1)).
This gives the second equation in the comments of the code:
// 2: x**(2**i) = x**(2**(i-1)) * x**(2**(i-1))
and a reccurence to compute x**2**(i+1) when we know x**2**i, from x**2**0 = x.
We just then have to roll though the bits of y, computing x = x * x each step,
and
multiplying xy (the temporary result) by one of the squares of x when the
corresponding bit
of y is set.
As we always do multiplications modulus 2**64, it gives the result modulus
2**64.
We only need to use 3 registers: one to roll y, one to hold the squares of x,
and one to hold
the result.
Writing the same algoritm with assembler language can save a few instructions
per loop,
because gcc doesn't catch that it can use the out-going bit of y to directly
jump over xy *= x;
if not set. But it won't be portable then.
Best regards,
Nicolas Argyrou