[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Chicken-users] big prime number
From: |
Peter Bex |
Subject: |
Re: [Chicken-users] big prime number |
Date: |
Wed, 27 Jan 2016 22:06:37 +0100 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
On Mon, Jan 25, 2016 at 03:00:46PM +0100, Peter Bex wrote:
> Unfortunately, I just found out that something goes wrong when compiling
> a program containing bignum literals: The number data can get truncated,
> causing it to go from 22M to 2.1M which of course can be printed much
> quicker. So we cheered too soon.
>
> In this program, the compiler actually does most of the work because the
> value can be statically computed. Running it in the interpreter gives
> a much longer run time, like with the numbers egg under CHICKEN 4 :(
>
> I'll have to figure out why it's so slow, but also why CHICKEN 5 is
> messing up. It's a great test case though! :)
OK, here's an update with my findings, for those still interested.
The reason CHICKEN 5 stops with a small output is that the compiler
actually constant-folds the result of the (- (expt ...) 1) expression,
resulting in a HUGE bignum literal which will be compiled into the C
source. This literal is encoded as a hex string, preceded by its size,
encoded in 24 bits. Unfortunately, the hex string's size alone takes
up 27 bits!
The compiler should bail out when encountering a literal that's too large
to encode. Currently it just masks out the higher bits, effectively
silently truncating the value. This is easy to fix. I think it's
probably a good idea to also drop folded constants when they are too
large to encode, opting to evaluate the value at runtime, instead.
We could also choose a more efficient "packed" binary representation of
bignum literals, but that'll need even more special code that's only
needed for these extreme edge cases, and results in harder to debug code
in the normal case. Hex is a nice hackable internal representation for
literals in C code.
BTW, the reason we're not constant folding the result string (which we
could easily do) is that number->string needs to yield a fresh string
every time according to the spec. This _could_ perhaps be special-cased
by tagging freshly allocated folded constant, so we know at runtime that
we must copy the constant every time it's used. If we did that, compiling
this source file would take about 5 minutes, and running it would take
less than a second or so! :)
Having said all that, I'll explain the real reason the numbers
egg/CHICKEN 5 is so slow. When converting a number to a decimal
string, we use the "remainder tree" algorithm. This simply calculates
10^N, with N about half the number's size in digits. Then the number
is divided by 10^N, resulting in two halves (quotient and remainder).
These two halves are then individually converted to string recursively,
divide-and-conquer style.
There is only one know faster algorithm which is "scaled remainder tree",
but I don't grok this. If anyone does and wants to contribute some code
to make numbers use this algorithm, please be my guest! Anyway, even
with the current algorithm it should be much faster than it is.
The reason it's relatively slow is because the division (and the
calculation of 10^N itself) uses multiplication a lot. And we currently
only have the Karatsuba multiplication algorithm, which is suboptimal on
extremely large numbers. In such cases, an FFT-based multiplication
would perform better. GMP (thus, Racket, Guile and Haskell(?)) uses FFT
at these sizes, and so does Gambit.
Unfortunately, I don't grok the FFT itself, let alone multiplication
based on it (using floating-point numbers, which sounds like madness!).
The code we'd have to add for this is probably pretty large as well at
relatively low benefit for most practical use cases. All of this means
we probably won't get support for FFT multiplication any time soon
unless a kind soul is going to build it in a small bit of code.
Having said all that, I'm not sure how useful it is to even convert
such a huge number to decimal; it's a good test of bignum implementation
quality, I'll grant you that, but it doesn't really warrant more work on
the numbers egg for now, IMO. By the way, to make things look a little
less bleak try the following things, just for fun:
1) Converting to hex instead of decimal, and comparing those timings with
the other Schemes.
2) Comparing CHICKEN's performance on your program with Gauche,
MIT Scheme, Scheme48, Chibi Scheme, Ruby and Python. We're faster
than most things not based on GMP (except, of course(!) Gambit).
So, wrapping up: CHICKEN is doing pretty well at medium-to-large bignums,
but if you want to deal with truly astronomical numbers efficiently,
you'll probably have to go to a different implementation, or you can
wait/help out until we can compete at that level.
Cheers,
Peter
signature.asc
Description: Digital signature