bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)


From: Torbjorn Granlund
Subject: bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)
Date: Thu, 06 Sep 2012 20:16:35 +0200
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.3 (berkeley-unix)

I and Niels now would appreciate feedback on the new factor code.

We've put the entire little project in a tar file, which is attached.
The code is also available at <http://gmplib.org:8000/factoring/>.

Here is the README file:

NT factor  (Niels' and Torbjörn's factor, or New Technology factor)

This is a project for producing a decent 'factor' command for GNU.
The code was written by Torbjörn Granlund and Niels Möller in Aug-Sept
2012, but parts of the code is based on preexisting GMP code.

The old factor program could handle numbers < 2^64, unless GMP was
available at build time.  Without GMP, only trial division was used;
with GMP an old version of GMP's demos/factorize.c code was used,
which relies on Pollard rho for much better performance.  The old
factor program used probabilistic Miller-Rabin primes testing.

The new code can factor numbers < 2^127 and does not currently make
use of GMP, not even as an option.  It uses fast trial division code,
Pollard rho, and optionally SQUFOF.  It uses the Lucas primality test
instead of a probabilistic test.

The new code is several times faster then the old code, in particular
on 32-bit hardware.  On current 64-bit machines, it is between 3 times
and 10000 times faster for ranges of numbers; for 32-bit machines we
have seen 150,000 times improvement for some number range.  The
advantage for the new code is greater for larger numbers, matching
mathematical theory of algorithm efficiency.  (These numbers compare
the new code to the old GMP-less code; GMP-enabled old code is only
between 3 and 10 times slower.)

For smaller numbers, more than half the time is spent in I/O,
buffering, and conversions.  We have not tried to optimise these
parts, but instead kept them clean.


* We don't have any --help or --version options currently.

* Our packaging with separate Makefile, outseq.c and ChangeLog was
  useful during our development.  We don't expect these to be useful
  in coreutils.  In particular, the slow testing of the 'check' target
  is probably quite unsuitable for coreutils (but similar but quicker
  tests would make sense).

* The files probably needed for coreutils are:

  o factor.c -- main factoring code
  o make-prime-list.c -- primes table generator program
  o longlong.h -- arithmetic support routines (from GMP)


Technical considerations:

* Should we handle numbers >= 2^127?  That would in effect mean
  merging a current version of GMP's demos/factorize.c into this
  factor.c, and put that under HAVE_GMP (like in the old factor.c).
  It should be understood that factoring such large numbers with only
  Pollard rho is not very feasible.

* We think a --verbose option would be nice, although we don't have
  one in the present version.  It would output information on
  algorithm switches and bounds reached.  Opinions?


Portability caveats:

* We rely on POSIX.1 getchar_unlocked for a performance advantage.

* We have some hardwired W_TYPE_SIZE settings for the code interfacing
  to longlong.h.  It is now 64 bits.  It will break on systems where
  uintmax_t is not a 64-bit type.  Please see the beginning of
  factor.c.


Legal caveat:

* Both Niels and Torbjörn are GNU hackers since long.  We do not
  currently have paperwork in place for coreutils contributions.  This
  will certainly be addressed.


Attachment: nt-factor.tar.lz
Description: Binary data


-- 
Torbjörn

reply via email to

[Prev in Thread] Current Thread [Next in Thread]