bug-guile

## bug#18295: Radix points in non-decimal numbers

From: Mark H Weaver
Subject: bug#18295: Radix points in non-decimal numbers
Date: Wed, 01 Oct 2014 01:19:39 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

```
Hi Ian,

> Occasionally it is handy to use a radix point in bases other than
> decimal. i.e. 1.1 in binary is 1.5 decimal.

I agree that this would be nice.

Here's a preliminary patch to implement it, against stable-2.0.  It
works, but is not yet ready to push.  It needs tests, but more
importantly I'm undecided on how best to limit the exponents.  The
numbers are represented exactly until just before returning from
string->number, so very large exponents could exhaust the available
memory.  Of course, a large integer can already exhaust the memory, but
that's less likely to happen by accident when exponents are not
involved.  Also, if they _are_ converted to inexact at the end (which
happens unless #e is given), large exponents will become infinite or
zero, which is not as nice as an error message, although I think we
already fail to do this in some cases.

One easy solution would be to prohibit exponents unless radix == 10.

Thoughts?

If you want to work more on this, we could be co-authors of the commit.
I should probably focus on other things for a while.

Best,
Mark

```
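The concern about exponents can be illustrated with a small C sketch. It is not taken from the patch: `scale_inexact` is a hypothetical helper that works purely in `double` arithmetic, whereas Guile keeps the value exact until the end of `string->number`. It only models what the final conversion to inexact does when the exponent is too large for the radix.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper (not part of the patch): scale MANTISSA by
   RADIX^EXPONENT using only double arithmetic, to show what an
   inexact result looks like when the exponent is out of range.  */
double
scale_inexact (double mantissa, unsigned int radix, int exponent)
{
  double factor = 1.0;
  int n = exponent < 0 ? -exponent : exponent;

  assert (radix >= 2 && radix <= 36);

  /* Repeated multiplication; FACTOR becomes +infinity once it no
     longer fits in a double.  */
  while (n-- > 0)
    factor *= radix;

  /* A positive exponent multiplies, a negative one divides.  With an
     infinite FACTOR a nonzero mantissa becomes infinity or zero.  */
  return exponent < 0 ? mantissa / factor : mantissa * factor;
}
```

Under these assumptions, `scale_inexact (1.0, 10, 200)` stays finite while `scale_inexact (1.0, 36, 200)` is already infinite, and `scale_inexact (1.0, 10, -400)` collapses to zero. This is one way to see why an exponent limit chosen for base 10 does not carry over to other radices.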
```
From 9ed731c917d4dd9d73258d96e401da5a06bef77a Mon Sep 17 00:00:00 2001
Date: Mon, 29 Sep 2014 01:26:50 -0400
Subject: [PATCH] PRELIMINARY string->number: Support digits after point for
non-decimals.

NOTE: This limits the radix to 36.  Previously, we would accept
bogosities such as:
(string->number "{" 37) => 36
(string->number "|" 38) => 37

TODO: Generalize code that places reasonable limits on exponents when
using non-decimal radices.  (search for XXX in the patch)

TODO: Split into multiple commits (use scm_t_wchar, avoid the word
"decimal" except where base 10 is assumed, doc fixes, limit to
radix <= 36, support digits after point for non-decimals)
---
doc/ref/api-data.texi         |  6 +--
libguile/numbers.c            | 94 +++++++++++++++++++++----------------------
test-suite/tests/numbers.test |  6 +--
3 files changed, 53 insertions(+), 53 deletions(-)

diff --git a/doc/ref/api-data.texi b/doc/ref/api-data.texi
index acdf9ca..e5880d2 100644
--- a/doc/ref/api-data.texi
+++ b/doc/ref/api-data.texi
@@ -1,7 +1,7 @@
@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
2007,
address@hidden   2008, 2009, 2010, 2011, 2012, 2013, 2014  Free Software
Foundation, Inc.
@c See the file guile.texi for copying conditions.

@node Simple Data Types
@@ -1098,7 +1098,7 @@ inexact, a radix of 10 will be used.
@deffnx {C Function} scm_string_to_number (string, radix)
Return a number of the maximally precise representation
expressed by the given @var{string}. @var{radix} must be an
-exact integer, either 2, 8, 10, or 16. If supplied, @var{radix}
+exact integer between 2 and 36. If supplied, @var{radix}
is a default radix that may be overridden by an explicit radix
prefix in @var{string} (e.g.@: "#o177"). If @var{radix} is not
supplied, then the default radix is 10. If string is not a
diff --git a/libguile/numbers.c b/libguile/numbers.c
index c197eee..4748b51 100644
--- a/libguile/numbers.c
+++ b/libguile/numbers.c
@@ -1,6 +1,4 @@
-/* Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
- *   2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
- *   2014 Free Software Foundation, Inc.
+/* Copyright (C) 1995-2014 Free Software Foundation, Inc.
*
* Portions Copyright 1990, 1991, 1992, 1993 by AT&T Bell Laboratories
* and Bellcore.  See scm_divide.
@@ -5806,7 +5804,7 @@ enum t_exactness {NO_EXACTNESS, INEXACT, EXACT};
/* Caller is responsible for checking that the return value is in range
for the given radix, which should be <= 36. */
static unsigned int
-char_decimal_value (scm_t_uint32 c)
+char_digit_value (scm_t_wchar c)
{
/* uc_decimal_value returns -1 on error. When cast to an unsigned int,
that's certainly above any valid decimal, so we take advantage of
@@ -5818,8 +5816,8 @@ char_decimal_value (scm_t_uint32 c)
if (d >= 10U)
{
c = uc_tolower (c);
-      if (c >= (scm_t_uint32) 'a')
-        d = c - (scm_t_uint32)'a' + 10U;
+      if (c >= (scm_t_wchar)'a')
+        d = c - (scm_t_wchar)'a' + 10U;
}
return d;
}
@@ -5837,14 +5835,14 @@ mem2uinteger (SCM mem, unsigned int *p_idx,
unsigned int digit_value;
SCM result;
-  char c;
+  scm_t_wchar c;
size_t len = scm_i_string_length (mem);

if (idx == len)
return SCM_BOOL_F;

c = scm_i_string_ref (mem, idx);
-  digit_value = char_decimal_value (c);
+  digit_value = char_digit_value (c);
return SCM_BOOL_F;

@@ -5862,9 +5860,9 @@ mem2uinteger (SCM mem, unsigned int *p_idx,
break;
else
{
-          digit_value = char_decimal_value (c);
-          /* This check catches non-decimals in addition to out-of-range
-             decimals.  */
+          digit_value = char_digit_value (c);
+          /* This check catches non-digits in addition to out-of-range
+             digits.  */
break;
}
@@ -5899,18 +5897,17 @@ mem2uinteger (SCM mem, unsigned int *p_idx,
}

-/* R5RS, section 7.1.1, lexical structure of numbers: <decimal 10>.  Only
- * covers the parts of the rules that start at a potential point.  The value
- * of the digits up to the point have been parsed by the caller and are given
- * in variable result.  The content of *p_exactness indicates, whether a hash
- * has already been seen in the digits before the point.
+/* R5RS, section 7.1.1, lexical structure of numbers: <decimal 10>.  We
+ * generalize this to support radices other than 10.  Only covers the
+ * parts of the rules that start at a potential point.  The value of the
+ * digits up to the point have been parsed by the caller and are given
+ * in variable result.  The content of *p_exactness indicates, whether a
+ * hash has already been seen in the digits before the point.
*/

-#define DIGIT2UINT(d) (uc_numeric_value(d).numerator)
-
static SCM
-mem2decimal_from_point (SCM result, SCM mem,
-                       unsigned int *p_idx, enum t_exactness *p_exactness)
+mem2real_from_point (SCM result, SCM mem, unsigned int *p_idx,
+                     unsigned int radix, enum t_exactness *p_exactness)
{
unsigned int idx = *p_idx;
enum t_exactness x = *p_exactness;
@@ -5930,13 +5927,12 @@ mem2decimal_from_point (SCM result, SCM mem,
while (idx != len)
{
scm_t_wchar c = scm_i_string_ref (mem, idx);
-         if (uc_is_property_decimal_digit ((scm_t_uint32) c))
+          digit_value = char_digit_value (c);
{
-             if (x == INEXACT)
-               return SCM_BOOL_F;
-             else
-               digit_value = DIGIT2UINT (c);
-           }
+              if (x == INEXACT)
+                return SCM_BOOL_F;
+            }
else if (c == '#')
{
x = INEXACT;
@@ -5946,20 +5942,20 @@ mem2decimal_from_point (SCM result, SCM mem,
break;

idx++;
-         if (SCM_MOST_POSITIVE_FIXNUM / 10 < shift)
+         if (SCM_MOST_POSITIVE_FIXNUM / radix < shift)
{
big_shift = scm_product (big_shift, SCM_I_MAKINUM (shift));
result = scm_product (result, SCM_I_MAKINUM (shift));
result = scm_sum (result, SCM_I_MAKINUM (add));

-             shift = 10;
}
else
{
-             shift = shift * 10;
+             shift = shift * radix;
}
};

@@ -5981,6 +5977,7 @@ mem2decimal_from_point (SCM result, SCM mem,
int sign = 1;
unsigned int start;
scm_t_wchar c;
+      unsigned int digit_value;
int exponent;
SCM e;

@@ -6020,24 +6017,29 @@ mem2decimal_from_point (SCM result, SCM mem,
else
sign = 1;

-         if (!uc_is_property_decimal_digit ((scm_t_uint32) c))
+          digit_value = char_digit_value (c);
return SCM_BOOL_F;

idx++;
-         exponent = DIGIT2UINT (c);
+         exponent = digit_value;
while (idx != len)
{
scm_t_wchar c = scm_i_string_ref (mem, idx);
-             if (uc_is_property_decimal_digit ((scm_t_uint32) c))
+              digit_value = char_digit_value (c);
{
idx++;
+                  /* XXX FIXME: This logic is not sufficient for
+                     non-decimal numbers */
if (exponent <= SCM_MAXEXP)
-                   exponent = exponent * 10 + DIGIT2UINT (c);
+                   exponent = exponent * radix + digit_value;
}
else
break;
}

+          /* XXX FIXME: This logic is not sufficient for non-decimal numbers */
if (exponent > ((sign == 1) ? SCM_MAXEXP : SCM_MAXEXP + DBL_DIG + 1))
{
size_t exp_len = idx - start;
@@ -6046,7 +6048,7 @@ mem2decimal_from_point (SCM result, SCM mem,
scm_out_of_range ("string->number", exp_num);
}

-         e = scm_integer_expt (SCM_I_MAKINUM (10), SCM_I_MAKINUM (exponent));
+         e = scm_integer_expt (SCM_I_MAKINUM (radix), SCM_I_MAKINUM (exponent));
if (sign == 1)
result = scm_product (result, e);
else
@@ -6138,15 +6140,14 @@ mem2ureal (SCM mem, unsigned int *p_idx,

if (scm_i_string_ref (mem, idx) == '.')
{
-       return SCM_BOOL_F;
-      else if (idx + 1 == len)
+      if (idx + 1 == len)
return SCM_BOOL_F;
-      else if (!uc_is_property_decimal_digit ((scm_t_uint32) scm_i_string_ref (mem, idx+1)))
+      else if (char_digit_value (scm_i_string_ref (mem, idx+1))
+               >= radix)
return SCM_BOOL_F;
else
-       result = mem2decimal_from_point (SCM_INUM0, mem,
-                                        p_idx, &implicit_x);
+       result = mem2real_from_point (SCM_INUM0, mem, p_idx,
+                                     radix, &implicit_x);
}
else
{
@@ -6173,14 +6174,13 @@ mem2ureal (SCM mem, unsigned int *p_idx,
/* both are int/big here, I assume */
result = scm_i_make_ratio (uinteger, divisor);
}
-      else if (radix == 10)
+      else
{
-         result = mem2decimal_from_point (uinteger, mem, &idx, &implicit_x);
+         result = mem2real_from_point (uinteger, mem, &idx,
+                                       radix, &implicit_x);
if (scm_is_false (result))
return SCM_BOOL_F;
}
-      else
-       result = uinteger;

*p_idx = idx;
}
@@ -6436,7 +6436,7 @@ SCM_DEFINE (scm_string_to_number, "string->number", 1, 1, 0,
"Return a number of the maximally precise representation\n"
"expressed by the given @var{string}. @var{radix} must be an\n"
-           "exact integer, either 2, 8, 10, or 16. If supplied, @var{radix}\n"
+           "exact integer between 2 and 36. If supplied, @var{radix}\n"
"is a default radix that may be overridden by an explicit radix\n"
"prefix in @var{string} (e.g. \"#o177\"). If @var{radix} is not\n"
"supplied, then the default radix is 10. If string is not a\n"
@@ -6451,7 +6451,7 @@ SCM_DEFINE (scm_string_to_number, "string->number", 1, 1, 0,
base = 10;
else
-    base = scm_to_unsigned_integer (radix, 2, INT_MAX);
+    base = scm_to_unsigned_integer (radix, 2, 36);

scm_remember_upto_here_1 (string);
diff --git a/test-suite/tests/numbers.test b/test-suite/tests/numbers.test
index 847f939..0acc3db 100644
--- a/test-suite/tests/numbers.test
+++ b/test-suite/tests/numbers.test
@@ -1,6 +1,6 @@
;;;; numbers.test --- tests guile's numbers     -*- scheme -*-
-;;;; Copyright (C) 2000, 2001, 2003, 2004, 2005, 2006, 2009, 2010, 2011,
-;;;;   2012, 2013 Free Software Foundation, Inc.
+;;;; Copyright (C) 2000, 2001, 2003-2006, 2009-2014
+;;;;   Free Software Foundation, Inc.
;;;;
;;;; This library is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU Lesser General Public
@@ -1575,7 +1575,7 @@
(for-each (lambda (x) (if (string->number x) (throw 'fail)))
'("" "q" "1q" "6+7iq" "8+9q" "10+11" "13+" "address@hidden"
"+25iq" "26i" "-q" "-iq" "i" "5#.0" "8/" "10#11" ".#" "."
-               "#o.2" "3.4q" "15.16e17q" "18.19e+q" ".q" ".17#18" "10q" "#b2"
+               "3.4q" "15.16e17q" "18.19e+q" ".q" ".17#18" "10q" "#b2"
"#b3" "#b4" "#b5" "#b6" "#b7" "#b8" "#b9" "#ba" "#bb" "#bc"
"#bd" "#be" "#bf" "#q" "#b#b1" "#o#o1" "#d#d1" "#x#x1" "#e#e1"
"#i#i1" "address@hidden" "3/0" "0/0" "4+3/0i" "4/0-3i" "2+0/0i"
--
1.8.4

```
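The idea behind the patch can be sketched in standalone C as follows. This is a simplified illustration under stated assumptions, not the Guile implementation: `digit_value` and `parse_with_point` are hypothetical names, input is assumed to be ASCII, and the arithmetic uses `double` where the patch works with exact numbers. Like the patch, it treats every digit as part of one integer and divides by the accumulated shift at the end. The digit helper mirrors the `c - 'a' + 10` step, which is why `{` maps to 36 and has to be rejected by capping the radix at 36.

```c
#include <assert.h>
#include <limits.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical ASCII-only stand-in for char_digit_value.  The caller
   must check the result against the radix.  Note that '{' (the
   character after 'z' in ASCII) yields 36.  */
static unsigned int
digit_value (unsigned char c)
{
  if (c >= '0' && c <= '9')
    return (unsigned int) (c - '0');
  if (c >= 'A' && c <= 'Z')
    c = (unsigned char) (c - 'A' + 'a');
  if (c >= 'a')
    return (unsigned int) (c - 'a') + 10U;
  return UINT_MAX;              /* above any valid radix */
}

/* Hypothetical sketch: parse digits with an optional radix point in
   the given radix (2..36).  Returns NAN for malformed input or an
   unsupported radix.  */
double
parse_with_point (const char *s, unsigned int radix)
{
  double digits = 0.0;          /* all digits so far, read as one integer */
  double shift = 1.0;           /* radix^(digits seen after the point) */
  int after_point = 0;
  int seen_digit = 0;

  assert (s != NULL);
  if (radix < 2 || radix > 36)
    return NAN;

  for (; *s != '\0'; s++)
    {
      unsigned int d;

      if (*s == '.' && !after_point)
        {
          after_point = 1;
          continue;
        }
      d = digit_value ((unsigned char) *s);
      if (d >= radix)           /* catches non-digits and out-of-range digits */
        return NAN;
      digits = digits * radix + d;
      if (after_point)
        shift *= radix;
      seen_digit = 1;
    }

  return seen_digit ? digits / shift : NAN;
}
```

With this sketch, `parse_with_point ("1.1", 2)` gives 1.5, matching the example in the original report, and `parse_with_point ("{", 36)` is rejected rather than read as 36.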