Re: [bug-gawk] mystrtonum for any awk (Was: Handling hexadecimals in dif

bug-gawk

From:	Jarno Suni
Subject:	Re: [bug-gawk] mystrtonum for any awk (Was: Handling hexadecimals in different modes)
Date:	Sun, 27 Sep 2015 13:25:14 +0300

On Sat, 26 Sep 2015 21:20:47 +0300
Aharon Robbins <address@hidden> wrote:

> > > Thanks for this code.
> > > 
> > > Are you willing:
> > > 
> > > 1. To sign paperwork putting this code into the public domain?
> >
> > How would signing paperwork happen?
> 
> I would ask Karl Berry to send you the paperwork to print out and
> mail in.

Sounds complicated, but I will, if it is necessary for the code to be 
included. 

> > Would the author be mentioned
> > anywhere in the manual? 
> 
> I would credit you on the page. :-)

OK.

> > I have to change the code a bit to make it return proper "+nan"+0 in
> > error cases. The test harness could be simplified. Or even removed,
> > what do you think?
> 
> Just comment it out. I can include it in the file without it having
> to appear in the manual.

In which file?

> > > 2. To write a prose description of how it works for the
> > > manual?
> >
> > Well, it depends. The code and comments tell how it works. I could
> > add few comments. I suppose I could do some kind of usage
> > instructions or manual for it, too. Is there something unclear in
> > how the converter works?
> 
> I haven't read the code yet.  But I'm looking for prose in the current
> style of the manual; the idea is to replace the existing code with
> yours.

I tuned the program a bit and added some comments; see below. I hope
that makes the program easier to understand. You may add a further
description if you wish. Please review the program.

> > BTW Perl is going to change its handling of octal numbers with
> > version 6:
> > http://design.perl6.org/S02.html#Radix_markers
> 
> Interesting, but not likely to influence anything I will do... :-)

Many other programming languages also recognize octal numbers by a 0o
prefix:
https://en.wikipedia.org/wiki/Octal#In_computers
IMO 0c would be clearer if upper case is allowed (compare 0O with 0C).
I used /^0[oO]/ in the program because awk traditionally allows upper
case, and at least the lower case o seems to be common practice
nowadays. (Perl does not support upper case in the prefix.)
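
As a minimal sketch of that prefix check on its own (the name
detect_base and the shell wrapper are hypothetical, used only for
illustration; the program below folds the same tests into my_strtonum),
something like this should work with any POSIX awk:

```sh
# detect_base is a hypothetical helper: it prints the base implied by
# the prefix of its first argument, or 0 if no prefix is recognized.
detect_base() {
    awk -v s="$1" 'BEGIN {
        if      (s ~ /^0[bB]/) print 2    # binary
        else if (s ~ /^0[oO]/) print 8    # octal
        else if (s ~ /^0[dD]/) print 10   # decimal
        else if (s ~ /^0[xX]/) print 16   # hexadecimal
        else                   print 0    # no recognized prefix
    }'
}
```

For example, "detect_base 0o17" prints 8 and "detect_base 42" prints 0.
Note that a plain leading zero, as in 070, is not treated as an octal
prefix here.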

#!/usr/bin/awk -f

# convert_from_base --- Generic function to convert a string
# representing a natural number in the given base to a number. Start
# the conversion from the i'th character. Return the converted number,
# or nan if the string is invalid for the given base.
function convert_from_base(base, str, i,   ret, n, p)
{
    n = length(str)
    if (i > n) return nan # expect at least one digit
    if ((ret = v[substr(str, i, 1)]) != "" && ret < base) {
        while (i < n) {
            i++
            if ((p = v[substr(str, i, 1)]) != "" && p < base)
                ret = ret * base + p
            else
                return nan
        }
        return ret
    } else
        return nan
}

# my_strtonum --- convert a string to a number using the given base. If
# no base is given, detect the base from the prefix; if there is no
# prefix, expect a decimal number, which may be a floating point number.
# In all other cases only a string representing a natural number is
# valid. The prefixes "0b", "0o", "0d" and "0x" select binary, octal,
# decimal and hexadecimal, respectively; the case of the letter does not
# matter. Return the converted number, or the special value nan if the
# base or the string is invalid. The precision of the awk's arithmetic
# limits how large a number can be converted accurately. If a decimal
# separator is used, it must be the same character that the print
# command uses as the decimal separator.
function my_strtonum(str, base)
{
    if (base) {
        if (base < 2 || base > ld) {
            print "ERROR: base should be within [2," ld "]" > "/dev/stderr"
            return nan
        }
        # expect natural number of given base
        return convert_from_base(base, str, 1)
    } else if (substr(str, 1, 1) == "0") {
        if (str ~ /^.[bB]/) {
            # expect natural binary
            return convert_from_base(2, str, 3)
        } else if (str ~ /^.[oO]/) {
            # expect natural octal
            return convert_from_base(8, str, 3)
        } else if (str ~ /^.[dD]/) {
            # expect natural decimal
            return convert_from_base(10, str, 3)
        } else if (str ~ /^.[xX]/) {
            # expect natural hexadecimal
            return convert_from_base(16, str, 3)
        }
    }

    if (str !~ rd) return nan;
    # valid decimal, possibly floating point
    return str + 0
}
BEGIN {
     # Define some global constants:
     nan="+nan"+0 # marks "Not a Number"
     digits="0123456789abcdefghijklmnopqrstuvwxyz"
     ld=length(digits) # maximum base (36)
     # Create a lookup table for values of digits:
     for(i=0; i<length(digits); i++) v[substr(digits,i+1,1)]=i
     # Upper case digits are equal to lower case:
     for(i=10; i<length(digits); i++) v[toupper(substr(digits,i+1,1))]=i
     # d is the decimal separator, which may vary with the awk
     # implementation, command line options and locale in use.
     d=substr(sprintf("%g",1.1),2,1)
     # rd is a regular expression that matches a decimal floating point
     # number.
     rd="^[-+]?([0-9]+\\" d "?|\\" d "[0-9])[0-9]*([eE][-+]?[0-9]+)?$"


     # test harness
     #a[0]="-.1"
     #a[1]="25"
     #a[2]=".31"
     #a[3]="0123"
     #a[4]="0xdeadBEEF"
     #a[5]="123.45"
     #a[6]="1.e3"
     #a[7]="1.32"
     #a[8]="1.32E2"
     #a[9]=".e2"
     #a[10]="3.9e-2"
     #a[11]="1e5"
     #a[12]=""
     #a[13]="1,123"
     #a[14]="awk"
     #a[15]="1 000.4"
     #a[16]=".3e-2"
     #a[17]="-"
     #a[18]="."
     #a[19]="+."
     #a[20]="deadBEEF"
     #a[21]="deadbeef"
     #a[22]="oajlaselkjZ"
     #a[23]="0xdead"
     #a[24]=",1"
     #a[25]="0,23"
     #a[26]="3e-2"
     #a[27]="070"
     #a[28]="1.2a"
     #a[29]="1,2a"
     #a[30]="01e1"
     #a[31]="ö"
     #a[32]="0b101"
     #a[33]="0o76"
     #a[34]="0d96"
     #a[35]="0Xf"

     #for (i=0; i in a; i++) {
        #printf "\"%s\": %g, \"%s\", %d, \"%s\", add 1: \"%s\"\n", 
            #a[i],
            #my_strtonum(a[i]), my_strtonum(a[i]),
            #my_strtonum(a[i],ld), my_strtonum(a[i],ld),
            #my_strtonum(a[i],ld)+1
         ##print strtonum(a[i]), strtonum(a[i]"") # works only in gawk
      #}
}
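
To illustrate the digit-by-digit conversion that convert_from_base
performs, here is a minimal, self-contained sketch that can be tried
from the shell. The name from_base is hypothetical, and it differs from
the program above in two ways that are assumptions made for the sake of
a short example: it finds digit values with index() instead of a lookup
table, and it prints the word "invalid" instead of returning nan,
because the printed form of "+nan"+0 may differ between awk
implementations.

```sh
# from_base BASE STRING (hypothetical helper): print the value of STRING
# read as a natural number in BASE (2..36), or "invalid".
from_base() {
    awk -v base="$1" -v str="$2" 'BEGIN {
        digits = "0123456789abcdefghijklmnopqrstuvwxyz"
        base += 0                      # force a numeric value
        str = tolower(str)             # upper case digits equal lower case
        n = length(str)
        if (base < 2 || base > length(digits) || n == 0) {
            print "invalid"            # bad base, or no digits at all
            exit
        }
        ret = 0
        for (i = 1; i <= n; i++) {
            # index() is 1-based and returns 0 for a character that is
            # not a digit, so p is -1 in that case
            p = index(digits, substr(str, i, 1)) - 1
            if (p < 0 || p >= base) {
                print "invalid"        # character is not a digit of base
                exit
            }
            ret = ret * base + p       # shift left one digit and add
        }
        print ret
    }'
}
```

For example, "from_base 16 ff" prints 255 and "from_base 8 79" prints
invalid, since 9 is not an octal digit. As with the program above, the
result is only exact as long as it fits within the precision of awk's
arithmetic.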

Regards,

Jarno

-- 
Jarno Ilari Suni - http://www.iki.fi/8/
