bug-gawk



From: Andrew J. Schorr
Subject: Re: [bug-gawk] Why strings extracted by match() can be considered as numbers?
Date: Mon, 11 Jun 2018 13:54:35 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

Hi,

On Mon, Jun 11, 2018 at 12:40:26PM -0500, Peng Yu wrote:
> The following example shows that strings extracted by match() can be
> considered as numbers. This automatic conversion is not natural to me.
> 
> $ cat main.sh
> #!/usr/bin/env bash
> # vim: set noexpandtab tabstop=2:
> 
> set -v
> seq 10 | awk -e 'BEGIN { match("8", /([0-9]+)/, a) } $1 > a[1] { print }'
> seq 10 | awk -e 'BEGIN { a[1] = "8" } $1 > a[1] { print }'
> $ ./main.sh
> seq 10 | awk -e 'BEGIN { match("8", /([0-9]+)/, a) } $1 > a[1] { print }'
> 9
> 10
> seq 10 | awk -e 'BEGIN { a[1] = "8" } $1 > a[1] { print }'
> 9
> 
> Based on the manpage, it seems that the results should only be
> considered as strings. Why is there such a discrepancy? Thanks.
> 
>        match(s, r [, a])       Return the position in s where the regular
>                                expression r occurs, or zero if r is not
>                                present, and set the values of RSTART and
>                                RLENGTH.  Note that the argument order is
>                                the same as for the ~ operator: str ~ re.
>                                If array a is provided, a is cleared and
>                                then elements 1 through n are filled with
>                                the portions of s that match the
>                                corresponding parenthesized subexpression
>                                in r.  The zero'th element of a contains
>                                the portion of s matched by the entire
>                                regular expression r.  Subscripts
>                                a[n, "start"], and a[n, "length"] provide
>                                the starting index in the string and
>                                length respectively, of each matching
>                                substring.

This is documented in the info docs:

https://www.gnu.org/software/gawk/manual/html_node/Variable-Typing.html#Variable-Typing

   Fields, getline input, FILENAME, ARGV elements, ENVIRON elements, and the
   elements of an array created by match(), split(), and patsplit() that are
   numeric strings have the strnum attribute. Otherwise, they have the string
   attribute. Uninitialized variables also have the strnum attribute.
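The last sentence is easy to check. A minimal sketch, assuming any POSIX-conforming awk (gawk behaves this way); the variable name u is arbitrary. Because an uninitialized variable is a strnum, it compares equal both to the number 0 and to the empty string:

```shell
# u is never assigned, so it has the strnum attribute:
# it equals 0 numerically and "" as a string.
out=$(awk 'BEGIN { print (u == 0), (u == "") }')
echo "$out"   # prints: 1 1
```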

There is only so much that can fit in the man page. See also the POSIX awk spec
discussion of "numeric string" values:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html

   A string value shall be considered a numeric string if it comes from one of
   the following:
      1. Field variables
      2. Input from the getline() function
      3. FILENAME
      4. ARGV array elements
      5. ENVIRON array elements
      6. Array elements created by the split() function
      7. A command line variable assignment
      8. Variable assignment from another numeric string variable
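Items 7 and 8 can be seen side by side in a short sketch, assuming a POSIX-conforming awk such as gawk; the names v, t, and c are arbitrary. A strnum compared with a number is compared numerically, while a plain string constant forces a string comparison ("10" sorts before "8"):

```shell
out=$(echo 8 | awk -v v=8 '{
    t = $1      # item 8: copied from a numeric string (a field), stays strnum
    c = "8"     # string constant: plain string attribute
    # strnum vs number -> numeric comparison; string vs number -> string comparison
    print (10 > v), (10 > t), (10 > c)
}')
echo "$out"   # prints: 1 1 0
```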

The three-argument form of match() is a gawk extension, and the array
elements it fills are treated the same way as those created by split(). I hope
you will agree that this makes sense.

If you want to force a string value, you can concatenate with "":

bash-4.2$ seq 10 | awk -e 'BEGIN { match("8", /([0-9]+)/, a) } $1 > (a[1] "") { print }'
9
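Going the other way, adding 0 forces a numeric value. A sketch assuming any POSIX awk: applied to the second command from the original report, this should print the same lines as the match() version:

```shell
# a[1] holds a string constant; (a[1] + 0) is a number,
# so the comparison with the strnum field $1 is numeric.
out=$(seq 10 | awk 'BEGIN { a[1] = "8" } $1 > (a[1] + 0) { print }')
echo "$out"   # prints 9 and 10 on separate lines
```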

Regards,
Andy


