From: arnold
Subject: Re: [bug-gawk] Why strings extracted by match() can be considered as numbers?
Date: Mon, 11 Jun 2018 12:13:33 -0600
User-agent: Heirloom mailx 12.4 7/29/08

Andy's answer is on target.
Thanks,
Arnold
"Andrew J. Schorr" <address@hidden> wrote:
> Hi,
>
> On Mon, Jun 11, 2018 at 12:40:26PM -0500, Peng Yu wrote:
> > The following example shows that strings extracted by match() can be
> > considered as numbers. This automatic conversion is not natural to me.
> >
> > $ cat main.sh
> > #!/usr/bin/env bash
> > # vim: set noexpandtab tabstop=2:
> >
> > set -v
> > seq 10 | awk -e 'BEGIN { match("8", /([0-9]+)/, a) } $1 > a[1] { print }'
> > seq 10 | awk -e 'BEGIN { a[1] = "8" } $1 > a[1] { print }'
> > $ ./main.sh
> > seq 10 | awk -e 'BEGIN { match("8", /([0-9]+)/, a) } $1 > a[1] { print }'
> > 9
> > 10
> > seq 10 | awk -e 'BEGIN { a[1] = "8" } $1 > a[1] { print }'
> > 9
> >
> > Based on the man page, it seems that the results should only be
> > considered as strings. Why is there such a discrepancy? Thanks.
> >
> > match(s, r [, a])  Return the position in s where the regular expression r
> >     occurs, or zero if r is not present, and set the values of RSTART and
> >     RLENGTH. Note that the argument order is the same as for the ~
> >     operator: str ~ re. If array a is provided, a is cleared and then
> >     elements 1 through n are filled with the portions of s that match the
> >     corresponding parenthesized subexpression in r. The zero'th element of
> >     a contains the portion of s matched by the entire regular expression r.
> >     Subscripts a[n, "start"], and a[n, "length"] provide the starting index
> >     in the string and length respectively, of each matching substring.
>
> This is documented in the info docs:
>
> https://www.gnu.org/software/gawk/manual/html_node/Variable-Typing.html#Variable-Typing
>
> Fields, getline input, FILENAME, ARGV elements, ENVIRON elements, and the
> elements of an array created by match(), split(), and patsplit() that are
> numeric strings have the strnum attribute. Otherwise, they have the string
> attribute. Uninitialized variables also have the strnum attribute.
>
> There is only so much that can fit in the man page. See also the POSIX awk
> spec discussion of "numeric string" values:
>
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
>
> A string value shall be considered a numeric string if it comes from one
> of the following:
> 1. Field variables
> 2. Input from the getline() function
> 3. FILENAME
> 4. ARGV array elements
> 5. ENVIRON array elements
> 6. Array elements created by the split() function
> 7. A command line variable assignment
> 8. Variable assignment from another numeric string variable
>
> The array argument to match() is a gawk extension, and the array values it
> fills in are treated the same way as those produced by split(). I hope you
> will agree that this makes sense.
>
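The difference between a numeric string and a plain string can be seen without
any gawk-specific feature. The following is a minimal sketch, assuming an awk
that follows the POSIX rule for split() quoted above; it uses split() instead
of the three-argument match(), and the shell variable names from_split and
from_const are only illustrative.

```shell
# a[1] is created by split(), so the value "8" is a numeric string (strnum)
# and "$1 > a[1]" is a numeric comparison.
from_split=$(seq 10 | awk 'BEGIN { split("8", a, " ") } $1 > a[1] { print }' | paste -sd' ' -)

# b is assigned from a string constant, so it is a plain string and
# "$1 > b" is a string comparison ("10" sorts before "8").
from_const=$(seq 10 | awk 'BEGIN { b = "8" } $1 > b { print }' | paste -sd' ' -)

echo "split():  $from_split"
echo "constant: $from_const"
```

With gawk, the first line should list 9 and 10 and the second only 9, which
matches the two results in the original report.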
> If you want to force a string value, you can concatenate with "":
>
> bash-4.2$ seq 10 | awk -e 'BEGIN { match("8", /([0-9]+)/, a) } $1 > (a[1] "") { print }'
> 9
>
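The conversion also works in the other direction: adding 0 forces a numeric
value, just as concatenating "" forces a string. A small sketch of both idioms
side by side, again assuming a POSIX-conforming awk and using split() so that
it does not depend on gawk; as_string and as_number are hypothetical names
chosen for this example.

```shell
# Concatenating "" turns the strnum a[1] into a string: string comparison.
as_string=$(seq 10 | awk 'BEGIN { split("8", a, " ") } $1 > (a[1] "") { print }' | paste -sd' ' -)

# Adding 0 turns the string constant in b into a number: numeric comparison.
as_number=$(seq 10 | awk 'BEGIN { b = "8" } $1 > (b + 0) { print }' | paste -sd' ' -)

echo "forced string: $as_string"
echo "forced number: $as_number"
```

The forced-string case should print only 9, and the forced-number case 9 and
10, regardless of how the value was originally obtained.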
> Regards,
> Andy