From: Andrew J. Schorr
Subject: Re: [bug-gawk] Why strings extracted by match() can be considered as numbers?
Date: Mon, 11 Jun 2018 13:54:35 -0400
User-agent: Mutt/1.5.21 (2010-09-15)
Hi,
On Mon, Jun 11, 2018 at 12:40:26PM -0500, Peng Yu wrote:
> The following example shows that strings extracted by match() can be
> considered as numbers. This automatic conversion is not natural to me.
>
> $ cat main.sh
> #!/usr/bin/env bash
> # vim: set noexpandtab tabstop=2:
>
> set -v
> seq 10 | awk -e 'BEGIN { match("8", /([0-9]+)/, a) } $1 > a[1] { print }'
> seq 10 | awk -e 'BEGIN { a[1] = "8" } $1 > a[1] { print }'
> $ ./main.sh
> seq 10 | awk -e 'BEGIN { match("8", /([0-9]+)/, a) } $1 > a[1] { print }'
> 9
> 10
> seq 10 | awk -e 'BEGIN { a[1] = "8" } $1 > a[1] { print }'
> 9
>
> Based on the manpage, it seems that the results should only be
> considered as strings. Why there is such a discrepancy? Thanks.
>
> match(s, r [, a])  Return the position in s where the regular expression r
> occurs, or zero if r is not present, and set the values of RSTART and
> RLENGTH. Note that the argument order is the same as for the ~ operator:
> str ~ re. If array a is provided, a is cleared and then elements 1 through
> n are filled with the portions of s that match the corresponding
> parenthesized subexpression in r. The zero'th element of a contains the
> portion of s matched by the entire regular expression r. Subscripts
> a[n, "start"], and a[n, "length"] provide the starting index in the string
> and length respectively, of each matching substring.
This is documented in the info docs:
https://www.gnu.org/software/gawk/manual/html_node/Variable-Typing.html#Variable-Typing
Fields, getline input, FILENAME, ARGV elements, ENVIRON elements, and the
elements of an array created by match(), split(), and patsplit() that are
numeric strings have the strnum attribute. Otherwise, they have the string
attribute. Uninitialized variables also have the strnum attribute.
There is only so much that can fit in the man page. See also the POSIX awk spec
discussion of "numeric string" values:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
A string value shall be considered a numeric string if it comes from one of
the following:
1. Field variables
2. Input from the getline() function
3. FILENAME
4. ARGV array elements
5. ENVIRON array elements
6. Array elements created by the split() function
7. A command line variable assignment
8. Variable assignment from another numeric string variable
The three-argument form of match() is a gawk extension, and the array values
it fills in are treated the same way as those produced by split(). I hope you
will agree that this makes sense.
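To see the same rule at work without any gawk extension, here is a minimal
sketch that uses split() instead of match(), so it should run under any
POSIX-style awk. The variable names from_split and from_const are only
illustrative, and the split() call is my substitution for the match() call in
the original report:

```shell
#!/bin/sh
# Element produced by split(): "8" looks numeric, so it is a numeric string
# and $1 > a[1] is evaluated as a numeric comparison.
from_split=$(seq 10 | awk 'BEGIN { split("8", a, " ") } $1 > a[1] { print }')

# Element assigned from a string constant: it is a plain string, so the
# comparison is done as strings and "10" sorts before "8".
from_const=$(seq 10 | awk 'BEGIN { a[1] = "8" } $1 > a[1] { print }')

# Unquoted expansions join the output lines with spaces for display.
printf 'split:    %s\n' "$(echo $from_split)"
printf 'constant: %s\n' "$(echo $from_const)"
```

The first command should print both 9 and 10, the second only 9, which
mirrors the discrepancy in the original report.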
If you want to force a string value, you can concatenate with "":
bash-4.2$ seq 10 | awk -e 'BEGIN { match("8", /([0-9]+)/, a) } $1 > (a[1] "") { print }'
9
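The coercion works in both directions: concatenating "" forces a string
comparison, and adding 0 forces a numeric one. A rough sketch, again using
split() rather than match() so that it does not depend on gawk (as_string and
as_number are just illustrative names):

```shell
#!/bin/sh
# split() yields a numeric string; concatenating "" turns it into a plain
# string, so the comparison with $1 is done as strings.
as_string=$(seq 10 | awk 'BEGIN { split("8", a, " ") } $1 > (a[1] "") { print }')

# A string constant compares as a string; adding 0 turns it into a number,
# so the comparison with $1 is numeric.
as_number=$(seq 10 | awk 'BEGIN { a[1] = "8" } $1 > (a[1] + 0) { print }')

# Unquoted expansions join the output lines with spaces for display.
printf 'forced string: %s\n' "$(echo $as_string)"
printf 'forced number: %s\n' "$(echo $as_number)"
```

The forced-string case should print only 9, and the forced-number case
should print 9 and 10.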
Regards,
Andy