bug-gawk


From: Ed Morton
Subject: Re: odd behavior of length(), match() and field splitting with multi-byte characters
Date: Mon, 1 Jul 2024 07:29:15 -0500
User-agent: Mozilla Thunderbird

FWIW, I also can't reproduce the issues with gawk 4.2.1 on Linux, but I can with gawk 5.0.0 in Git Bash on Windows (which is obviously Cygwin-like, if not identical).

Do the Cygwin port maintainers read this mail archive, or do I need to do something different to contact them?

    Ed.

On 7/1/2024 7:20 AM, Eli Zaretskii wrote:
Date: Mon, 1 Jul 2024 05:56:02 -0500
From: Ed Morton <mortoneccc@comcast.net>

          If we output 4 multi-byte characters as 10 bytes using:

              $ echo '61F09F948DF09F948E62' | xxd -r -p > file1
              $

          and run the following gawk command on it we get the output shown:

              $ LC_ALL=en_US.utf8 gawk '{print(length($0))}' file1
              6
              $

          i.e. 6 instead of 4.

I cannot reproduce this with

   GNU Awk 5.3.0, API 4.0, PMA Avon 8-g1, (GNU MPFR 4.1.0, GNU MP 6.2.1)
   Copyright (C) 1989, 1991-2023 Free Software Foundation.

running on

   Linux maintain0p.gnu.org 5.15.0-113-generic #123+11.0trisquel30 SMP Wed Jun 26 05:33:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

I get 4, as expected.

So I presume this is specific to the Cygwin port of Gawk, and suggest
taking this up with the maintainers of that port.
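
For reference, the three ways of counting the 10-byte sample quoted above can be checked independently of gawk. This is only a sketch: it writes the same bytes with POSIX octal escapes in printf rather than xxd, and it assumes an iconv that accepts the UTF-32LE and UTF-16LE encoding names (glibc's does). The variable names are illustrative. The thread does not diagnose the cause; the sketch only shows that 6 is the number of UTF-16 code units in the sample, which would be consistent with a platform whose wchar_t is 16 bits wide, as on Windows and Cygwin.

```shell
# Same 10 bytes as file1 in the message: 'a', U+1F50D, U+1F50E, 'b'.
# Octal escapes are used so that the sketch does not depend on xxd.
sample='\141\360\237\224\215\360\237\224\216\142'

# Raw byte count. The $(( )) wrapper strips any padding wc may print.
bytes=$(( $(printf "$sample" | wc -c) ))

# Code points: UTF-32 uses exactly 4 bytes per code point.
code_points=$(( $(printf "$sample" | iconv -f UTF-8 -t UTF-32LE | wc -c) / 4 ))

# UTF-16 code units: 2 bytes per unit; characters above U+FFFF
# are encoded as a surrogate pair, i.e. two units each.
utf16_units=$(( $(printf "$sample" | iconv -f UTF-8 -t UTF-16LE | wc -c) / 2 ))

echo "bytes=$bytes code_points=$code_points utf16_units=$utf16_units"
```

With glibc iconv this prints bytes=10 code_points=4 utf16_units=6, so the expected length() of 4 corresponds to code points and the reported 6 corresponds to UTF-16 code units.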


