Date: Mon, 1 Jul 2024 05:56:02 -0500
From: Ed Morton <mortoneccc@comcast.net>
If we output 4 multi-byte characters as 10 bytes using:
$ echo '61F09F948DF09F948E62' | xxd -r -p > file1
$
and run the following gawk command on it we get the output shown:
$ LC_ALL=en_US.utf8 gawk '{print(length($0))}' file1
6
$
i.e. 6 instead of 4.
I cannot reproduce this with
GNU Awk 5.3.0, API 4.0, PMA Avon 8-g1, (GNU MPFR 4.1.0, GNU MP 6.2.1)
Copyright (C) 1989, 1991-2023 Free Software Foundation.
running on
Linux maintain0p.gnu.org 5.15.0-113-generic #123+11.0trisquel30 SMP Wed Jun
26 05:33:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
I get 4, as expected.
So I presume this is specific to the Cygwin port of Gawk, and suggest
taking this up with the maintainers of that port.
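
For reference, the counts involved can be checked independently of gawk.
The sketch below is only an illustration, not something taken from the
gawk sources: it decodes the same ten bytes and counts bytes, Unicode
code points, and 16-bit UTF-16 code units. The reported 6 matches the
last of these, which would be consistent with a 16-bit wchar_t on
Cygwin, but that explanation is an assumption and has not been confirmed
here.

```python
# The ten bytes from the report: 'a', U+1F50D, U+1F50E, 'b'.
data = bytes.fromhex("61F09F948DF09F948E62")
text = data.decode("utf-8")

n_bytes = len(data)                           # bytes in the file
n_chars = len(text)                           # Unicode code points
n_utf16 = len(text.encode("utf-16-le")) // 2  # 16-bit UTF-16 code units

print(n_bytes, n_chars, n_utf16)  # prints: 10 4 6
```

Each of the two non-ASCII characters lies outside the Basic Multilingual
Plane, so it takes a surrogate pair (two 16-bit units) in UTF-16, hence
1 + 2 + 2 + 1 = 6.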