bug-gawk

Re: [bug-gawk] characters-as-bytes switch


From: Aharon Robbins
Subject: Re: [bug-gawk] characters-as-bytes switch
Date: Tue, 19 Jun 2012 21:28:02 +0300
User-agent: Heirloom mailx 12.4 7/29/08

Greetings. Re this:

> From: "SP" <address@hidden>
> To: <address@hidden>
> Date: Sun, 17 Jun 2012 00:55:16 +0200
> Subject: [bug-gawk] characters-as-bytes switch
>
> Hello,
>
> Sorry for my approximate English, I'm French ;-)
>
> Well, I've just installed the latest Cygwin binaries under Windows 7 in
> order to get a gawk with the --characters-as-bytes switch. Unfortunately,
> this switch doesn't seem to work correctly within patterns. Here is a full
> log demonstrating the problem. Note that \xE2\x80\x93 is a valid UTF-8
> character while \xE2\x80\x42 is not, and note the period in the gensub
> pattern.
>
> ==========
>
> C:\>ver
> Microsoft Windows [Version 6.1.7601]
>
> C:\>gawk.exe --version
> GNU Awk 4.0.1
> ...
> blah blah
>
> C:\>gawk.exe 'BEGIN { print "\xE2\x80\x93"; exit }' | gawk.exe --characters-as-bytes "{ print gensub(/\xE2\x80./,""ZZZ"",""g"",$0)}" | od -c -t x1
>
> 0000000 342 200 223  \n
>          e2  80  93  0a
> 0000004
>
> C:\>gawk.exe 'BEGIN { print "\xE2\x80\x42"; exit }' | gawk.exe --characters-as-bytes "{ print gensub(/\xE2\x80./,""ZZZ"",""g"",$0)}" | od -c -t x1
>
> 0000000   Z   Z   Z  \n
>          5a  5a  5a  0a
> 0000004
>
> ==========
>
> If I feed in a real UTF-8 character, /\xE2\x80./ doesn't match despite
> --characters-as-bytes, but if I feed in an invalid UTF-8 sequence,
> /\xE2\x80./ matches.
>
> Thanks in advance for your help in working around and/or fixing this
> problem!
>
> Stéphane

This was indeed a bug. Thank you for reporting it. I have just committed
the fix below.

Arnold
--------------------------
diff --git a/main.c b/main.c
index 3680e3f..2bb0b01 100644
--- a/main.c
+++ b/main.c
@@ -559,9 +559,12 @@ out:
 #if MBS_SUPPORT
        if (do_binary) {
                if (do_posix)
-                       warning(_("`--posix' overrides `--binary'"));
+                       warning(_("`--posix' overrides `--characters-as-bytes'"));
                else
                        gawk_mb_cur_max = 1;    /* hands off my data! */
+#if defined(LC_ALL)
+               setlocale(LC_ALL, "C");
+#endif
        }
 #endif
 


