Re: gawk: Wrong behavior in binary mode

From: Eli Zaretskii
Subject: Re: gawk: Wrong behavior in binary mode
Date: Thu, 11 Dec 2008 18:06:36 -0500

> Date: Thu, 11 Dec 2008 05:39:23 +0200
> From: Aharon Robbins <address@hidden>
> Cc: address@hidden
> Greetings. Re this:
> > Date: Mon, 8 Dec 2008 23:27:51 -0200
> > From: "Carlos G." <address@hidden>
> > To: address@hidden
> > Subject: gawk: Wrong behavior in binary mode
> >
> > Hi... I think this is a bug.
> > When working with gawk in binary mode, the length() and index() built-ins
> > fail with character codes greater than 127(0x7f). For example:
> >
> > ....
> First, thank you very much for the bug report.
> Second, it's not a BINMODE problem; rather it is a problem with locales;
> the same behavior shows up under Linux which ignores BINMODE.

I actually think that Carlos is right: if the user says she wants the
bytes treated as bytes, Gawk should not try to treat them as multibyte
character strings.

I think the patch you posted in a followup is only partially correct:
it will only work if the stream of bytes is not a valid multibyte
string.  But what if, by chance, it is a valid one?  Solving the
problem this way gives results that look unpredictable to a user
who does not necessarily know everything about valid and invalid
multibyte strings.
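
To illustrate the concern, here is a minimal sketch in Python (not
Gawk's code, and not the actual patch).  It assumes a UTF-8 locale, and
the function names are hypothetical.  length_with_fallback() models a
"count characters if the bytes decode, otherwise count bytes" strategy;
length_raw() models a strict bytes-are-bytes mode.

```python
def length_with_fallback(data: bytes, encoding: str = "utf-8") -> int:
    """Hypothetical model: count characters when the bytes form a valid
    multibyte string, and fall back to counting bytes when they do not."""
    try:
        return len(data.decode(encoding))
    except UnicodeDecodeError:
        return len(data)


def length_raw(data: bytes) -> int:
    """Hypothetical 'hands off my bytes' mode: always count bytes."""
    return len(data)


# Three bytes that are not valid UTF-8, so the fallback counts bytes.
not_valid_utf8 = b"\xe9t\xe9"

# Five raw bytes that happen to form a valid UTF-8 string, so the
# fallback silently switches to counting characters instead.
valid_by_chance = b"\xc3\xa9t\xc3\xa9"

print(length_with_fallback(not_valid_utf8))   # byte count
print(length_with_fallback(valid_by_chance))  # character count
print(length_raw(valid_by_chance))            # byte count
```

Under these assumptions the fallback reports a byte count for the first
input and a character count for the second, so a user who treats both
as binary data gets an answer that depends on the data itself.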

So I think there should be a way to tell Gawk "hands off my bytes!"
BINMODE could be that way (in which case Linux should not ignore
it), or you could introduce a new variable.

Btw, we've been through these issues in Emacs when Emacs 20 introduced
multi-lingual support, and Emacs now has a special way of treating raw
bytes that don't represent multibyte (or otherwise encoded) text.
