emacs-bug-tracker


From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#19388: closed (grep 2.21-1 identifies iso encoded text files as binary)
Date: Tue, 16 Dec 2014 07:13:02 +0000

Your message dated Mon, 15 Dec 2014 23:12:10 -0800
with message-id <address@hidden>
and subject line Re: bug#19388: grep 2.21-1 identifies iso encoded text files 
as binary
has caused the debbugs.gnu.org bug report #19388,
regarding grep 2.21-1 identifies iso encoded text files as binary
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
19388: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19388
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: grep 2.21-1 identifies iso encoded text files as binary
Date: Mon, 15 Dec 2014 15:22:00 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

Hi,

I noticed that grep 2.21-1 regards ISO-8859-15 encoded files as binary, if
LC_ALL is set to en_US.UTF.

I am not sure if this is a bug or an expected behaviour change in 2.21-1, but
since I could not find anything in the changelog that directly mentions it, I am
reporting it. (I could not find anything on http://debbugs.gnu.org)

How to reproduce:

Create an ISO-8859-15 encoded test file named testfile containing: test ä ö ü

export LC_ALL=en_US.UTF8

grep test testfile

Binary file testfile matches

export LC_ALL=en_US

(grep works as expected)

The behaviour for LC_ALL=en_US.UTF8 changed in 2.21-1; the same steps worked
correctly in 2.20-1.

I am testing this on Arch Linux with glibc 2.20-4 (if that is relevant).

Please let me know if you need more information.

Regards,

    Martin

--
Martin Hoch                        Friedrich-Bergius-Ring 15
fidion GmbH                                   97076 Würzburg



--- End Message ---
--- Begin Message ---
Subject: Re: bug#19388: grep 2.21-1 identifies iso encoded text files as binary
Date: Mon, 15 Dec 2014 23:12:10 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0

Martin Hoch wrote:
> I noticed that grep 2.21-1 regards ISO-8859-15 encoded files as binary, if
> LC_ALL is set to en_US.UTF.
>
> I am not sure if this is a bug or an expected behaviour change in 2.21-1

It's an expected change.  Although this was documented in NEWS:

  If a file contains data improperly encoded for the current locale,
  and this is discovered before any of the file's contents are output,
  grep now treats the file as binary.

the grep manual is not so clear about it. I installed the attached patch to try to fix that.
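
The NEWS entry turns on whether the file's bytes are valid in the current
locale's encoding. The following is a minimal sketch of that distinction for
the test file from the report. It assumes an iconv command is installed and
uses it only as a stand-in validity check; it is not how grep itself performs
the test, and the variable names are chosen for illustration.

```shell
# Write "test ä ö ü" as ISO-8859-15 bytes using octal escapes
# (ä = 0xE4, ö = 0xF6, ü = 0xFC).
printf 'test \344 \366 \374\n' > testfile

# iconv exits non-zero when the input is not valid in the source encoding.
if iconv -f UTF-8 -t UTF-8 testfile > /dev/null 2>&1; then
    utf8_status=valid
else
    utf8_status=invalid
fi

if iconv -f ISO-8859-15 -t UTF-8 testfile > /dev/null 2>&1; then
    latin9_status=valid
else
    latin9_status=invalid
fi

echo "as UTF-8: $utf8_status, as ISO-8859-15: $latin9_status"
```

The byte 0xE4 starts a multi-byte sequence in UTF-8, but the space that follows
it is not a continuation byte, so the file is improperly encoded for a UTF-8
locale while being well-formed ISO-8859-15.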

Attachment: 0001-doc-document-binary-data-heuristic-better.patch
Description: Text Data


--- End Message ---