emacs-bug-tracker

[debbugs-tracker] bug#21558: closed (checking for a binary file is not d


From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#21558: closed (checking for a binary file is not deterministic)
Date: Thu, 31 Dec 2015 03:26:02 +0000

Your message dated Wed, 30 Dec 2015 19:25:04 -0800
with message-id <address@hidden>
and subject line Re: grep BUG: text file is detected as binary
has caused the debbugs.gnu.org bug report #20526,
regarding checking for a binary file is not deterministic
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
20526: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=20526
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: checking for a binary file is not deterministic
Date: Fri, 25 Sep 2015 11:11:06 +0200

Hi,

When piping a certain diff into grep-2.21, it sometimes thinks
it is a binary file, and sometimes treats it as text.  The latter
behaviour is expected and desired.  I think grep should never
consider standard input to be binary.

For lack of a simple recipe, here is the actual use case:

  wget http://http.debian.net/debian/pool/main/g/gtkorphan/gtkorphan_0.4.4.orig.tar.gz
  tar -xf gtkorphan_0.4.4.orig.tar.gz
  cd gtkorphan-0.4.4/
  mkdir fresh
  # the command rsync does not work at this location:
  for lang in pt_BR bg zh_CN hr cs da nl eo fi fr de hu id it lv pl ru sr sv vi; do \
    wget http://translationproject.org/PO-files/$lang/gtkorphan-0.4.3.$lang.po -O fresh/$lang.po; \
  done

  diff -ur po fresh | /usr/local/bin/grep "Only in" | grep "fi"

That last command sometimes outputs:

  Only in fresh: fi.po
  Only in po: Makefile.in.in

and sometimes:

  Binary file (standard input) matches

(If you can't get the second output, try hitting Enter a few times
and then running the command again, and again, and again.  If you
still can't get both outputs, try using the en_US.utf8 locale.)


What seems to be happening is that sometimes grep will look
far enough to see the diff between po/fr.po and fresh/fr.po
(which contains some ISO8859-1 codes), and sometimes
not.  When deleting fresh/bg.po and fresh/de.po, grep will
always see those codes and will always consider the input
to be binary.
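
One way to confirm that a given input really contains such bytes is to run
it through iconv as a UTF-8 validity check.  This is only a sketch: iconv is
not part of the original report, the helper name is_valid_utf8 is
hypothetical, and the two printf lines are stand-ins for real diff output
(the second carries a lone 0xE9 byte, which is "é" in ISO-8859-1 but an
encoding error in UTF-8).

```shell
# Hypothetical helper: succeeds only if standard input is entirely valid UTF-8.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Stand-in for a plain ASCII line of diff output.
if printf 'Only in fresh: fi.po\n' | is_valid_utf8; then clean=yes; else clean=no; fi

# Stand-in for a diff line holding a lone ISO-8859-1 byte (octal 351 = 0xE9).
if printf '+msgstr "caf\351"\n' | is_valid_utf8; then mixed=yes; else mixed=no; fi

printf 'ascii only valid: %s, with 0xE9 valid: %s\n' "$clean" "$mixed"
```

In the real case the same check could be applied to the output of
"diff -ur po fresh" to see whether it contains any such bytes at all.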

I can of course use -a to force grep to see standard input
as text, but still... I think determining whether a file
is text or binary should be deterministic: it should always
yield the same result when the input is the same.
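
The -a workaround mentioned above can be sketched on a small stand-in for
the diff output.  The sample function and its three lines are hypothetical
(the last one carries a lone 0xE9 byte, as an ISO-8859-1 file would); only
the -a option and the two patterns come from the report.

```shell
# Hypothetical sample input: two "Only in" lines followed by a line that
# contains a lone ISO-8859-1 byte (octal 351 = 0xE9), invalid in UTF-8.
sample() {
  printf 'Only in fresh: fi.po\n'
  printf 'Only in po: Makefile.in.in\n'
  printf '+msgstr "caf\351"\n'
}

# -a (--text) tells grep to treat the input as text whatever it contains,
# so the matching lines are printed instead of a "Binary file" notice.
result=$(sample | grep -a "Only in" | grep "fi")
printf '%s\n' "$result"
```

With -a on the first grep, the second grep only ever receives the
"Only in" lines, so it needs no extra option here.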


$ /usr/local/bin/grep --version | head -1
/usr/local/bin/grep (GNU grep) 2.21

$ grep --version | head -1
grep (GNU grep) 2.21

$ diff --version | head -1
diff (GNU diffutils) 2.8.1

$ locale
LANG=eo.utf8
LANGUAGE=en
LC_CTYPE="eo.utf8"
LC_NUMERIC="eo.utf8"
LC_TIME="eo.utf8"
LC_COLLATE="eo.utf8"
LC_MONETARY="eo.utf8"
LC_MESSAGES="eo.utf8"
LC_PAPER="eo.utf8"
LC_NAME="eo.utf8"
LC_ADDRESS="eo.utf8"
LC_TELEPHONE="eo.utf8"
LC_MEASUREMENT="eo.utf8"
LC_IDENTIFICATION="eo.utf8"
LC_ALL=

Benno

-- 
http://www.fastmail.com - Accessible with your email software
                          or over the web




--- End Message ---
--- Begin Message ---
Subject: Re: grep BUG: text file is detected as binary
Date: Wed, 30 Dec 2015 19:25:04 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0

I installed into Savannah a patch (attached) that should fix this problem in
typical cases, and am boldly marking the bug as done.  Please give the fix a
try if you have the time.  Thanks.

Attachment: 0001-grep-be-less-picky-about-encoding-errors.patch
Description: Text Data


--- End Message ---
