What does Emacs on w32 know that grep can't figure out?

From: Stephen J. Turnbull
Subject: What does Emacs on w32 know that grep can't figure out?
Date: Fri, 01 Oct 2010 13:00:02 +0900

Lennart Borgman writes:

 > However trying to search this file from a cmd prompt with (gnuwin32)
 > grep does not work.

No, it almost certainly won't.  grep is byte-oriented and doesn't know
anything about coding systems.  On Unix with a UTF-8-capable terminal
you would do something like

   iconv --from-code=UTF-16 --to-code=UTF-8 "$FILE" | grep some-string

I would think that either Cygwin or Windows provides a version of
iconv.  If not, changing the file to UTF-8 (instead of UTF-16) using
Emacs should make it grep'able.  In some cases grep may think this is
a binary file anyway; if so, use the --text switch to force grep to
treat the file as text.
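
If no iconv is at hand, the same decode-then-search step can be sketched in Python. This is only an illustration, assuming the file really is UTF-16 with a byte order mark; the function name grep_utf16 and the sample data are made up for the example.

```python
import os
import tempfile


def grep_utf16(path, needle):
    """Return the lines of a UTF-16 file that contain needle."""
    # The "utf-16" codec uses the BOM at the start of the file to pick
    # the byte order. newline="" keeps CR/LF untouched so we strip them
    # ourselves.
    with open(path, encoding="utf-16", newline="") as f:
        return [line.rstrip("\r\n") for line in f if needle in line]


# Hypothetical sample file: UTF-16 with a BOM and CRLF line endings.
sample = "first line\r\nsome-string here\r\nlast line\r\n"
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(sample.encode("utf-16"))

matches = grep_utf16(demo_path, "some-string")
os.remove(demo_path)
print(matches)
```

The search runs on decoded text rather than on raw bytes, which is the same thing the iconv pipeline above achieves.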

 > And it does not work with cygwin grep either. They think it is a
 > binary file (even though I changed the line delimiter to unix
 > style).

The EOL delimiter is not a problem.  grep should ignore the presence
or absence of CR when checking for binary files.  The only time it is
likely to matter is if you are searching for a word at the end of the
line, in which case instead of "word$" you can use "word\015?$" or
something like that (if it matters, grep may be EOL-agnostic these
days).
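
The optional-CR idea can be shown with Python's re module. This is a sketch of the pattern only, not of how any particular grep treats line endings; the sample lines are invented.

```python
import re

# Lines as a byte-oriented tool might see them after splitting on LF:
# the second one still carries the CR from a CRLF line ending.
lines = ["the last word", "the last word\r", "word in the middle"]

strict = re.compile(r"word$")       # requires "word" right at the end
tolerant = re.compile(r"word\r?$")  # allows one optional CR (octal 015)

strict_hits = [l for l in lines if strict.search(l)]
tolerant_hits = [l for l in lines if tolerant.search(l)]

print(strict_hits)
print(tolerant_hits)
```

The strict pattern misses the CRLF line because the CR sits between "word" and the end of the line; the tolerant pattern matches both.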

Now, of course they think a UTF-16-encoded file is a binary file.  It
almost certainly contains NUL bytes (because an ASCII or Latin-1
character will always have a trailing NUL in UTF-16LE).
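
A few lines of Python make the NUL bytes visible. This is just a demonstration on an invented sample string, showing why a byte-oriented search for ASCII text misses it in UTF-16LE data.

```python
text = "grep me\n"
encoded = text.encode("utf-16-le")

# Every ASCII character becomes two bytes: the character, then NUL.
print(encoded)

has_nul = b"\x00" in encoded
# A plain byte search for the ASCII pattern fails, since the NULs
# sit between the letters.
byte_search_hits = b"grep" in encoded
# Encoding the pattern the same way makes the byte search work again.
wide_search_hits = "grep".encode("utf-16-le") in encoded

print(has_nul, byte_search_hits, wide_search_hits)
```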

 > What is going on? Is grep sometimes useless on w32 now, or? (How do we
 > handle that in Emacs?)

Emacs tries to guess what the encoding is if you don't specify it.  It
may guess wrong in certain cases, but it should be extremely accurate
in the case of any Unicode format.
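
One reason Unicode formats tend to be easy to recognize is the byte order mark that such files often start with. The sketch below checks for it in Python. It is an assumption-laden illustration, not a description of Emacs's actual detection code, and the function name sniff_bom is made up.

```python
import codecs


def sniff_bom(data):
    """Guess a Unicode encoding from a leading BOM, or return None."""
    # Check the 4-byte BOMs first: the UTF-32LE BOM begins with the
    # same two bytes as the UTF-16LE BOM.
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name
    return None  # no BOM: some other heuristic would be needed


print(sniff_bom("hello".encode("utf-16")))
print(sniff_bom(codecs.BOM_UTF8 + b"hello"))
print(sniff_bom(b"hello"))
```

A file without a BOM returns None here; guessing in that case takes more than a prefix check.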
