emacs-devel

Re: dired-do-find-regexp failure with latin-1 encoding


From: Dmitry Gutov
Subject: Re: dired-do-find-regexp failure with latin-1 encoding
Date: Sun, 29 Nov 2020 18:07:38 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 29.11.2020 17:06, Eli Zaretskii wrote:

Do we want to search the "binary" files at all?

We don't.  I still hope to understand why -a was needed in this case.
Stephen?

Looks like it actually depends on the encoding of the _output_. So if it can print some lines well but not others, it can even print a line from a file and then later say it's a binary file:

$ grep "prem" latin1.txt
premie?re is slightly different
Binary file latin1.txt matches

Adding -a or prepending 'LC_ALL=C' changes that:
$ LC_ALL=C grep "prem" latin1.txt
premi�re is first
premie?re is slightly different

So... looks like Grep searches through all files anyway. It just modifies its output when the contents look iffy.
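
For anyone who wants to reproduce this, here is a rough sketch. The sample file and its contents are made up for illustration (they are not the file from the report), and the comment about the C locale assumes a reasonably recent GNU grep. It only counts matching lines, since the count should not depend on how Grep decides to print them:

```shell
# Hypothetical sample file: one line with a Latin-1 e-grave (byte 0xE8,
# octal 350), which is not valid UTF-8 on its own, and one pure-ASCII line.
tmpdir=$(mktemp -d)
sample="$tmpdir/latin1.txt"
printf 'premi\350re is first\npremiere is plain ASCII\n' > "$sample"

# With LC_ALL=C, recent GNU grep treats every byte as a valid character,
# so the file is not flagged as binary.
count_c=$(LC_ALL=C grep -c 'prem' "$sample")

# -a (--text) tells grep to treat the file as text in any locale.
count_a=$(grep -a -c 'prem' "$sample")

echo "LC_ALL=C: $count_c matching lines; -a: $count_a matching lines"
rm -rf "$tmpdir"
```

Dropping -c and running plain 'grep "prem"' on the same file under a UTF-8 locale is where the "Binary file ... matches" message would be expected to show up.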

We should support Grep regardless, since not everyone will have
ripgrep.  And in any case, "C-x RET c" will be needed with it as well,
no?

I'd have to test it explicitly to say for sure, but:

    ripgrep supports searching files in text encodings other than UTF-8,
    such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some
    support for automatically detecting UTF-16 is provided. Other text
    encodings must be specifically specified with the -E/--encoding flag.)

https://blog.burntsushi.net/ripgrep/#pitch

What is not clear to me is whether the _output_ is always in some
fixed encoding, like UTF-8.  That doesn't seem to be stated in the
docs there.

Judging by a small experiment, rg's output is in the same encoding as the input, for each file. That can be a nuisance when looking at the search results, but that's probably all.

In any case, if one takes the pre-processing route, the end encoding will be UTF-8.
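
A minimal sketch of what the pre-processing route could look like, assuming iconv is the converter (the thread doesn't name one) and reusing a made-up Latin-1 sample file. The file is converted to UTF-8 before the search tool sees it, so the matched lines come out as UTF-8 too:

```shell
# Hypothetical sample file with a Latin-1 e-grave (byte 0xE8, octal 350).
tmpdir=$(mktemp -d)
sample="$tmpdir/latin1.txt"
printf 'premi\350re is first\npremiere is plain ASCII\n' > "$sample"

# Convert to UTF-8 first, then search the converted stream.
# The source encoding has to be known (or guessed) up front.
matches=$(iconv -f ISO-8859-1 -t UTF-8 "$sample" | grep 'prem')

printf '%s\n' "$matches"
rm -rf "$tmpdir"
```

The catch is the same as with "C-x RET c": something still has to decide which encoding each file is in before the conversion runs.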


