bug-gnu-emacs
From: Dmitry Gutov
Subject: bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
Date: Mon, 30 Nov 2020 04:25:31 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 24.11.2020 22:16, Eli Zaretskii wrote:
Cc: abela@chalmers.se, 31796@debbugs.gnu.org
From: Dmitry Gutov <dgutov@yandex.ru>
Date: Tue, 24 Nov 2020 21:43:22 +0200

How about https://debbugs.gnu.org/cgi/bugreport.cgi?bug=31796#23 ?

The idea sounds fine to me.

Someone more familiar with existing ports of Grep on different systems
should weigh in on it.

I don't think it's necessary.  We just need to probe Grep for support
of these switches, and then use it.  The result cannot be worse than
it is now.
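
A probe of this kind could look roughly like the sketch below. It is only an illustration in Python, not the Emacs implementation; the helper name grep_supports and its parameters are made up for this example. It relies on grep's documented exit status: 0 for a match, 1 for no match, 2 for an error such as an unsupported option.

```python
import shutil
import subprocess


def grep_supports(flags, pattern, sample, grep="grep"):
    """Return True if `grep` accepts `flags` and finds `pattern` in `sample`.

    `sample` is bytes fed to grep on standard input.
    """
    if shutil.which(grep) is None:
        return False  # no such program on PATH
    try:
        proc = subprocess.run(
            [grep, *flags, "-q", "-e", pattern],
            input=sample,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # 0 = matched, 1 = no match, 2 = error (e.g. unknown option)
    return proc.returncode == 0


# Probe for multiline matching: the pattern has to span a newline.
print(grep_supports(["-P", "-z"], r"a\nb", b"a\nb\n"))
```

The result depends on the Grep that is installed, so the probe would have to run once per host (and per remote connection) before choosing the switches.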

Now that I've dug in a little, the situation seems difficult.

-Pz does work, but it forces Grep to consider the file as one long string. As a consequence, if we ask it to output the line number, the number will always be 1. That's not a helpful mode of operation.

Even if it worked differently, -P imposes a significant performance penalty from what I see, even when the extra syntax is not actually used. So we couldn't enable it by default.

There is a similar program called pcregrep which outputs in the expected format:

$ pcregrep -MHn "names\"\n *" lisp/progmodes/project.el
lisp/progmodes/project.el:772: :type '(choice (const :tag "Read with completion from relative names"
                        project--read-file-cpd-relative)
lisp/progmodes/project.el:774: (const :tag "Read with completion from absolute names"
                        project--read-file-absolute)

...but it doesn't seem to have a way to reliably detect where a match result ends. The continuation lines of a multiline match carry no prefix, so what happens when the searched file itself contains a string like "file-name/etc:number"? Some of our tests probably do. Grep has a flag -Z (or --null) which outputs a null byte after file names, but pcregrep doesn't.
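
To illustrate the ambiguity, here is a small Python sketch of a naive parser for colon-delimited output. The sample output is invented for the example (it is not real pcregrep output), and naive_parse is a hypothetical helper, not anything in Emacs.

```python
import re

# A result line is assumed to look like FILE:LINE:TEXT.
PREFIX = re.compile(r"^(?P<file>[^:\n]+):(?P<line>\d+):(?P<text>.*)$")


def naive_parse(output):
    """Group output lines into matches, starting a new one at each prefix."""
    matches = []
    for raw in output.splitlines():
        m = PREFIX.match(raw)
        if m:
            matches.append([m["file"], int(m["line"]), [m["text"]]])
        elif matches:
            matches[-1][2].append(raw)  # continuation of a multiline match
    return matches


# ONE multiline match whose second line happens to look like a prefix.
one_match = 'notes.txt:3:see "names"\nfile-name/etc:12: more text\n'
print(len(naive_parse(one_match)))  # 2, although only one match was reported
```

With a null byte after the file name, the second line could not be mistaken for the start of a new result.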

And anyway, pcregrep isn't usually installed by default.

ripgrep, OTOH, seems to combine both good features here:

$ rg -Hn --multiline --null "names\"\n *" lisp/progmodes/project.el
lisp/progmodes/project.el772: :type '(choice (const :tag "Read with completion from relative names"
773:                        project--read-file-cpd-relative)
774:                 (const :tag "Read with completion from absolute names"
775:                        project--read-file-absolute)

And it also disables the multiline mode automatically if the regexp can't match a newline (the multiline mode is significantly slower).
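
For illustration, output of this shape could be parsed along the following lines. This is a Python sketch under the assumption that the output looks like the transcript above: a file name followed by a null byte, and every matched line prefixed with "NUMBER:". The function name and the abbreviated sample are made up for the example, and files that themselves contain null bytes are not handled.

```python
import re

NUMBERED = re.compile(r"^(\d+):(.*)$")


def parse_null_output(output):
    """Return (path, line_number, text) triples from NUL-delimited output."""
    results = []
    current = None  # file name that the following lines belong to
    for raw in output.split("\n"):
        if "\x00" in raw:
            # A new file name, terminated by the null byte.
            current, raw = raw.split("\x00", 1)
        m = NUMBERED.match(raw)
        if m and current is not None:
            results.append((current, int(m.group(1)), m.group(2)))
    return results


# Abbreviated version of the output shown above.
sample = (
    "lisp/progmodes/project.el\x00772: (first line of the match)\n"
    "773: project--read-file-cpd-relative)\n"
)
for path, number, text in parse_null_output(sample):
    print(path, number, text)
```

The same loop also accepts output where every line repeats the file name and the null byte, since it only switches files when it sees a null byte.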

To sum up, there are options, but I don't see a working solution that is based on GNU Grep. And that's the most portable search program we have, I think.

The other recommendations I see (here: https://unix.stackexchange.com/questions/112132/how-can-i-grep-patterns-across-multiple-lines) include bespoke scripts in sed or perl in command mode. These seem less portable, but if someone would like to try their hand at one that would also output file names and line numbers in the expected format, I'd be happy to benchmark it.
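
As a starting point, such a script might look like the sketch below. It is written in Python rather than sed or perl, it uses Python's own regexp syntax (neither PCRE nor Emacs regexps), and it reads each file into memory whole, so it is only an illustration of the expected output format. The names multiline_matches and main are hypothetical.

```python
import re
import sys
from bisect import bisect_right


def multiline_matches(pattern, text):
    """Yield (line_number, line) for every line touched by each match."""
    # Offsets at which each line starts, used to map offsets to line numbers.
    starts = [0] + [m.end() for m in re.finditer("\n", text)]
    lines = text.split("\n")
    for m in re.finditer(pattern, text):
        first = bisect_right(starts, m.start()) - 1
        last = bisect_right(starts, max(m.start(), m.end() - 1)) - 1
        for n in range(first, last + 1):
            yield n + 1, lines[n]


def main(argv):
    pattern, paths = argv[0], argv[1:]
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            text = f.read()
        for number, line in multiline_matches(pattern, text):
            print(f"{path}:{number}:{line}")


if __name__ == "__main__" and len(sys.argv) > 2:
    main(sys.argv[1:])
```

Saved under a name such as mlgrep.py (hypothetical), it would be run as: python mlgrep.py 'names"\n *' lisp/progmodes/project.el. A line touched by two matches is printed twice, and the colon-delimited output has the same ambiguity discussed above unless a null byte is printed after the file name.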




