help-gnu-emacs

Re: Negative occur


From: Ted Zlatanov
Subject: Re: Negative occur
Date: Thu, 29 Nov 2007 09:58:11 -0600
User-agent: Gnus/5.110007 (No Gnus v0.7) Emacs/22.1 (gnu/linux)

On Wed, 28 Nov 2007 14:52:15 -0800 "Drew Adams" <drew.adams@oracle.com> wrote: 

>> >> > You could try running "occur" with the pattern "^" (which matches
>> >> > every line), then prune the results with M-x delete-matching-lines RET

DA> [spamfilteraccount suggested that Emacs should have this as part of
DA> `occur'...]

DA> I realize that your suggestion is that this be added to
DA> Emacs. I agree. FYI - In Icicles, just do this: C-' foobar C-~
DA> That shows and lets you visit all lines that do not match the
DA> regexp "foobar".
>> 
>> Both solutions will be slower on a large buffer than they should be.

DA> What does "slower than they should be" mean? How slow should they be? How
DA> slow are they in fact? How large is a large buffer? How do you judge that
DA> "they" (two totally different approaches and implementations) are slower
DA> than they should be?

I'm certain that copying every line of a 100+ MB buffer into an *occur*
buffer and then removing most of them compares poorly, in both memory
and CPU usage, to matching only what you need in the first place.  It's
a very suboptimal approach whose only advantage is that it doesn't
require changes to any internal logic.  A shell parallel (using `sort'
instead of `cat' to stand in for Emacs' memory usage):

sort file | grep x
grep x file | sort
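The ordering point can be checked with a small shell sketch. The sample file and its contents are made up for illustration. Both pipelines print the same matching lines, but in the filter-late version sort has to take in every line, while in the filter-early version it only sees the matches.

```shell
#!/bin/sh
# Sketch of the sort/grep parallel, run on a tiny hypothetical sample file.
f=$(mktemp)
printf 'x3\nfoo\nx1\nbar\nx2\n' > "$f"

# Filter late: sort every line of the file, then keep the lines with "x".
late=$(sort "$f" | grep x)

# Filter early: keep the lines with "x", then sort only those.
early=$(grep x "$f" | sort)

printf '%s\n' "$late"
if [ "$late" = "$early" ]; then
  echo "same result"
fi

# How many lines sort has to read in each case (grep -c '' counts all lines).
echo "late:  sort reads $(grep -c '' "$f") lines"
echo "early: sort reads $(grep -c x "$f") lines"
rm -f "$f"
```

On a five-line file the difference is trivial; the cost gap grows with the fraction of lines the filter would have discarded.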

>> A real inversion parameter, either as a predicate function or a variable,
>> passed lexically or as a parameter to the occur-engine function call, is
>> necessary.

DA> Necessary? For what? Why necessary? These are generalizations that don't
DA> help.

Necessary to implement the solution in a way that satisfies both the OP
and future needs for tuning the occur results.  I'm not talking about
Icicles (that's why I originally mentioned occur-1 and occur-engine);
sorry if I didn't state that clearly.  Since you recommended the
filter-later approach, I assumed Icicles didn't support predicates, so
it made sense to reply to you.
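As a rough shell analogy (not Emacs code; the proposed inversion parameter for occur-engine is not shown here), grep's -v flag plays the role of an inversion parameter inside the matcher, while `grep -n ''` followed by `grep -v` mimics running occur on "^" and then pruning with delete-matching-lines. The file contents are hypothetical.

```shell
#!/bin/sh
# Shell analogy for the two "negative occur" strategies; sample data is made up.
f=$(mktemp)
printf 'alpha\nfoobar here\nbeta\nmore foobar\ngamma\n' > "$f"

# Prune later: number every line (like occur on "^"), then drop the lines
# that match (like M-x delete-matching-lines). Every line passes stage one.
pruned=$(grep -n '' "$f" | grep -v foobar)

# Invert in the matcher: -v selects non-matching lines in a single pass.
inverted=$(grep -n -v foobar "$f")

printf '%s\n' "$inverted"
if [ "$pruned" = "$inverted" ]; then
  echo "same result"
fi
rm -f "$f"
```

The two outputs agree here only because a numeric "N:" prefix can never contain the pattern; the practical difference is that the first pipeline pushes every line through an extra copying stage.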

DA> Your statements are vague, but I'm guessing that what you're really trying
DA> to say is that it is often more efficient to apply a predicate earlier
DA> rather than later (filter promotion), which is true.

Sure: reduce the search results as early as possible, as in my sort/grep
example above.

DA> The Icicles approach is designed for interactive use, which is why it
DA> emphasises changing search patterns (and predicates) on the fly. It works
DA> fine with any buffers I've ever used, some of which are pretty darn big.
DA> (How big is big? I just searched a 19MB buffer with no effect on
DA> interactivity.)

DA> As always, the usefulness of a tool depends on what you use it for. If you
DA> want to search a 5 terabyte file, then interactivity might suffer with some
DA> approaches (depending on your hardware... and, especially, depending on your
DA> regexp). But, as always, the devil is in the details.

I can see that argument when choosing between an O(n) and an O(n log n)
algorithm for small data sets.  But when one approach copies every line
and the other doesn't, and both achieve the same result, it genuinely
bothers me to recommend the former just because the API doesn't support
the latter.  So I'll propose the API change to emacs-devel.

As for hardware, I maintain an Emacs port for Maemo, the GNU/Linux
system on the Nokia 770/800/810 tablets.  They have little memory and a
slow CPU, so copying a large buffer unnecessarily would make for a
terrible user experience.

Ted

