bug-grep

Re: tests/yesno.sh, kinds of line (matching/non-matching, selected/rejec


From: Charles Levert
Subject: Re: tests/yesno.sh, kinds of line (matching/non-matching, selected/rejected)
Date: Sun, 13 Nov 2005 20:43:44 -0500
User-agent: Mutt/1.4.1i

* On Sunday 2005-11-13 at 23:46:09 +0000, Julian Foad wrote:
> Charles Levert wrote:
> >As for -C/-v/-o and the warning issue, I think
> >we do disagree.  With -o, I see a line-group
> >(including selected and context lines) as a
> >pool of _printable_ content, i.e., lines to scan
> >for -o parts.  Hence, I see no special case or
> >need for a warning then, and I see the group
> >separator remaining.   -C/-v/-o means "find all
> >matching parts in the vicinity of a non-matching
> >line (or lines) and collect them as a group".
> 
> Yikes.  There's some kind of logic to that line of reasoning, but it 
> strikes me as very arcane.  I have never thought of wanting such output.  I 
> can't imagine any meaningful use for it.  Sorry.  That logic involves Grep 
> doing two different kinds of search (conceptually, not just 
> implementation-wise which it currently does anyway).

Implementation-wise, it's no more, no less than
what -o introduced to begin with, -v or no -v.

> That effect is much 
> better and more flexibly obtained with two successive Greps: "grep -C -v | 
> grep -o".

But losing the ability to keep any line prefix
from the first one.

> I think "-v" and "-o" are mutually exclusive:
> 
>   -v: "show only the lines that have no matches"
>       (possibly with a few lines of context)
> 
>   -o: "show only the matches"
>       (it would make some kind of logical sense to show possibly a few 
> _characters_ of context around each match, but nobody has asked for that)

That last thing would introduce more
implementation code.  (There _is_ a simple way
to implement it:  define "a few" to be exactly
the number of characters there are on both
sides of the matching part on the line in which
it appears.)

The same thing could be proposed with words
of context, but the collaboration asked of
the matchers would be even greater.  (But see
the parenthetical remark above.)
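
For illustration only, here is a minimal Python sketch of what
"characters of context" around each -o part could look like.  It is
not grep code: the function name and the `width` parameter are
hypothetical, and "character" is assumed to mean a Python code point.

```python
import re

def matches_with_context(line, pattern, width):
    """Return each match with up to `width` characters kept on either side."""
    results = []
    for m in re.finditer(pattern, line):
        # Clamp the window to the line boundaries.
        start = max(0, m.start() - width)
        end = min(len(line), m.end() + width)
        results.append(line[start:end])
    return results

line = "the quick brown fox"
print(matches_with_context(line, "quick", 2))          # ['e quick b']
# A width at least as large as the line gives back the whole line.
print(matches_with_context(line, "quick", len(line)))  # ['the quick brown fox']
```

The second call is the degenerate case from the parenthetical remark:
once the width covers both sides of the matching part, the output is
simply the full line.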


> I'm greatly in favour of options having orthogonal meanings as much as 
> possible, but I think it is unreasonable here.  It is far better to disallow 
> certain combinations (which can then be given meanings in the future if it 
> becomes desirable) than to give them a meaning now just because we can (and 
> then have to support that meaning forever even though it is not useful).
> 
> Hey, there's another example of keeping it free of unneeded functionality 
> and future-proof: we don't support context lines with "-o".

Not quite.  Previously, we only thought
of considering the non -v cases.  For them,
whether or not we say we support context lines
with -o is just rhetoric, because the output
is the same since the context (non-matching)
lines don't provide any matching parts for -o.
Setting out_before=out_after=0 in main() was
more an optimization and a clarification (where
we forgot about the -v case).
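
To make the point concrete, here is a toy Python model of the
"pool of printable content" reading of -o.  It is only a sketch of
the semantics under discussion, not of grep's implementation; the
function and parameter names are made up for the example.

```python
import re

def only_matching(lines, pattern, context=0, invert=False):
    """Toy model: collect the matching parts of every printable line."""
    rx = re.compile(pattern)
    # A line is selected when it matches, or when it does not match under invert.
    selected = [i for i, line in enumerate(lines)
                if bool(rx.search(line)) != invert]
    # Printable lines are the selected ones plus `context` lines on each side.
    printable = set()
    for i in selected:
        lo = max(0, i - context)
        hi = min(len(lines), i + context + 1)
        printable.update(range(lo, hi))
    # Scan the whole printable pool for matching parts, in input order.
    parts = []
    for i in sorted(printable):
        parts.extend(m.group(0) for m in rx.finditer(lines[i]))
    return parts

lines = ["foo one", "bar", "foo two", "baz"]
# Without invert, context lines are non-matching and contribute nothing.
print(only_matching(lines, "foo"))                          # ['foo', 'foo']
print(only_matching(lines, "foo", context=1))               # ['foo', 'foo']
# With invert, only the context lines can contribute parts.
print(only_matching(lines, "foo", invert=True))             # []
print(only_matching(lines, "foo", context=1, invert=True))  # ['foo', 'foo']
```

In this model the context setting is irrelevant without invert, which
is why zeroing it looks like a pure optimization there, and it is the
only source of output with invert.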

However, if by "context lines with -o" you rather
meant something like

   filename-full context line
   filename-full context line
   filename:matching part only
   filename-full context line
   filename-full context line

then that's also a very twisted mix.

> If we were to 
> make "-C" work with "-o" in some arbitrary line-based way now, we'd lock 
> ourselves out of the possibility of making it show the specified number of 
> _characters_ 

I'm not sure what you mean here by line-based
way, or what you mean next by showing a number
of characters.  By number, do you mean an offset
or a count (within a match)?

Also, which kind of characters is it?
As in multi-octet characters counted as one?
That's a nice can of worms to open, because in
Unicode you can count

   U+0065, U+0301

as two characters, or

   <U+0065, U+0301>

as one combining character sequence.  I'm not
a big fan of anything other than octet offsets
(equivalently, machine-byte pointers) in practice,
since in order to make use of a character count,
you need to scan the whole thing from the start
anyway.  We already have that problem with lines
of uneven length; GNU Emacs didn't use to bother
about lines and did everything with one-octet
characters; does it still internally?
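
As a rough illustration of the three ways of counting, and of why a
character count forces a scan from the start, here is a small Python
sketch.  It assumes valid UTF-8 input, and the combining-sequence
count is only an approximation (it counts code points that are not
combining marks), not full Unicode text segmentation.  All function
names are invented for the example.

```python
import unicodedata

text = "e\u0301"  # U+0065 followed by U+0301 (combining acute accent)

def count_octets(s):
    """Length of the UTF-8 encoding, in octets."""
    return len(s.encode("utf-8"))

def count_code_points(s):
    """Python strings are indexed by code point."""
    return len(s)

def count_combining_sequences(s):
    """Count code points that are not combining marks (an approximation)."""
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)

def octet_offset(data, char_index):
    """Octet offset of the code point at `char_index`, found by a linear scan."""
    seen = 0
    for offset, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # not a continuation byte: a code point starts here
            if seen == char_index:
                return offset
            seen += 1
    raise IndexError(char_index)

print(count_octets(text))               # 3
print(count_code_points(text))          # 2
print(count_combining_sequences(text))  # 1
# "x" is the third code point but sits at octet offset 5.
print(octet_offset("\u00e9\u20acx".encode("utf-8"), 2))  # 5
```

The last function has no shortcut: without an index, turning a
character count back into a position means walking every octet before
it.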

Again, the same could be proposed about counting
words when -w is specified.

Also, by the multi-process logic, assuming it's
a count, this could be a job for one of

   grep -o | wc -c
   grep -o | wc -m
   grep -o | wc -w

> if and when people do start to expect that.  (I'd never 
> thought of that "characters of context" idea until now, and I don't know if 
> it will ever be wanted, and don't suggest we even consider implementing it, 
> but it illustrates a point.)

Understood.

I'm not even sure I would have introduced -o in
the first place, since grep is by tradition one
of the Unix line-oriented tools.  We wouldn't
be having this discussion then.

Come to think of it, I don't care that much
about -o in general.  It's more the challenge
of the logical problem that attracted me
towards the implementation of an orthogonal
and unambiguously specified solution.  Beauty,
in math as in everything, is very much in the
eye of the beholder.  Useful in practice?


> >>That means those "4/..." tests are wrong.
> >
> >Since yesno.sh specializes in exactly this
> >kind of issue, should we now just remove those
> >from foad1.sh?  Or do they still do something
> >for anchors that nothing else does?
> 
> Their original purpose, as I mentioned, was to check that Grep doesn't 
> crash.

Let's just keep them and adjust their expected
output, then.  That's the simpler thing to do.



