Re: removing blank lines: "grep ." is really slow

bug-grep

From:	Paul Eggert
Subject:	Re: removing blank lines: "grep ." is really slow
Date:	Fri, 23 Apr 2010 13:51:37 -0700
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)

Paolo Bonzini <address@hidden> writes:

> On 04/18/2010 06:32 AM, Ivan wrote:
>> So... right now, "." means "valid UTF-8 character"? Or not?
>
> Yes, if your locale is UTF-8.

Wouldn't it be better to model encoding errors as characters?  That is,
if grep sees a byte that cannot possibly be the start of a character, we
call it a "character" even though it is not in the standard Unicode
character set.  Internally, we could model it as (say) a negative
number, the negative of the byte value (so it would be in the range -255
.. -128).
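As a rough illustration of this model (a sketch only, not grep's actual code), the following Python decodes a byte string into a list of integers: valid UTF-8 sequences become their code points, and every byte that cannot be decoded becomes the negative of its value. The function name `decode_with_errors` is hypothetical, and the sketch assumes that a lead byte followed by bad or missing continuation bytes is also treated as an error byte, which goes slightly beyond "cannot possibly be the start of a character".

```python
from typing import List


def decode_with_errors(data: bytes) -> List[int]:
    """Return code points for valid UTF-8; each undecodable byte b becomes -b.

    Hypothetical helper illustrating the proposal; not part of grep.
    """
    out: List[int] = []
    i = 0
    while i < len(data):
        lead = data[i]
        # Guess the sequence length from the lead byte.
        if lead < 0x80:
            length = 1
        elif 0xC2 <= lead <= 0xDF:
            length = 2
        elif 0xE0 <= lead <= 0xEF:
            length = 3
        elif 0xF0 <= lead <= 0xF4:
            length = 4
        else:
            length = 1  # this byte cannot start a character
        chunk = data[i:i + length]
        try:
            # Let the strict decoder reject overlong forms, surrogates,
            # truncated sequences, and bad continuation bytes.
            out.append(ord(chunk.decode("utf-8")))
            i += length
        except UnicodeDecodeError:
            # Model the offending byte as a negative "character".
            out.append(-lead)
            i += 1
    return out
```

Since only bytes 0x80 and above can fail to decode, every error value falls in the range -255 .. -128, so it can never collide with a real code point.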

Under this approach, the regular expression "." will match all nonempty
lines, which is what most users expect.  The current approach, where "."
matches only lines that contain at least one valid UTF-8 character, is
not nearly as useful or intuitive.
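The difference can be sketched in Python, whose `surrogateescape` error handler uses a similar idea (it maps each undecodable byte to a reserved code point rather than to a negative number). This is only an analogy, not grep's implementation: decoding with `errors="ignore"` stands in for the current behaviour, and the function names are hypothetical.

```python
import re
from typing import List


def nonempty_valid_only(lines: List[bytes]) -> List[bytes]:
    """Keep lines containing at least one valid UTF-8 character."""
    # errors="ignore" drops undecodable bytes, so "." sees only valid text.
    return [ln for ln in lines
            if re.search(".", ln.decode("utf-8", errors="ignore"))]


def nonempty_errors_as_chars(lines: List[bytes]) -> List[bytes]:
    """Keep every nonempty line, treating encoding errors as characters."""
    # surrogateescape turns each undecodable byte into one placeholder
    # character, which "." then matches like any other character.
    return [ln for ln in lines
            if re.search(".", ln.decode("utf-8", errors="surrogateescape"))]


lines = [b"hello", b"", b"\xff\xfe"]
print(nonempty_valid_only(lines))       # the all-invalid line is dropped
print(nonempty_errors_as_chars(lines))  # only the empty line is dropped
```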

This modeling could be done consistently in both regular expressions and
in input.  It's very easy to explain: surely it's much easier than
whatever the current rules are.
