Re: removing blank lines: "grep ." is really slow


From: Paolo Bonzini
Subject: Re: removing blank lines: "grep ." is really slow
Date: Fri, 16 Apr 2010 09:37:09 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.8) Gecko/20100301 Fedora/3.0.3-1.fc12 Lightning/1.0b2pre Thunderbird/3.0.3

On 04/16/2010 02:04 AM, Ivan wrote:
> I used to use
>
> grep .
>
> for removing blank lines, until I realized how slow it is for large
> numbers of lines. So I switched to
>
> grep -v '^$'
>
> , which is as fast as one would expect (well, not with the grep that
> comes with MacOSX 10.5.8 (GNU grep version 2.5.1), but this seems to
> have been fixed sometime between 2.5.1 and 2.6.3).

True. To make this fast, grep would need to expand the "." of a UTF-8 locale into the appropriate character sets, so that it could use the faster single-byte character set matcher. It's on my todo list.

It wouldn't be exactly as fast as your grep -v solution (which is optimal and preferred), however, because it would still have to check that the line contains a valid UTF-8 character. In particular, it would be slow and have false negatives if your document is not UTF-8.

You can also use "LC_ALL=C grep .", which would be fast and exactly equivalent to "grep -v '^$'".
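
A quick way to convince yourself of the equivalence is a sketch like the one below. The sample data and the variable names (tmp, a, b) are made up for illustration, and it only checks output on a tiny input, not speed:

```shell
# Hypothetical sample: three non-empty lines separated by empty ones.
tmp=$(mktemp)
printf 'alpha\n\nbeta\n\n\ngamma\n' > "$tmp"

# In the C locale every byte counts as one character, so "." matches
# any line that contains at least one byte.
a=$(LC_ALL=C grep . "$tmp")

# The inverted match drops exactly the lines that are empty.
b=$(grep -v '^$' "$tmp")

# Report whether the two filters produced the same text.
if [ "$a" = "$b" ]; then
    echo "same output"
else
    echo "outputs differ"
fi

rm -f "$tmp"
```

To compare speed, the same two commands can be run under "time" against a large file of your own.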

Paolo



