bug-grep

Re: removing blank lines: "grep ." is really slow


From: Ivan
Subject: Re: removing blank lines: "grep ." is really slow
Date: Sun, 18 Apr 2010 00:32:58 -0400

On Apr 16, 2010, at 3:37 AM, Paolo Bonzini wrote:

> True. You'd need to expand UTF-8 period characters to the appropriate character sets, then you can use the faster single-byte character set matcher. It's on my todo list.
>
> It wouldn't be exactly as fast as your grep -v solution (which is optimal and preferred), however, because it will check that a character in the line is a valid UTF-8 character. In particular, it would be slow and have false negatives if your document is not UTF-8.

So... right now, "." means "valid UTF-8 character"? Or not? I'm a little confused about the difference between the current behavior and the behavior after you accomplish your todo list.

Anyway, I sent my original email because I couldn't think of any non-buggy reason for "grep ." to take an entire millisecond per line. That seems insanely slow even if some kind of UTF-8 checking is taking place. Here are some tests showing the non-linearity that I mentioned before:

bash$ time yes | head -n 1000 | grep . >/dev/null

real    0m0.311s
user    0m0.224s
sys     0m0.028s

bash$ time yes | head -n 5000 | grep . >/dev/null

real    0m3.730s
user    0m3.125s
sys     0m0.269s

bash$ time yes | head -n 10000 | grep . >/dev/null

real    0m10.282s
user    0m8.646s
sys     0m0.732s

bash$ time yes | head -n 20000 | grep . >/dev/null

real    0m21.156s
user    0m17.883s
sys     0m1.524s
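
To make the non-linearity concrete, the per-line cost can be worked out from the wall-clock ("real") figures above. This is only a rough sketch: it assumes awk is available, and per_line_ms is a helper name made up for illustration, not anything that ships with grep.

```shell
# Per-line cost in milliseconds, given wall-clock seconds and a line count.
# per_line_ms is a hypothetical helper name used only for this sketch.
per_line_ms() {
    awk -v secs="$1" -v lines="$2" 'BEGIN { printf "%.3f\n", secs * 1000 / lines }'
}

# "real" times copied from the four runs above.
per_line_ms 0.311 1000
per_line_ms 3.730 5000
per_line_ms 10.282 10000
per_line_ms 21.156 20000
```

With those inputs the cost per line goes from about 0.31 ms at 1000 lines to about 0.75 ms at 5000 and about 1.03 ms at 10000, then stays near 1.06 ms at 20000. If the work were linear in the input size, the per-line figure would be roughly constant throughout.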

I'm also puzzled by this:

bash$ time yes | head -n 5000 | grep '[a-b]' >/dev/null

real    0m0.159s
user    0m0.053s
sys     0m0.028s

bash$ time yes | head -n 5000 | grep '[y-z]' >/dev/null

real    0m3.755s
user    0m3.089s
sys     0m0.262s

bash$ time yes | head -n 5000 | grep '[yz]' >/dev/null

real    0m0.168s
user    0m0.021s
sys     0m0.028s
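
One thing worth separating out when comparing these three runs is whether the pattern matches at all. Every line that yes produces is a single "y", so '[a-b]' should match nothing, while '[y-z]' and '[yz]' should match every line. The sketch below just counts matches to confirm that; count_matches is a made-up helper name, and the sketch assumes a locale in which the range [y-z] includes "y".

```shell
# Count how many of N "y" lines a given pattern matches.
# count_matches is a hypothetical helper name used only for this sketch.
count_matches() {
    pattern=$1
    lines=$2
    yes | head -n "$lines" | grep -c -e "$pattern"
}

# The three bracket expressions from the timings above.
for pattern in '[a-b]' '[y-z]' '[yz]'; do
    printf '%s\t%s\n' "$pattern" "$(count_matches "$pattern" 5000)"
done
```

If the counts come out as 0, 5000, and 5000, then the '[a-b]' timing is for a run that never matches, and the like-for-like comparison is '[y-z]' against '[yz]': same input, same matched lines, very different run time.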

Are these behaviors expected?

Ivan




