Re: [PATCH] Speedup wc -l
From: Bernhard Voelker
Subject: Re: [PATCH] Speedup wc -l
Date: Wed, 18 Mar 2015 18:24:20 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0
On 03/18/2015 04:57 PM, Pádraig Brady wrote:
> +  wc -l is now up to 6 times faster with short lines.
> diff --git a/src/wc.c b/src/wc.c
> index 8cb5163..8125100 100644
> --- a/src/wc.c
> +++ b/src/wc.c
> @@ -264,6 +264,8 @@ wc (int fd, char const *file_x, struct fstatus *fstatus, off_t current_pos)
>      {
>        /* Use a separate loop when counting only lines or lines and bytes --
>           but not chars or words.  */
> +      bool long_lines = false;
> +      bool check_len = true;
>        while ((bytes_read = safe_read (fd, buf, BUFFER_SIZE)) > 0)
>          {
>            char *p = buf;
> @@ -275,12 +277,39 @@ wc (int fd, char const *file_x, struct fstatus *fstatus, off_t current_pos)
>                break;
>              }
>
> +          char *end = p + bytes_read;
> +          char *line_start = p;
> +
> +          /* Avoid function call overhead for shorter lines.  */
> +          if (check_len)
> +            while (p != end)
> +              {
> +                lines += *p++ == '\n';
> +                /* If there are more than 300 chars in the first 10 lines,
> +                   then use memchr, where system specific optimizations
> +                   may outweigh any function call overhead.  */
> +                if (lines <= 10)
> +                  {
> +                    if (p - line_start > 300)
> +                      {
> +                        long_lines = true;
> +                        break;
> +                      }
> +                  }
> +              }
> +          else if (! long_lines)
> +            while (p != end)
> +              lines += *p++ == '\n';
> +

Doesn't this still run into the memchr loop in both cases, with a
length of 0 once the byte loop has consumed the whole buffer (the
compiler seems to optimize that away, but it looks odd)?

> +          /* memchr is more efficient with longer lines.  */
>            while ((p = memchr (p, '\n', (buf + bytes_read) - p)))
>              {
>                ++p;
>                ++lines;
>              }
> +
>            bytes += bytes_read;
> +          check_len = false;
>          }
>      }
>  #if MB_LEN_MAX > 1
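
Regarding the question above: after either byte loop has consumed the whole buffer, p == end, so the trailing memchr loop is entered with a length of 0 and memchr returns NULL at once. That is harmless but redundant. A minimal sketch of one way to make the three paths explicit, assuming a hypothetical helper count_lines_in_buf (not part of the patch) that keeps check_len, long_lines and the line count across buffers:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): count the newlines in
   BUF[0..N) with the same heuristic as the patch.  *CHECK_LEN,
   *LONG_LINES and *LINES keep their values across calls, one call per
   buffer read.  */
static void
count_lines_in_buf (char const *buf, size_t n, bool *check_len,
                    bool *long_lines, size_t *lines)
{
  char const *p = buf;
  char const *end = buf + n;

  if (*check_len)
    {
      /* First buffer: count byte by byte, but switch to memchr if the
         first 10 lines take more than 300 bytes (over 30 per line).  */
      while (p != end)
        {
          *lines += *p++ == '\n';
          if (*lines <= 10 && p - buf > 300)
            {
              *long_lines = true;
              break;
            }
        }
      *check_len = false;
    }
  else if (! *long_lines)
    {
      /* Short lines were detected earlier: byte loop only.  */
      while (p != end)
        *lines += *p++ == '\n';
    }

  /* In the short-line paths P has already reached END, so the memchr
     loop is skipped there; it only runs once long lines were seen.  */
  if (*long_lines)
    while ((p = memchr (p, '\n', end - p)))
      {
        ++p;
        ++*lines;
      }
}
```

The behavior matches the patch's loops; only the fall-through into memchr is made conditional, so the code no longer relies on the compiler to drop the zero-length call.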

In my first tests, I can confirm the speedup for short lines.
Where does the magic 30 come from (i.e. 300 chars in the first 10 lines)?
Tests here show that the effect reverses for line lengths between
10 and ~27 chars: there, the new version is ~80% slower.
Beyond that, both versions behave the same, as expected.
I'll test more tonight.

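To reproduce a comparison like the one above, a rough sketch could time the two inner loops the patch chooses between on synthetic data with a fixed line length. The names count_nl_bytes, count_nl_memchr, make_lines and time_count are hypothetical; this is not the setup behind the numbers above, and timing wc -l itself on generated files remains the more realistic test.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Byte-at-a-time loop, as in the patch's short-line path.  */
static size_t
count_nl_bytes (char const *buf, size_t n)
{
  size_t lines = 0;
  for (char const *p = buf, *end = buf + n; p != end; p++)
    lines += *p == '\n';
  return lines;
}

/* memchr loop, as in the existing wc -l code.  */
static size_t
count_nl_memchr (char const *buf, size_t n)
{
  size_t lines = 0;
  char const *end = buf + n;
  for (char const *p = buf; (p = memchr (p, '\n', end - p)); p++)
    lines++;
  return lines;
}

/* Hypothetical test data: N bytes of lines that are LINE_LEN bytes long
   each (LINE_LEN - 1 'x' plus '\n'); LINE_LEN must be at least 1.
   Returns NULL on allocation failure; the caller frees the buffer.  */
static char *
make_lines (size_t n, size_t line_len)
{
  char *buf = malloc (n);
  if (buf)
    for (size_t i = 0; i < n; i++)
      buf[i] = i % line_len == line_len - 1 ? '\n' : 'x';
  return buf;
}

/* CPU seconds for REPS passes of COUNT over BUF; the last count is
   stored in *LINES.  An optimizing compiler may merge identical calls,
   so check the generated code before trusting the numbers.  */
static double
time_count (size_t (*count) (char const *, size_t),
            char const *buf, size_t n, int reps, size_t *lines)
{
  clock_t start = clock ();
  for (int i = 0; i < reps; i++)
    *lines = count (buf, n);
  return (double) (clock () - start) / CLOCKS_PER_SEC;
}
```

For example, make_lines (1 << 20, 20) gives 1 MiB of 20-byte lines; calling time_count with both counters over a range of line lengths should show where the byte loop stops paying off on a given machine.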
Thanks & have a nice day,
Berny