coreutils

Re: [PATCH] Speedup wc -l


From: Bernhard Voelker
Subject: Re: [PATCH] Speedup wc -l
Date: Fri, 20 Mar 2015 00:30:48 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0

On 03/19/2015 11:18 AM, Pádraig Brady wrote:
> Yes 30 was a bit aggressive.
> I had already changed it to 15 (150) locally here after more testing.

Here I get the best results with 100 as the limit, i.e., with an
average of 9 characters per line over the first 10 lines; 100 because of:
  (9 chars + 1 '\n') * 10 = 100
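
To make that arithmetic concrete, here is a minimal sketch that measures
the bytes in the first 10 lines of a file and compares them with the
limit. It is not part of the patch: the helper names `sample_bytes` and
`classify` are made up for illustration, and treating a sample strictly
larger than the limit as "long" is an assumption about how the
comparison is done.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from the patch.
limit=100   # the value discussed above; 150 is the other candidate

sample_bytes() {
  # Bytes in the first 10 lines of file $1, newlines included.
  # 'tr' strips the padding some 'wc' implementations emit.
  head -n 10 "$1" | wc -c | tr -d ' '
}

classify() {
  # Assumption: a sample strictly larger than the limit means "long" lines.
  if [ "$(sample_bytes "$1")" -gt "$limit" ]; then
    echo long
  else
    echo short
  fi
}
```

With 9-character lines the sample is exactly 100 bytes, so under the
assumption above `classify` prints "short"; with 10-character lines it
is 110 bytes and the result is "long".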

This is the script I was using (openSUSE-13.2, i5-4570 CPU, 20GB RAM):

  #!/bin/sh
  file='file.txt'
  # Test with various line lengths.
  for len in 0 1 2 5 6 7 8 9 10 11 15 20 50 100 1000; do
    printf "\n====== %s ======\n" "$len"
    # Prepare a 500M file with LEN characters per line.
    yes "$(seq -s '' 1000 | head -c "$len")" \
      | src/dd iflag=fullblock status=none bs=1M count=500 of="$file"
    # Do 3 test runs with the old/new 'wc'.
    for i in $(seq 3); do
      # dummy run
      wc -l "$file" > /dev/null
      for wc in src/wc-before src/wc; do
        printf "\n= %s =\n" "$wc"
        time "$wc" -l "$file"
      done
    done
  done

You are of course encouraged to play with/reduce the list for 'len'
in that script.

Other than that, the patch looks great per se. I couldn't find a way
to make it faster than that.

One thing: do you think that the compiler will get better with
memchr() for short lines some day?  Then we should maybe add a
"FIXME: re-evaluate magic number (100 or 150) in 2017".

Finally: can we get some performance test results from some other
platforms (ARM, PPC, 32 vs 64 bit)? I mean, this patch helps on
Linux/x86_64, but if it makes things worse on other platforms, then
I don't think complicating the code is worth it.

Thanks & have a nice day,
Berny


