help-bash

Re: Counting words, fast!


From: Jesse Hathaway
Subject: Re: Counting words, fast!
Date: Wed, 17 Mar 2021 10:34:47 -0500

On Tue, Mar 16, 2021 at 10:30 PM Dennis Williamson
<dennistwilliamson@gmail.com> wrote:
> I've been playing with your optimized code changing the read to grab data in 
> chunks like some of the other optimized code does - thus extending your move 
> from by-word to by-line reading to reading a specified larger number of 
> characters.
>
> IFS= read -r -N 4096 var
>
> And appending the result of a regular read to end at a newline. This seemed 
> to cut about 20% off the time. But I get different counts than your code. 
> I've tried using read without specifying a variable and using the resulting 
> $REPLY to preserve whitespace but the counts still didn't match.
>
> In any case this points to larger chunks being more efficient.

Oh! That is a clever idea. I wanted to try reading in larger chunks, but
I wasn't sure how to ensure I had read whole words until you suggested
following the chunk with a regular read. Using 64K chunks, I was able to
shave about 7s off in my testing:

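# Requires bash 5.1+ for the ${var@L} lowercase expansion.
declare -iA words_to_freq   # associative array with integer values
eof='false'
set -o noglob               # keep unquoted expansions from globbing
while [[ "${eof}" == 'false' ]]; do
  # Read a 64K block; a non-zero status means end of input was reached.
  if ! LANG='C' IFS='' read -N 65536 -r block; then
    eof='true'
  fi
  # Read up to the next newline so the block ends on a whole word.
  if ! IFS='' read -r line; then
    eof='true'
  fi
  # Unquoted on purpose: word splitting breaks the text into words.
  for word in ${block@L}${line@L}; do
    words_to_freq["${word}"]+=1
  done
done
set +o noglob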
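
For anyone who wants to try the chunk-plus-line idea on small inputs,
here is a minimal self-contained sketch of it wrapped in a function. The
names count_words, chunk_size and freq are made up for illustration and
are not from the code above. It also assumes ${var,,} for lowercasing
instead of ${var@L}, so that it runs on older bash 4.x, and it leaves out
the LANG='C' prefix. Treat it as a sketch, not a tuned implementation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count words on stdin and print "word count" lines.
# Each pass reads a fixed-size chunk, then a regular line read finishes
# the chunk at the next newline so that no word is cut in half.
count_words() {
  local chunk_size="${1:-65536}" block line word eof='false'
  local -iA freq=()
  set -o noglob   # the unquoted expansion below must not glob
  while [[ "${eof}" == 'false' ]]; do
    # A non-zero status from either read means end of input was reached.
    IFS='' read -r -N "${chunk_size}" block || eof='true'
    IFS='' read -r line || eof='true'
    # Unquoted on purpose: word splitting yields the individual words.
    for word in ${block,,}${line,,}; do
      freq["${word}"]+=1
    done
  done
  set +o noglob
  # Associative array order is unspecified, so sort the output if needed.
  for word in "${!freq[@]}"; do
    printf '%s %d\n' "${word}" "${freq[${word}]}"
  done
}
```

With a deliberately tiny chunk size, such as `count_words 3`, every
chunk boundary falls inside a word, which makes it easy to check that
the trailing line read stitches the words back together. For a real
run, something like `count_words 65536 < input.txt | sort -k2,2nr`
would list the most frequent words first (input.txt is a placeholder).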


