Re: Counting words, fast!

help-bash

From:	Dennis Williamson
Subject:	Re: Counting words, fast!
Date:	Wed, 17 Mar 2021 10:50:30 -0500

On Wed, Mar 17, 2021, 10:34 AM Jesse Hathaway <jesse@mbuki-mvuki.org> wrote:

> On Tue, Mar 16, 2021 at 10:30 PM Dennis Williamson
> <dennistwilliamson@gmail.com> wrote:
> > I've been playing with your optimized code, changing the read to grab
> > data in chunks like some of the other optimized code does, thus
> > extending your move from by-word to by-line reading to reading a
> > specified larger number of characters.
> >
> > IFS= read -r -N 4096 var
> >
> > And appending the result of a regular read to end at a newline. This
> > seemed to cut about 20% off the time. But I get different counts than
> > your code. I've tried using read without specifying a variable and
> > using the resulting $REPLY to preserve whitespace, but the counts still
> > didn't match.
> >
> > In any case this points to larger chunks being more efficient.
>
> Oh! That is a clever idea, I wanted to try reading in larger chunks, but
> I wasn't sure how to ensure I had read whole words until you gave
> this idea. Using 64K chunks I was able to shave off about 7s in my
> testing:
>
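> declare -iA words_to_freq   # associative array with integer values
> eof='false'
> set -o noglob               # keep word splitting, disable pathname expansion
> while [[ "${eof}" == 'false' ]]; do
>   # Grab a 64K block; read -N returns non-zero when it hits EOF first.
>   if ! LANG='C' IFS='' read -N 65536 -r block; then
>     eof='true'
>   fi
>   # The block may end mid-word, so read on to the next newline.
>   if ! IFS='' read -r line; then
>     eof='true'
>   fi
>   # Lowercase both pieces and let the unquoted expansion split into words.
>   for word in ${block@L}${line@L}; do
>     words_to_freq["${word}"]+=1
>   done
> done
> set +o noglob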
>
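
One possible source of a count mismatch when gluing a block to the rest of
its line (an assumption; the thread does not identify the cause) is the
second read stripping leading whitespace. With the default IFS, a plain
read into a single variable drops leading blanks, so a block that ends
exactly at a word boundary gets fused to the next word. A minimal sketch,
where join_chunk is a hypothetical helper and not code from the thread:

```shell
#!/usr/bin/env bash
# Illustrative only: join_chunk is a hypothetical helper.
# Reads 3 characters with read -N, then the rest of the line, and prints
# the two pieces joined. $1 selects whether the second read clears IFS.
join_chunk() {
  local block rest
  IFS= read -r -N 3 block
  if [[ $1 == keep ]]; then
    IFS= read -r rest    # empty IFS: leading whitespace is preserved
  else
    read -r rest         # default IFS: leading whitespace is stripped
  fi
  printf '%s\n' "${block}${rest}"
}

join_chunk keep  <<<'The quick fox'   # prints: The quick fox
join_chunk strip <<<'The quick fox'   # prints: Thequick fox
```

The quoted loop above uses IFS='' on both reads, so it would not be
affected by this particular problem.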

Did you try smaller blocks? I didn't see any difference above 4K. Did you
verify that the counts are correct? Your code is a little different than
mine and may fix the count issue I was having.
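
A rough way to check both questions is to run the chunked loop at several
block sizes and compare its output against a plain line-by-line count of
the same input. The sketch below is only that: count_chunked and
count_by_line are hypothetical names, the sample input is made up, and
${var,,} is swapped in for ${var@L} so it also runs on bash older than 5.1.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for cross-checking; not code from the thread.

# Count words on stdin, reading blocks of $1 characters plus the rest of
# the current line. Prints "word count" lines in sorted order.
count_chunked() {
  local -i size=$1
  local -iA freq=()
  local block line word eof=false
  set -o noglob
  while [[ $eof == false ]]; do
    IFS= read -r -N "$size" block || eof=true
    IFS= read -r line || eof=true
    for word in ${block,,}${line,,}; do
      freq["$word"]+=1
    done
  done
  set +o noglob
  for word in "${!freq[@]}"; do
    printf '%s %d\n' "$word" "${freq[$word]}"
  done | LC_ALL=C sort
}

# Reference count: one line at a time, same output format.
count_by_line() {
  local -iA freq=()
  local line word
  set -o noglob
  while IFS= read -r line || [[ -n $line ]]; do
    for word in ${line,,}; do
      freq["$word"]+=1
    done
  done
  set +o noglob
  for word in "${!freq[@]}"; do
    printf '%s %d\n' "$word" "${freq[$word]}"
  done | LC_ALL=C sort
}

# Made-up sample input; substitute the real test file.
input=$'The quick brown fox\njumps over the lazy dog\nThe end'
expected=$(count_by_line <<<"$input")
for size in 1 4 7 4096; do
  if [[ $(count_chunked "$size" <<<"$input") == "$expected" ]]; then
    echo "block size $size: counts match"
  else
    echo "block size $size: counts differ"
  fi
done
```

Wrapping each count_chunked call in time on the real input would also
show where larger blocks stop paying off on a given machine.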

Current Thread

Counting words, fast!, Jesse Hathaway, 2021/03/16
- Re: Counting words, fast!, Leonid Isaev (ifax), 2021/03/16
  - Re: Counting words, fast!, Greg Wooledge, 2021/03/16
    - Re: Counting words, fast!, Leonid Isaev (ifax), 2021/03/16
  - Re: Counting words, fast!, Jesse Hathaway, 2021/03/16
    - Re: Counting words, fast!, Dennis Williamson, 2021/03/16
    - Re: Counting words, fast!, Jesse Hathaway, 2021/03/17
    - Re: Counting words, fast!, Dennis Williamson <=
    - Re: Counting words, fast!, Jesse Hathaway, 2021/03/17
    - Re: Counting words, fast!, Greg Wooledge, 2021/03/17
    - Re: Counting words, fast!, Jesse Hathaway, 2021/03/17
- Re: Counting words, fast!, Koichi Murase, 2021/03/19
  - Re: Counting words, fast!, Dennis Williamson, 2021/03/19
  - Re: Counting words, fast!, Jesse Hathaway, 2021/03/19
    - Re: Counting words, fast!, Koichi Murase, 2021/03/19
    - Re: Counting words, fast!, Koichi Murase, 2021/03/19
    - Re: Counting words, fast!, Lawrence Velázquez, 2021/03/20
    - Re: Counting words, fast!, Jesse Hathaway, 2021/03/22
