help-gawk

Re: How to Generate a Long String of the Same Character


From: Neil R. Ormos
Subject: Re: How to Generate a Long String of the Same Character
Date: Wed, 21 Jul 2021 00:24:37 -0500 (CDT)

Wolfgang Laun wrote:

> We've been hunting a phantom. -- I have hacked 5.1.0, boldly setting
> gawk_mb_cur_max = 1 in main.c (instead of the configured value of 6).
> And now:
> $ time ~/Downloads/gawk/gawk-5.1.0/gawk -f srepNeil.awk
> MB_CUR_MAX = 1
> 0:00:01.130
> $ time ~/Downloads/gawk/gawk-5.1.0/gawk -f srepRec.awk
> MB_CUR_MAX = 1
> 0:00:01.892

> From a comment in main.c:  "[MB_CUR_MAX] is tested *a lot* in many
> speed-critical places in gawk." length() is one of these places.

I went in the other direction.

Before seeing your message, I took Andy's hint and
switched from LC_ALL=C to a multibyte locale.

But that inverts the relative advantage.  The
recursive algorithm is much faster than doubling
on the 1 GB string, and it takes the same runtime
as doubling on the shorter strings.
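
For readers who have not followed the whole thread, the two
approaches being compared can be sketched roughly as below.
This is only a hypothetical illustration: it is not the
contents of srepNeil.awk or srepRec.awk, which are not shown
in this message.  The names srep_demo, srep_double, and
srep_rec are made up for the sketch, and it assumes a POSIX
awk is available on the PATH.

```sh
#!/bin/sh
# Hypothetical sketch: two ways to build n copies of a string in awk.
# Not taken from the scripts timed in this thread.

srep_demo() {
    # $1 = string to repeat, $2 = repeat count
    awk -v s="$1" -v n="$2" '
    # Doubling (binary method): double the working piece each round
    # and append it to the result whenever the current bit of n is set.
    function srep_double(s, n,    out) {
        out = ""
        while (n > 0) {
            if (n % 2) out = out s
            n = int(n / 2)
            if (n > 0) s = s s      # skip the final, unneeded doubling
        }
        return out
    }
    # Recursive halving: build int(n/2) copies, join two of them,
    # and add one more copy when n is odd.
    function srep_rec(s, n,    half) {
        if (n <= 0) return ""
        if (n == 1) return s
        half = srep_rec(s, int(n / 2))
        return (n % 2) ? (half half s) : (half half)
    }
    BEGIN {
        a = srep_double(s, n)
        b = srep_rec(s, n)
        # Report both lengths and whether the results agree.
        print length(a), length(b), (a == b ? "same" : "differ")
    }'
}

srep_demo "ab" 5     # prints: 10 10 same
```

Both functions perform on the order of log2(n) concatenations,
so any large difference in runtime between locales would have
to come from how each concatenation or length() call is
handled internally, not from the number of steps.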

This effect of using a multibyte locale is
mystifying.

I do make one call to length() per invocation of
the function, but the strings being measured are
no more than 5 characters long.

I would have expected all the runtimes to increase
linearly, give or take.  But maybe what counts is
the number of discrete string operations: for each
operation, the entire string has to be scanned in
multibyte fashion to find the end.
Could it be?  Ugh.

It's back to the TTY15 locale for me.


