bug-bash



From: Chet Ramey
Subject: Re: Combination of "eval set -- ..." and $() command substitution is slow
Date: Fri, 12 Jul 2019 10:44:27 -0400
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Thunderbird/60.8.0

On 7/10/19 1:21 PM, astian wrote:

> Bash Version: 5.0
> Patch Level: 3
> Release Status: release
> 
> Description:
> 
>   I discovered a curious performance degradation in the combined usage of the
>   constructs "eval set -- ..." and new-style command substitution.  In short,
>   setting the positional arguments via eval and then iterating over each one
>   while performing $() command substitution(s) is significantly slower than
>   not using eval, or not performing command substitution, or using `` instead.
> 
>   I include below a reduced test script that illustrates the issue.  A few
>   notes:
>     - The pathological case is "1 1 0".
>     - I did not observe any performance difference in unoptimised builds (-O0).
> 

>     --------------------------
>     case 1 1 0
>     eval set
>     real    0m0.002s
>     user    0m0.000s
>     sys     0m0.000s
>     for loop cmdsubst-currency
>     real    0m0.968s
>     user    0m0.432s
>     sys     0m0.148s
>     --------------------------

> 
>   Observations:
>     - The pathological case "1 1 0" spends about 10 times more time doing
>       something in userspace during the loop, relative to the comparable cases
>       "0 1 0", "0 1 1", and "1 1 1".
>     - $() seems generally slightly slower than ``, but becomes pathologically
>       so when preceded by "eval set -- ...".

It is slightly slower -- POSIX requires that the shell parse the contents
of $(...) to determine that it's a valid script as part of finding the
closing `)'. The rules for finding the closing "`" don't have that
requirement.

>     - "eval set -- ..." itself doesn't seem slow at all, but obviously it has
>       side-effects not captured by the "time" measurement tool.

What happens is you end up with a 4900-character command string that you
have to parse multiple times. But that's not the worst of it.

The gprof output provides a clue.


>       case 1 1 0 (pathological):
>        %   cumulative   self              self     total
>       time   seconds   seconds    calls  us/call  us/call  name
>       38.89      0.21     0.21    28890     7.27     7.27  set_line_mbstate

set_line_mbstate() runs through each command line before parsing, creating
a bitmap that indicates whether each element is a single-byte character or
part of a multi-byte character. The scanner uses this to determine whether
a shell metacharacter should act as a delimiter or get skipped over as part
of a multibyte character. For a single run with args `1 1 0', it gets
called around 7300 times, with around 2400 of them for the 4900-character
string with all the arguments.

When you're in a multibyte locale (en_US.UTF-8 is one such), each one of
those characters requires a call to mbrlen/mbrtowc. So that ends up being
2400 * 4900 calls to mbrlen.
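
Worked out, the estimate above and the per-call cost in the gprof flat profile are:

```latex
\underbrace{2400}_{\text{calls on the long string}} \times \underbrace{4900}_{\text{characters per call}}
  = 11{,}760{,}000 \approx 1.2 \times 10^{7}\ \text{mbrlen() calls}

\frac{0.21\ \text{s (self time)}}{28890\ \text{calls}} \approx 7.27\ \mu\text{s per set\_line\_mbstate() call}
```

The second figure matches the "self us/call" column quoted above for set_line_mbstate.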

There is something happening here -- there's no way there should be that
many calls to set_line_mbstate(), even when you have to save and restore
the input line because you have to parse the contents of $(). There must
be some combination of the effect of `eval' on the line bitmap and the
long string. I'll see what I can figure out.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/


