bug-bash

repeated extended pattern substitution incredibly slow w/large variables


From: address@hidden
Subject: repeated extended pattern substitution incredibly slow w/large variables
Date: Sun, 18 Sep 2016 11:32:45 +0200 (MEST)

Configuration Information [Automatically generated, do not change]:
Machine: i686
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='i686' 
-DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i686-pc-linux-gnu' 
-DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL 
-DHAVE_CONFIG_H -DDEBUG -DMALLOC_DEBUG -I.  -I. -I./include -I./lib   -g -O2 
-Wno-parentheses -Wno-format-security
uname output: Linux Xaox 4.4.0-tm3 #2 Mon Feb 22 13:26:44 CET 2016 i686 
GNU/Linux
Machine Type: i686-pc-linux-gnu

Bash Version: 4.4
Patch Level: 0
Release Status: rc2 / release

Description:
        The tests below were performed with 4.4.0-rc2. However, the problem is
        still present in 4.4.0-release, where execution times are about 20%
        higher still.

        Repeated pattern substitution (here: removal) using an extended pattern
        and variables of considerable size is incredibly time and cpu consuming.
        The command that revealed the problem was:

                 D=${C//\[+([0-9])\]=}

        The variable C contains the output of 'declare -p A', where A is an
        array with 510 file names, so C contains 510 matches. As can be seen
        below, commands like

                D=${C//u+([a-z])}   or  D=${C//@(usr)}

        also trigger the problem, but _not_ commands like

                D=${C//usr}         or  D=${C//u[a-z][a-z]}

        See the test case and statistics below.

        Of course, the problem is easily solved by a mini sed(1) script, but
        every now and then I try commands like the above, because I think that
        simple tasks should be doable without the aid of external programs.
        But in many such cases I must sadly accept that using external programs,
        especially sed(1), is the quicker method.
        Additionally, I will have to revise my script (a ~100kb font editor)
        and possibly replace other constructs using extended pattern matching.
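
        One possible way to stay inside bash without extended patterns is
        to use the regex operator of [[ ]] and cut out one match per loop
        iteration. This is only a sketch: the function name strip_subscripts
        is hypothetical, and its running time on the large inputs described
        here has not been measured.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove every '[<digits>]=' from $1 and print the result.
# Uses [[ =~ ]] (POSIX ERE) instead of an extglob pattern.
strip_subscripts() {
  local rest=$1 out='' m
  local re='\[[0-9]+\]='
  while [[ $rest =~ $re ]]; do
    m=${BASH_REMATCH[0]}        # text of the leftmost match
    out+=${rest%%"$m"*}         # keep everything before that match
    rest=${rest#*"$m"}          # continue after the match
  done
  printf '%s' "$out$rest"
}

# Usage with a made-up sample string:
C='([0]="a.psf" [1]="b.psf" [10]="c.psf")'
D=$(strip_subscripts "$C")
printf '%s\n' "$D"
# prints: ("a.psf" "b.psf" "c.psf")
```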

Repeat-By:
        -----------------------------------------------------------------------
        declare -a B A=( /usr/share/consolefonts/* ) # column 2: here 510 files

        # A=( "${A[@]##*/}" )                        # column 3: pure filenames
        # A=( "${A[@]/*/a}" )                        # column 4: "a"
        # A=( "${A[@]/*}" )                          # column 5: "" (empty)

        for matches in {10..500..10}; do
          B=( "${A[@]:0:matches}" )                # reduce array
          C=`declare -p B | sed -r "s/^[^=]+=?//"` # rm 'declare -<attr> <name>='
          time D="${C//\[+([0-9])\]=}"             # rm '[<subscr>]='
        done
        ------------------------------------------------------------------------
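
        For readers without /usr/share/consolefonts, the same kind of
        measurement can be sketched with synthetic data. The file names and
        the helper make_input are made up, and the element counts are kept
        small on purpose, so this is a reduced variant of the test case
        above rather than a reproduction of the numbers below.

```shell
#!/usr/bin/env bash
shopt -s extglob

# Hypothetical helper: build the 'declare -p' text for an array of $1
# made-up file names and strip the leading 'declare -a B='.
make_input() {
  local -a B=()
  local i decl
  for ((i = 0; i < $1; i++)); do
    B+=( "/usr/share/consolefonts/font$i.psf.gz" )
  done
  decl=$(declare -p B)
  printf '%s' "${decl#*=}"      # drop everything up to the first '='
}

TIMEFORMAT='  real %R s'        # format used by the 'time' keyword

for n in 5 10; do
  C=$(make_input "$n")
  echo "matches=$n size=${#C}"
  time D=${C//\[+([0-9])\]=}    # rm '[<subscr>]=' with an extended pattern
done
```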

        results (all with >99% cpu):

        number of |  contents of array elements
        matches   |  size=${#C}  path/file |   file  |  "a"   |  empty
        ---------------------------------------------------------------
          10:     |   369 bytes   0.099s   |  0.014s | 0.007s |  0.005s
          20:     |   900         1.261s   |  0.315s | 0.048s |  0.036s
          30:     |  1453         5.274s   |  1.538s | 0.168s |  0.134s
          40:     |  2070        15.030s   |  4.868s | 0.406s |  0.324s
          50:     |  2655        31.830s   | 10.694s | 0.814s |  0.644s
          60:     |  3240        56.831s   | 19.203s | 1.423s |  1.130s
          70:     |  3837        94.022s   | 32.356s | 2.299s |  1.829s
          80:     |  4384       139.000s   | 47.079s | 3.473s |  2.751s
          90:     |  4998       204.683s   |         | 4.955s |  3.932s
         100:     |  5567       283.118s   |         | 6.871s |  5.452s
         110:     |  6135                  |         | 9.495s |  7.547s
         120:     |  6664                  |         |        | 10.164s
         200:     | 15554                  |         |        | 55.529s

        I was too impatient to wait for the run with the complete
        510-element array to finish.

        The following test results all belong in column 1 + 2.

        the command:    time D=`sed -r "s/\[[0-9]+\]=//g"<<<"$C"`

         510:     | 27137 bytes,  R:0.020 U:0.007 S:0.007 67.66%   ok!


        other commands:

                size=${#C}   D=${C//usr}   D=${C//u[a-z][a-z]}
        --------------------------------------------------------
         100:    5567 bytes  0.004s             0.004s             ok!
         200:   11167        0.012s             0.012s
         300:   16712        0.024s             0.024s
         400:   21818        0.038s             0.040s
         500:   26647        0.056s             0.057s


        but:           D=${C//u+([a-z])}        D=${C//@(usr)}

          10:                0.136s             0.112s         >99% cpu
          20:                1.647s             1.078s
          30:                6.467s             4.014s
          40:               17.912s            10.886s
          50:               38.178s            22.391s

        which seems to indicate that extended pattern matching causes the
        problem.

        Please CC answers to me as I am not subscribed to the list.



