Re: repeated extended pattern substitution incredibly slow w/large varia
From: Piotr Grzybowski
Subject: Re: repeated extended pattern substitution incredibly slow w/large variables
Date: Sun, 18 Sep 2016 17:21:33 +0200
Hi,
maybe I do not fully follow your example, but instead of:
time D="${C//\[+([0-9])\]=}" # rm '[<subscr>]='
wouldn't you want:
time D="${C//\[[0-9]*\]=}" # rm '[<subscr>]='
Your example copies a lot of data to D, and that's what takes the time, I guess.
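One caveat on the two patterns above: in a shell pattern `*` matches any string rather than "zero or more digits", and ${var//pat} replaces the longest match, so the two forms are not interchangeable. A minimal sketch on a made-up three-element value (the sample string is an assumption, not data from the report):

```bash
#!/usr/bin/env bash
shopt -s extglob   # required for the +( ... ) operator

# Hypothetical sample in the shape of 'declare -p' output, not from the report.
C='([0]="a" [1]="b" [2]="c")'

# Extended pattern: '[', one or more digits, ']='.
ext="${C//\[+([0-9])\]=}"

# Plain glob: '[', one digit, then ANY string, then ']='.
# The longest match runs from the first '[0' to the last ']='.
glob="${C//\[[0-9]*\]=}"

printf 'extglob: %s\n' "$ext"    # ("a" "b" "c")
printf 'glob:    %s\n' "$glob"   # ("c")
```

So the plain-glob form is fast partly because it does a single large deletion, not the per-subscript removal the original command intends.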
cheers,
pg
On 18 Sep 2016, at 11:32, xaoxx@t-online.de wrote:
>
> Configuration Information [Automatically generated, do not change]:
> Machine: i686
> OS: linux-gnu
> Compiler: gcc
> Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='i686'
> -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i686-pc-linux-gnu'
> -DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL
> -DHAVE_CONFIG_H -DDEBUG -DMALLOC_DEBUG -I. -I. -I./include -I./lib -g -O2
> -Wno-parentheses -Wno-format-security
> uname output: Linux Xaox 4.4.0-tm3 #2 Mon Feb 22 13:26:44 CET 2016 i686
> GNU/Linux
> Machine Type: i686-pc-linux-gnu
>
> Bash Version: 4.4
> Patch Level: 0
> Release Status: rc2 / release
>
> Description:
> The tests below were performed with 4.4.0-rc2. However, the problem is
> still present in 4.4.0-release, where execution times are about 20%
> higher.
>
> Repeated pattern substitution (here: removal) using an extended pattern
> on variables of considerable size is extremely time- and CPU-consuming.
> The command that revealed the problem was:
>
> D=${C//\[+([0-9])\]=}
>
> The variable C contains the output of 'declare -p A', where A is an
> array with 510 file names and C contains 510 matches. But as can be
> seen below, also commands like
>
> D=${C//u+([a-z])} or D=${C//@(usr)}
>
> trigger the problem, but _not_ commands like
>
> D=${C//usr} or D=${C//u[a-z][a-z]}
>
> See the test case and statistics below.
>
> Of course, the problem is easily solved by a mini sed(1) script, but
> every now and then I try commands like the above, because I think that
> simple tasks should be doable without the aid of external programs.
> But in many such cases I must sadly accept that using external programs,
> especially sed(1), is the quicker method.
> Additionally I will have to revise my script (a ~100kb font editor)
> and possibly replace other constructs using extended pattern matching.
>
> Repeat-By:
> -----------------------------------------------------------------------
> declare -a B A=( /usr/share/consolefonts/* ) # column 2: here 510 files
>
> # A=( "${A[@]##*/}" ) # column 3: pure filenames
> # A=( "${A[@]/*/a}" ) # column 4: "a"
> # A=( "${A[@]/*}" ) # column 5: "" (empty)
>
> for matches in {10..500..10}; do
> B=( "${A[@]:0:matches}" ) # reduce array
> C=`declare -p B | sed -r "s/^[^=]+=?//"` # rm 'declare -<attr> <name>='
> time D="${C//\[+([0-9])\]=}" # rm '[<subscr>]='
> done
> ------------------------------------------------------------------------
>
> results (all with >99% cpu):
>
> number of | contents of array elements
> matches | size=${#C} path/file | file | "a" | empty
> ---------------------------------------------------------------
> 10: | 369 bytes 0.099s | 0.014s | 0.007s | 0.005s
> 20: | 900 1.261s | 0.315s | 0.048s | 0.036s
> 30: | 1453 5.274s | 1.538s | 0.168s | 0.134s
> 40: | 2070 15.030s | 4.868s | 0.406s | 0.324s
> 50: | 2655 31.830s | 10.694s | 0.814s | 0.644s
> 60: | 3240 56.831s | 19.203s | 1.423s | 1.130s
> 70: | 3837 94.022s | 32.356s | 2.299s | 1.829s
> 80: | 4384 139.000s | 47.079s | 3.473s | 2.751s
> 90: | 4998 204.683s | | 4.955s | 3.932s
> 100: | 5567 283.118s | | 6.871s | 5.452s
> 110: | 6135 | | 9.495s | 7.547s
> 120: | 6664 | | | 10.164s
> 200: | 15554 | | | 55.529s
>
> I was too impatient to wait for the run with the complete array of 510
> elements to finish.
>
> The following test results all belong in column 1 + 2.
>
> the command: time D=`sed -r "s/\[[0-9]+\]=//g"<<<"$C"`
>
> 510: | 27137 bytes, R:0.020 U:0.007 S:0.007 67.66% ok!
>
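As a small cross-check of the sed command quoted above, the following sketch runs it next to the extended-pattern form on a tiny array and compares the results. The array contents are a hypothetical stand-in for the font file names, and `-E` is used as the portable spelling of GNU sed's `-r`:

```bash
#!/usr/bin/env bash
shopt -s extglob   # required for the +( ... ) operator

# Hypothetical stand-in for the array of font file names.
B=(alpha beta gamma)

# Strip the leading 'declare -<attr> <name>=' as in the report.
C=$(declare -p B | sed -E 's/^[^=]+=?//')

# sed variant: remove every '[<subscr>]='.
D_sed=$(sed -E 's/\[[0-9]+\]=//g' <<<"$C")

# Extended-pattern variant from the report; fine at this size.
D_ext="${C//\[+([0-9])\]=}"

if [[ $D_sed == "$D_ext" ]]; then
    echo "same result: $D_sed"
else
    echo "results differ"
fi
```

On input this small both forms return immediately; the slowdown reported here only shows up as C grows.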
>
> other commands:
>
> size=${#C} D=${C//usr} D=${C//u[a-z][a-z]}
> --------------------------------------------------------
> 100: 5567 bytes 0.004s 0.004s ok!
> 200: 11167 0.012s 0.012s
> 300: 16712 0.024s 0.024s
> 400: 21818 0.038s 0.040s
> 500: 26647 0.056s 0.057s
>
>
> but: D=${C//u+([a-z])} D=${C//@(usr)}
>
> 10: 0.136s 0.112s >99% cpu
> 20: 1.647s 1.078s
> 30: 6.467s 4.014s
> 40: 17.912s 10.886s
> 50: 38.178s 22.391s
>
> which seems to indicate that extended pattern matching causes the
> problem.
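Since the report shows plain bracket expressions such as u[a-z][a-z] staying fast, one possible workaround without external programs is to replace the +( ... ) operator with one plain substitution per subscript width. This is only a sketch: it assumes subscripts have at most three digits (true for 510 elements), the sample value is made up, and it has not been timed against the cases above.

```bash
#!/usr/bin/env bash
# No extglob needed: only plain bracket expressions are used.

# Hypothetical sample with one-, two- and three-digit subscripts.
C='([0]="a" [10]="b" [509]="c")'

# One pass per subscript width (assumption: at most three digits).
D=${C//\[[0-9]\]=}
D=${D//\[[0-9][0-9]\]=}
D=${D//\[[0-9][0-9][0-9]\]=}

printf '%s\n' "$D"   # ("a" "b" "c")
```

Larger arrays would need one more line per additional digit.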
>
> Please CC answers to me as I am not subscribed to the list.