From: Jakub Martisko
Subject: Re: Performance Regression between sed 4.4 and sed 4.5+
Date: Mon, 29 Jul 2024 13:30:39 +0200
The callgrind output files for the locales/versions mentioned in the previous email

On Mon, Jul 29, 2024 at 1:23 PM Jakub Martisko <jamartis@redhat.com> wrote:

> Hello,
>
> This still seems to be an issue as of sed-4.9. I've tried some tests using
> the valgrind/callgrind+kcachegrind and using various locales (I'll try to
> attach to a next email the callgrind profile files but I am not sure
> whether they'll pass attachment size/spam filters).
>
> I've tested with sed 4.4 and 4.5. Sed 4.9 behaves the same as 4.5, but
> since the regression first appeared in 4.5 I'm using that. I've used the
> following two locales (I've always set both LANG and LC_ALL to the locale):
> C and en_US.iso88591 (calling this just as US). The program was the one
> from my first email, the input file was the head -n 100 input also
> mentioned in the first mail.
>
> Interesting observations:
>
> 1) In sed 4.4 there seems to be a difference between the C and US locales:
> (numbers in parenthesis correspond to the numbers of call reported by the
> kcachegrind):
> US: ...-> dfaexec_sb (100) -> dfaexec_main (100) -> build_state (449) -> insert (~1.9 mil)
> C: ...-> dfaexec_sb (100) -> dfaexec_main (100) -> build_state (~135 000) -> insert (~135 000)
>
> 2) The C locale seems to behave the same way both in 4.4 and 4.5
>
> 3) The US locale behaves differently in 4.4 and 4.5:
> 4.4: ...-> dfaexec_sb (100) -> dfaexec_main (100) -> build_state (449) -> insert (~1.9 mil)
> 4.5: ...-> dfaexec_sb (100) -> dfaexec_main (100) -> build_state (~130 000) -> insert (~220 mil)
> with the following run times (time sed...):
>
> 4.4:
> real 0m0.042s
> user 0m0.035s
> sys 0m0.007s
>
> 4.5:
> real 0m1.289s
> user 0m1.248s
> sys 0m0.037s
>
> On Thu, Nov 23, 2023 at 11:27 AM Jakub Martisko <jamartis@redhat.com>
> wrote:
>
>> Sending some statistics, I've tried to use LANG=C which helped a bit,
>> but the performance is still worse than when using sed 4.4
>>
>> These are with LANG=C.UTF-8
>> time ~/repos/Fedora/sed/sed-4.4/sed/sed -nf program.sed input > output_4.4
>>
>> real 0m21.214s
>> user 0m20.641s
>> sys 0m0.510s
>>
>>
>> time ~/repos/Fedora/sed/sed-4.5/sed/sed -nf program.sed input > output_4.4
>>
>> real 183m34.784s
>> user 179m23.097s
>> sys 3m51.128s
>>
>> These with LANG=C
>> time LANG=C ~/repos/Fedora/sed/sed-4.4/sed/sed -nf program.sed input > output_4.4
>>
>> real 11m16.226s
>> user 8m39.261s
>> sys 2m34.840s
>>
>>
>> time LANG=C ~/repos/Fedora/sed/sed-4.5/sed/sed -nf program.sed input > output_4.4
>>
>> real 9m17.259s
>> user 7m11.610s
>> sys 2m3.557s
>>
>> On Wed, Nov 22, 2023 at 11:28 AM Jakub Martisko <jamartis@redhat.com>
>> wrote:
>> >
>> > Hello,
>> >
>> > there seems to be large performance regression starting in sed 4.5.
>> > I've done most of the testing on sed 4.4 and 4.5 since this seems to
>> > be the point where it was introduced, but it is present in 4.9 too.
>> >
>> > Command I'm using:
>> >
>> > sed -nf program.sed input > /tmp/out
>> >
>> > program.sed should be in the attachment, the input file is 355MB
>> > large, so I am attaching only a head -n100 of the input file (can
>> > share the rest somehow if needed). When running with sed 4.4 the run
>> > ends in roughly a minute (with the full input file), however with the
>> > newer versions, sed runs for several hours (~4h). Both versions were
>> > built/run on the same machine. I am also attaching a gprof profiles of
>> > the 4.4 and 4.5 runs.
>> >
>> > Thanks,
>> > Jakub
>> >
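
To put the quoted `real` timings side by side, the slowdown factor can be worked out with a few lines of shell. This is only a minimal sketch and was not part of the original messages: the helper names `to_seconds` and `slowdown` are hypothetical, and the durations are copied verbatim from the `time` output quoted in the thread.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread) for comparing `time` output.

# Convert a `time`-style duration such as "183m34.784s" to seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Print the ratio slow/fast of two durations, rounded to one decimal place.
slowdown() {
    slow=$(to_seconds "$1")
    fast=$(to_seconds "$2")
    awk -v a="$slow" -v b="$fast" 'BEGIN { printf "%.1f\n", a / b }'
}

# en_US.iso88591, head -n 100 input: sed 4.5 vs sed 4.4
slowdown 0m1.289s 0m0.042s        # prints 30.7

# LANG=C.UTF-8, full input: sed 4.5 vs sed 4.4
slowdown 183m34.784s 0m21.214s    # prints 519.2
```

On these figures, sed 4.5 is roughly 30 times slower on the 100-line sample in the US locale and roughly 500 times slower on the full input under C.UTF-8.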
4.4_LANG_C
Description: Binary data
4.5_LANG_C
Description: Binary data
4.4_LANG_us
Description: Binary data
4.5_LANG_us
Description: Binary data