Re: Performance Regression between sed 4.4 and sed 4.5+
From: Jakub Martisko
Subject: Re: Performance Regression between sed 4.4 and sed 4.5+
Date: Mon, 29 Jul 2024 13:23:11 +0200
Hello,
This still seems to be an issue as of sed-4.9. I ran some tests with
valgrind/callgrind and kcachegrind under various locales (I'll try to
attach the callgrind profile files to a follow-up email, but I am not sure
whether they'll pass the attachment size/spam filters).
I've tested with sed 4.4 and 4.5. Sed 4.9 behaves the same as 4.5, but
since the regression first appeared in 4.5, that is the version I'm using.
I used the following two locales, always setting both LANG and LC_ALL to
the locale: C and en_US.iso88591 (called just US below). The program was
the one from my first email, and the input file was the head -n 100
excerpt of the input, also mentioned in the first email.
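A minimal sketch of how such a profiling matrix could be set up, not necessarily the exact commands used for the numbers below. The build-tree location follows the paths in the quoted mail; the input file name input.head100 and the callgrind output file names are hypothetical. The script only prints the four valgrind invocations (two sed versions times two locales), so they can be reviewed first and then piped to sh.

```shell
#!/bin/sh
# Assumed locations; override via the environment if your layout differs.
SED_ROOT=${SED_ROOT:-$HOME/repos/Fedora/sed}   # holds sed-4.4/ and sed-4.5/ build trees
INPUT=${INPUT:-input.head100}                  # hypothetical name for `head -n 100 input`

# Print one callgrind command per (sed version, locale) pair.
# Both LANG and LC_ALL are set to the locale under test.
print_profile_cmds() {
  for ver in 4.4 4.5; do
    for loc in C en_US.iso88591; do
      printf 'LANG=%s LC_ALL=%s valgrind --tool=callgrind --callgrind-out-file=callgrind.sed-%s.%s.out %s/sed-%s/sed/sed -nf program.sed %s > /dev/null\n' \
        "$loc" "$loc" "$ver" "$loc" "$SED_ROOT" "$ver" "$INPUT"
    done
  done
}

print_profile_cmds
```

Saving this as, say, profile.sh and running `sh profile.sh | sh` would execute the profiles. The resulting callgrind.sed-*.out files can be opened in kcachegrind or summarized with callgrind_annotate. Paths containing spaces would need extra quoting in the printed commands.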
Interesting observations:
1) In sed 4.4 there seems to be a difference between the C and US locales
(numbers in parentheses are the call counts reported by kcachegrind):
US: ...-> dfaexec_sb (100) -> dfaexec_main (100) -> build_state (449) ->
insert (~1.9 mil)
C: ...-> dfaexec_sb (100) -> dfaexec_main (100) -> build_state (~135 000)
-> insert (~135 000)
2) The C locale seems to behave the same way both in 4.4 and 4.5
3) The US locale behaves differently in 4.4 and 4.5:
4.4: ...-> dfaexec_sb (100) -> dfaexec_main (100) -> build_state (449) ->
insert (~1.9 mil)
4.5: ...-> dfaexec_sb (100) -> dfaexec_main (100) -> build_state (~130 000)
-> insert (~220 mil)
with the following run times (time sed...):
4.4:
real 0m0.042s
user 0m0.035s
sys 0m0.007s
4.5:
real 0m1.289s
user 0m1.248s
sys 0m0.037s
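To put the US-locale figures side by side, the ratios between 4.5 and 4.4 can be computed directly from the numbers above. This is only a small arithmetic sketch; the ratio helper is a hypothetical name, and the call counts are the approximate values read from kcachegrind, so the results are rough.

```shell
#!/bin/sh
# ratio A B: print A/B rounded to one decimal place.
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'; }

# US locale, sed 4.5 relative to sed 4.4 (100-line input).
echo "real time:         $(ratio 1.289 0.042)x"          # 30.7x
echo "build_state calls: $(ratio 130000 449)x"           # 289.5x
echo "insert calls:      $(ratio 220000000 1900000)x"    # 115.8x
```

So on this small input the wall-clock time grows by roughly 30x, while the number of build_state and insert calls grows by more than two orders of magnitude.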
On Thu, Nov 23, 2023 at 11:27 AM Jakub Martisko <jamartis@redhat.com> wrote:
> Sending some statistics. I've tried using LANG=C, which helped a bit,
> but the performance is still worse than with sed 4.4.
>
> These are with LANG=C.UTF-8
> time ~/repos/Fedora/sed/sed-4.4/sed/sed -nf program.sed input > output_4.4
>
> real 0m21.214s
> user 0m20.641s
> sys 0m0.510s
>
>
> time ~/repos/Fedora/sed/sed-4.5/sed/sed -nf program.sed input > output_4.4
>
> real 183m34.784s
> user 179m23.097s
> sys 3m51.128s
>
> These with LANG=C
> time LANG=C ~/repos/Fedora/sed/sed-4.4/sed/sed -nf program.sed input > output_4.4
>
> real 11m16.226s
> user 8m39.261s
> sys 2m34.840s
>
>
> time LANG=C ~/repos/Fedora/sed/sed-4.5/sed/sed -nf program.sed input > output_4.4
>
> real 9m17.259s
> user 7m11.610s
> sys 2m3.557s
>
> On Wed, Nov 22, 2023 at 11:28 AM Jakub Martisko <jamartis@redhat.com>
> wrote:
> >
> > Hello,
> >
> > there seems to be a large performance regression starting in sed 4.5.
> > I've done most of the testing on sed 4.4 and 4.5, since this seems to
> > be the point where it was introduced, but it is present in 4.9 too.
> >
> > Command I'm using:
> >
> > sed -nf program.sed input > /tmp/out
> >
> > program.sed should be in the attachment. The input file is 355MB,
> > so I am attaching only a head -n100 of it (I can share the rest
> > somehow if needed). With sed 4.4 the run finishes in roughly a
> > minute (with the full input file); with the newer versions, however,
> > sed runs for several hours (~4h). Both versions were built and run
> > on the same machine. I am also attaching gprof profiles of the 4.4
> > and 4.5 runs.
> >
> > Thanks,
> > Jakub
>