sed-devel

Re: Performance Regression between sed 4.4 and sed 4.5+


From: Jakub Martisko
Subject: Re: Performance Regression between sed 4.4 and sed 4.5+
Date: Mon, 29 Jul 2024 13:30:39 +0200

Attached are the callgrind output files for the locales and versions
mentioned in the previous email.
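
For anyone who wants to regenerate comparable profiles, here is a rough sketch of how the four runs could be set up. The exact valgrind invocation is not stated in this thread, so treat the options as an assumption; the output file names and the input file name `input.head100` are placeholders, and the sed paths follow the build directories used in the quoted timings. The function only prints the command lines so they can be reviewed before running.

```shell
#!/bin/sh
# Print (do not run) a callgrind command line for one sed build and one locale.
# Assumed/hypothetical: output file names, input.head100, and the valgrind
# options (--tool and --callgrind-out-file are standard valgrind options).
callgrind_cmd() {
    sed_bin=$1
    locale=$2
    tag=$3
    printf 'LANG=%s LC_ALL=%s valgrind --tool=callgrind --callgrind-out-file=%s %s -nf program.sed input.head100\n' \
        "$locale" "$locale" "callgrind.out.$tag" "$sed_bin"
}

# One command per version/locale pair, matching the four attachments.
for v in 4.4 4.5; do
    for l in C en_US.iso88591; do
        callgrind_cmd "$HOME/repos/Fedora/sed/sed-$v/sed/sed" "$l" "${v}_${l}"
    done
done
```

Each resulting output file can then be opened in kcachegrind to inspect the call counts for dfaexec_main, build_state and insert.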

On Mon, Jul 29, 2024 at 1:23 PM Jakub Martisko <jamartis@redhat.com> wrote:

> Hello,
>
> This still seems to be an issue as of sed-4.9. I ran some tests with
> valgrind/callgrind and kcachegrind under various locales (I'll try to
> attach the callgrind profile files to a follow-up email, but I am not sure
> whether they'll pass the attachment size/spam filters).
>
> I've tested with sed 4.4 and 4.5. Sed 4.9 behaves the same as 4.5, but
> since the regression first appeared in 4.5, I'm using that. I used the
> following two locales (always setting both LANG and LC_ALL to the locale):
> C and en_US.iso88591 (referred to below simply as US). The program was the
> one from my first email, and the input file was the head -n 100 excerpt
> also mentioned in the first email.
>
> Interesting observations:
>
> 1) In sed 4.4 there seems to be a difference between the C and US locales
> (the numbers in parentheses are the call counts reported by kcachegrind):
> US: ...-> dfaexec_sb (100) -> dfaexec_main (100) -> build_state (449) ->
> insert (~1.9 mil)
> C: ...-> dfaexec_sb (100) -> dfaexec_main (100) -> build_state (~135 000)
> -> insert (~135 000)
>
> 2) The C locale seems to behave the same way both in 4.4 and 4.5
>
> 3) The US locale behaves differently in 4.4 and 4.5:
> 4.4: ...-> dfaexec_sb (100) -> dfaexec_main (100) -> build_state (449) ->
> insert (~1.9 mil)
> 4.5: ...-> dfaexec_sb (100) -> dfaexec_main (100) -> build_state (~130 000)
> -> insert (~220 mil)
> with the following run times (time sed...):
>
> 4.4:
> real 0m0.042s
> user 0m0.035s
> sys 0m0.007s
>
> 4.5:
> real 0m1.289s
> user 0m1.248s
> sys 0m0.037s
>
> On Thu, Nov 23, 2023 at 11:27 AM Jakub Martisko <jamartis@redhat.com>
> wrote:
>
>> Sending some statistics. I tried using LANG=C, which helped a bit,
>> but the performance is still worse than when using sed 4.4.
>>
>> These are with LANG=C.UTF-8:
>> time ~/repos/Fedora/sed/sed-4.4/sed/sed -nf program.sed input > output_4.4
>>
>> real    0m21.214s
>> user    0m20.641s
>> sys    0m0.510s
>>
>>
>> time ~/repos/Fedora/sed/sed-4.5/sed/sed -nf program.sed input > output_4.4
>>
>> real    183m34.784s
>> user    179m23.097s
>> sys    3m51.128s
>>
>> These are with LANG=C:
>> time LANG=C ~/repos/Fedora/sed/sed-4.4/sed/sed -nf program.sed input > output_4.4
>>
>> real    11m16.226s
>> user    8m39.261s
>> sys    2m34.840s
>>
>>
>> time LANG=C ~/repos/Fedora/sed/sed-4.5/sed/sed -nf program.sed input > output_4.4
>>
>> real    9m17.259s
>> user    7m11.610s
>> sys    2m3.557s
>>
>> On Wed, Nov 22, 2023 at 11:28 AM Jakub Martisko <jamartis@redhat.com>
>> wrote:
>> >
>> > Hello,
>> >
>> > there seems to be a large performance regression starting in sed 4.5.
>> > I've done most of the testing on sed 4.4 and 4.5, since this seems to
>> > be the point where it was introduced, but it is present in 4.9 too.
>> >
>> > Command I'm using:
>> >
>> >  sed -nf program.sed input > /tmp/out
>> >
>> > program.sed should be in the attachment. The input file is 355MB,
>> > so I am attaching only a head -n100 of it (I can share the rest
>> > somehow if needed). With sed 4.4 the run finishes in roughly a
>> > minute (on the full input file), whereas with the newer versions
>> > sed runs for several hours (~4h). Both versions were built and run
>> > on the same machine. I am also attaching gprof profiles of the 4.4
>> > and 4.5 runs.
>> >
>> > Thanks,
>> > Jakub
>>
>
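
To put the quoted timings side by side, the slowdown factors can be worked out from the `real` values with a small helper. This is only a sketch: `to_seconds` and `slowdown` are hypothetical names introduced here, and the durations are copied from the quoted `time` output above.

```shell
#!/bin/sh
# Convert a `time`-style duration such as "183m34.784s" to seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Ratio of two durations (first / second), rounded to one decimal place.
slowdown() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

# sed 4.5 time divided by sed 4.4 time for each measurement in the thread.
slowdown 0m1.289s 0m0.042s       # head -n 100 input, US locale
slowdown 183m34.784s 0m21.214s   # full input, LANG=C.UTF-8
slowdown 9m17.259s 11m16.226s    # full input, LANG=C
```

A value above 1 means sed 4.5 was slower than sed 4.4 for that run; a value below 1 means it was faster.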

Attachment: 4.4_LANG_C
Description: Binary data

Attachment: 4.5_LANG_C
Description: Binary data

Attachment: 4.4_LANG_us
Description: Binary data

Attachment: 4.5_LANG_us
Description: Binary data

