From: Thomas Huth
Subject: Re: [qemu-s390x] [Qemu-devel] [PATCH] include/fpu/softfloat: Fix compilation with Clang on s390x
Date: Wed, 16 Jan 2019 07:33:28 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 2019-01-15 21:05, Emilio G. Cota wrote:
> On Tue, Jan 15, 2019 at 16:01:32 +0000, Alex Bennée wrote:
>> Ahh I should have mentioned we already have the technology for this ;-)
>> If you build the fpu/next tree on a s390x you can then run:
>>   ./tests/fp/fp-bench f64_div
>> with and without the CONFIG_128 path. To get an idea of the real world
>> impact you can compile a foreign binary and run it on a s390x system
>> with:
>>   $QEMU ./tests/fp/fp-bench f64_div -t host
>> And that will give you the peak performance assuming your program is
>> doing nothing but f64_div operations. If the two QEMU's are basically in
>> the same ballpark then it doesn't make enough difference. That said:
> I think you mean here `tests/fp/fp-bench -o div -p double', otherwise
> you'll get the default op (-o add).

I tried that now, too, and -o div -p double does not really seem to
exercise this function at all.

Here are my results (disclaimer: that system is likely not really usable
for benchmarks since its CPUs are shared with other LPARs, but I ran
all the tests at least twice and got similar results):

With the DLGR inline assembly:

 time ./fp-test f64_div -l 2 -r all
 real   6m43,648s
 user   6m43,362s
 sys    0m0,160s

 time ./fp-bench -o div -p double
 204.98 MFlops
 real   0m1,002s
 user   0m1,001s
 sys    0m0,001s

With the "#else" default 64-bit code:

 time ./fp-test f64_div -l 2 -r all
 real   6m44,910s
 user   6m44,616s
 sys    0m0,165s

 time ./fp-bench -o div -p double
 205.41 MFlops
 real   0m1,002s
 user   0m1,001s
 sys    0m0,001s

With the new CONFIG_INT128 code:

 time ./fp-test f64_div -l 2 -r all
 real   6m58,371s
 user   6m58,078s
 sys    0m0,164s

 time ./fp-bench -o div -p double
 205.17 MFlops
 real   0m1,002s
 user   0m1,000s
 sys    0m0,001s

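==> The new CONFIG_INT128 code is measurably slower than the 64-bit code
in fp-test (about 13 seconds, or roughly 3%, on the f64_div run), while
fp-bench shows no real difference between the three variants. So I don't
think we should include this yet (unless we know a system where the
compiler can create optimized assembly code without libgcc here).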
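
For readers following along, here is a minimal sketch of the kind of
128-by-64-bit division the variants above are competing on. It is not
the code from the patch: the function names are hypothetical, it assumes
a compiler that provides `unsigned __int128` (GCC and Clang do on 64-bit
targets), and the second function is a deliberately simple
shift-and-subtract reference rather than the optimized 64-bit path. On
targets where the compiler has no inline sequence for a 128-bit
division, the `/` and `%` below typically end up as calls into a libgcc
helper such as `__udivti3`, which is the overhead mentioned above.

```c
#include <stdint.h>

/*
 * Hypothetical illustration, not the patch's code.
 * Divide the 128-bit value (n1:n0) by d, store the remainder in *r and
 * return the quotient. Assumes n1 < d so the quotient fits in 64 bits.
 */
static inline uint64_t div128by64_int128(uint64_t *r, uint64_t n1,
                                         uint64_t n0, uint64_t d)
{
    unsigned __int128 n = ((unsigned __int128)n1 << 64) | n0;

    *r = (uint64_t)(n % d);
    return (uint64_t)(n / d);
}

/*
 * Plain 64-bit reference using restoring (shift-and-subtract) division.
 * Same contract as above: n1 < d. Slow, but needs no 128-bit type.
 */
static inline uint64_t div128by64_bitwise(uint64_t *r, uint64_t n1,
                                          uint64_t n0, uint64_t d)
{
    uint64_t rem = n1;
    uint64_t q = 0;
    int i;

    for (i = 63; i >= 0; i--) {
        /* Bit shifted out of rem; if set, the true value is >= 2^64 > d. */
        uint64_t carry = rem >> 63;

        rem = (rem << 1) | ((n0 >> i) & 1);
        q <<= 1;
        if (carry || rem >= d) {
            rem -= d;   /* wraps correctly when carry was set */
            q |= 1;
        }
    }
    *r = rem;
    return q;
}
```

Both functions should return the same quotient and remainder for any
input with n1 < d, so the second one can serve as a cross-check when
experimenting with the first on a new host.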

