qemu-devel
Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC


From: BALATON Zoltan
Subject: Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC
Date: Tue, 3 Mar 2020 00:16:37 +0100 (CET)
User-agent: Alpine 2.22 (BSF 395 2020-01-19)

On Mon, 2 Mar 2020, Richard Henderson wrote:
On 3/2/20 3:42 AM, BALATON Zoltan wrote:
The "hardfloat" option works (with other targets) only with ieee754
accumulative exceptions, when the most common of those exceptions, inexact, has
already been raised.  And thus need not be raised a second time.

Why exactly is it done that way? What are the differences between IEEE FP
implementations that prevent using hardfloat most of the time instead of only
using it in some (although supposedly common) special cases?

While it is possible to read the host's ieee exception word after the hardfloat
operation, there are two reasons that is undesirable:

(1) It is *slow*.  So slow that it's faster to run the softfloat code instead.
I thought it would be easier to find the benchmark numbers that Emilio
generated when this was tested, but I can't find them.

I remember those benchmarks too, and the paper Alex referred to confirmed the same thing. I've also found that enabling hardfloat for PPC without doing anything else is slightly slower (on a recent CPU; on older CPUs it could be even slower). Interestingly, however, it does give a speedup for vector instructions (maybe because they don't clear flags between each sub-operation). Does that mean these vector instruction helpers are also buggy regarding exceptions?

(2) IEEE has a number of implementation choices for corner cases, and we need
to implement the target's choices, not the host's choices.

But how is that related to the inexact flag and the float_round_nearest_even rounding mode, which are the only two things the can_use_fpu() function checks for?
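
For anyone following along, the check I mean amounts to roughly the following. This is a stand-alone sketch based on the description in this thread, not a copy of the QEMU source: the struct, the enum values and the sketch_ names are all made up so that it compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the guest FP status; values are illustrative. */
enum {
    SKETCH_FLAG_INEXACT  = 1 << 0,
    SKETCH_FLAG_OVERFLOW = 1 << 1,
};

enum {
    SKETCH_ROUND_NEAREST_EVEN = 0,
    SKETCH_ROUND_TO_ZERO      = 1,
};

typedef struct {
    uint8_t exception_flags;  /* accumulated (sticky) guest exception flags */
    uint8_t rounding_mode;    /* current guest rounding mode */
} sketch_float_status;

/*
 * The host FPU is only used when inexact is already set in the guest
 * flags (so it need not be raised again) and the guest rounds to
 * nearest-even. Anything else falls back to softfloat.
 */
static bool sketch_can_use_fpu(const sketch_float_status *s)
{
    return (s->exception_flags & SKETCH_FLAG_INEXACT) &&
           s->rounding_mode == SKETCH_ROUND_NEAREST_EVEN;
}
```

A target that clears the flags before every operation never satisfies the first condition, so it always ends up on the softfloat path.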

I think CPUs can also raise exceptions when they detect the condition in
hardware, so maybe we should install our own FPU exception handler and set guest
flags from that. Then we wouldn't need to check and wouldn't have a problem with
these bits either. Why is that not possible, or why isn't it done?

If we have to enable and disable host fpu exceptions going in and out of
softfloat routines, we're back to modifying the host fpu control word, which as
described above, is *slow*.

That handler could only
set a global flag on each exception, which targets can then check to
handle differences. This global flag could include non-sticky versions if
needed, because clearing a global should be less expensive than clearing the FPU
status reg. But I don't really know, just guessing; someone who knows more about
FPUs probably knows a better way.

I don't know if anyone has tried that variant, where we simply leave the
exceptions enabled, leave the signal handler enabled, and use a global.

Feel free to try it and benchmark it.

I probably won't try any time soon. I have several other half-finished things to hack on, so I'd rather not take up another one I likely can't finish, but I hope this discussion inspires someone to try it. I'm also interested in the results. If nobody tries in the next two years, maybe I'll get there eventually.

Regards,
BALATON Zoltan
