R: R: R: About hardfloat in ppc

From: Dino Papararo
Subject: R: R: R: About hardfloat in ppc
Date: Thu, 30 Apr 2020 16:34:55 +0000

Maybe the fastest way to implement hardfloat for ppc would be to run with it
by default until some FPU instruction requests the FPSCR register.
At that point we probably want to check for some exception, so QEMU could go
back to the last FPU instruction executed and re-execute it in softfloat, this
time taking care of the FPSCR flags, then continue in hardfloat until another
instruction looks at the FPSCR register, and so on.
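
To make the idea concrete, here is a minimal sketch in plain C, not QEMU code.
All names (LastFpOp, hard_fmul, read_fpscr, SKETCH_FLAG_INEXACT) are
hypothetical, only multiplication is handled, and the "softfloat" step is
replaced by a simple exactness check that assumes finite operands and no
overflow.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FLAG_INEXACT 1u   /* hypothetical bit, not the real FPSCR layout */

/* Operands of the most recent FP instruction (multiply only, for brevity). */
typedef struct LastFpOp {
    bool valid;
    float a, b;
} LastFpOp;

LastFpOp last_op;
uint32_t sketch_fpscr;

/* Fast path: host arithmetic, plus a cheap note of what was executed. */
float hard_fmul(float a, float b)
{
    last_op = (LastFpOp){ true, a, b };
    return a * b;
}

/*
 * Slow path, taken only when the guest reads the flags. This stands in for
 * re-executing the instruction in softfloat: the product of two floats is
 * exact in double, so the result is inexact iff rounding it back to float
 * changes the value.
 */
uint32_t read_fpscr(void)
{
    if (last_op.valid) {
        double exact = (double)last_op.a * (double)last_op.b;
        if ((double)(float)exact != exact) {
            sketch_fpscr |= SKETCH_FLAG_INEXACT;
        }
        last_op.valid = false;
    }
    return sketch_fpscr;
}

void clear_fpscr(void)
{
    sketch_fpscr = 0;
    last_op.valid = false;
}
```

One thing the sketch makes visible: only the most recent instruction is
replayed, so a flag raised by an earlier instruction (the FPSCR exception bits
are sticky) would not be seen unless more history is kept.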


-----Original Message-----
From: BALATON Zoltan <address@hidden>
Sent: Thursday, 30 April 2020 17:36
To: 罗勇刚(Yonggang Luo) <address@hidden>
Cc: Richard Henderson <address@hidden>; Dino Papararo <address@hidden>;
address@hidden; Programmingkid <address@hidden>; address@hidden; Howard
Spoelstra <address@hidden>; Alex Bennée <address@hidden>
Subject: Re: R: R: About hardfloat in ppc

On Thu, 30 Apr 2020, 罗勇刚(Yonggang Luo) wrote:
> I propose a new way of computing the float flags. We keep a cache of
> float computations:
>
> typedef struct FpRecord {
>   uint8_t op;
>   float32 A;
>   float32 B;
> } FpRecord;
> FpRecord fp_cache[1024];
> int fp_cache_length;
> uint32_t fp_exceptions;
>
> 1. For each new fp operation we push it to the fp_cache.
> 2. Once we read the fp_exceptions, we re-compute the fp_exceptions by
>    re-running the FpRecord sequence, and clear fp_cache_length.
> 3. If we clear the fp_exceptions, then we set fp_cache_length to 0
>    and clear fp_exceptions.
> 4. If the fp_cache is full, then we re-compute the fp_exceptions by
>    re-running the FpRecord sequence.
>
> Would this be a general method to use hard-float?
> The time consumed should be 2 * hard_float.
> Considering that reads of fp_exceptions are rare, the amortized time
> complexity would be 1 * hard_float.
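
The four steps quoted above could be sketched roughly as follows. This is an
illustration under stated assumptions, not QEMU code: float32 is a plain host
float here, the opcodes, flag bits and function names are hypothetical, only
multiply and divide are handled, and slow_flags() is a stand-in for a real
softfloat call (it assumes finite operands, no overflow, and ignores 0/0).

```c
#include <assert.h>
#include <stdint.h>

typedef float float32;                 /* stand-in for QEMU's float32 type */

enum { FP_OP_MUL, FP_OP_DIV };         /* hypothetical opcodes */
enum { FP_FLAG_INEXACT = 1u, FP_FLAG_DIVBYZERO = 2u };  /* hypothetical bits */

typedef struct FpRecord {
    uint8_t op;
    float32 A;
    float32 B;
} FpRecord;

#define FP_CACHE_SIZE 1024
FpRecord fp_cache[FP_CACHE_SIZE];
int fp_cache_length;
uint32_t fp_exceptions;

/* Stand-in for softfloat: flags that one recorded operation would raise. */
uint32_t slow_flags(const FpRecord *r)
{
    double a = r->A, b = r->B;
    if (r->op == FP_OP_MUL) {
        double exact = a * b;          /* exact: 24 + 24 significand bits fit in 53 */
        return ((double)(float32)exact != exact) ? FP_FLAG_INEXACT : 0;
    }
    if (b == 0.0) {
        return (a != 0.0) ? FP_FLAG_DIVBYZERO : 0;
    }
    float32 q = (float32)(a / b);
    return ((double)q * b != a) ? FP_FLAG_INEXACT : 0;   /* q * b is exact in double */
}

/* Replay every cached record, accumulate its flags, then empty the cache. */
void fp_flush(void)
{
    for (int i = 0; i < fp_cache_length; i++) {
        fp_exceptions |= slow_flags(&fp_cache[i]);
    }
    fp_cache_length = 0;
}

/* Steps 1 and 4: record the operation, flushing first if the cache is full. */
float32 fp_exec(uint8_t op, float32 a, float32 b)
{
    if (fp_cache_length == FP_CACHE_SIZE) {
        fp_flush();
    }
    assert(fp_cache_length < FP_CACHE_SIZE);
    fp_cache[fp_cache_length++] = (FpRecord){ op, a, b };
    return (op == FP_OP_MUL) ? a * b : a / b;   /* fast host arithmetic */
}

/* Step 2: reading the flags forces the replay. */
uint32_t fp_read_exceptions(void)
{
    fp_flush();
    return fp_exceptions;
}

/* Step 3: clearing the flags also discards the pending records. */
void fp_clear_exceptions(void)
{
    fp_cache_length = 0;
    fp_exceptions = 0;
}
```

In this sketch every operation pays for a store into fp_cache on the fast
path, and the replay cost is paid once per read of the flags or once per 1024
operations, whichever comes first.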

It's hard to guess what the hit rate of such a cache would be, and if it's low
then managing the cache is probably more expensive than running with softfloat.
So to evaluate any proposed patch we also need some benchmarks to experiment
with, to tell whether the results are good; otherwise we're just guessing.
Are there some existing tests and benchmarks that we can use? Alex mentioned
fp-bench, I think, and to evaluate the correctness of the FP implementation
I've seen this other


Is that something we can use for PPC as well to check the correctness?

So I think before implementing any potential solution that came up in this
brainstorming, the first step would be to get and compile (or write, if not
available) some tests and benchmarks:

1. testing host behaviour for inexact and compare that for different archs
2. some FP tests that can be used to compare results with QEMU and real CPU to
   check correctness of emulation (if these check for inexact differences then
   could be used instead of 1.)
3. some benchmarks to evaluate QEMU performance (these could be same as FP
   tests or some real world FP heavy applications).

Then we can see if the proposed solution is faster and still correct.
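
As a starting point for items 2 and 3, a small guest-side program could expose
results as raw bit patterns (so QEMU and a real CPU can be compared exactly)
and time an FP-heavy loop. The sketch below is only an illustration: the
kernel, the function names and the use of clock() are assumptions, not an
existing QEMU test.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Raw IEEE 754 bit pattern of a float, so results can be compared exactly. */
uint32_t f32_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Hypothetical FP-heavy kernel: partial sum of 1/i^2 in single precision. */
float fp_kernel(int n)
{
    volatile float sum = 0.0f;   /* volatile keeps the loop from being folded away */
    for (int i = 1; i <= n; i++) {
        float x = (float)i;
        sum += 1.0f / (x * x);
    }
    return sum;
}

/*
 * Run the kernel `reps` times; return the CPU seconds used and store the bit
 * pattern of the last result in *bits_out.
 */
double time_kernel(int n, int reps, uint32_t *bits_out)
{
    clock_t start = clock();
    float r = 0.0f;
    for (int k = 0; k < reps; k++) {
        r = fp_kernel(n);
    }
    *bits_out = f32_bits(r);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Printing the bit pattern and the time from the same binary on real hardware
and under QEMU, with and without a candidate patch, would give one correctness
data point and one performance data point per run. It does not cover item 1,
which needs the host's exception flags.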

