Re: About hardfloat in ppc

From: Programmingkid
Subject: Re: About hardfloat in ppc
Date: Thu, 30 Apr 2020 21:59:27 -0400

> On Apr 30, 2020, at 12:34 PM, Dino Papararo <address@hidden> wrote:
> Maybe the fastest way to implement hardfloat for ppc would be to run with it by 
> default until some FPU instruction requests the FPSCR register.
> At that point we probably want to check for some exception, so QEMU could 
> go back to the last FPU instruction executed and re-execute it in softfloat, 
> this time taking care of the FPSCR flags, then continue in hardfloat until 
> another instruction looks at the FPSCR register, and so on.
> Dino

That sounds like a good idea.

> -----Original Message-----
> From: BALATON Zoltan <address@hidden> 
> Sent: Thursday, 30 April 2020 17:36
> To: 罗勇刚(Yonggang Luo) <address@hidden>
> Cc: Richard Henderson <address@hidden>; Dino Papararo <address@hidden>; 
> address@hidden; Programmingkid <address@hidden>; address@hidden; Howard 
> Spoelstra <address@hidden>; Alex Bennée <address@hidden>
> Subject: Re: R: R: About hardfloat in ppc
> On Thu, 30 Apr 2020, 罗勇刚(Yonggang Luo) wrote:
>> I propose a new way of computing the float flags. We keep a cache of 
>> floating-point operations:
>>
>> typedef struct FpRecord {
>>     uint8_t op;
>>     float32 A;
>>     float32 B;
>> } FpRecord;
>> FpRecord fp_cache[1024];
>> int fp_cache_length;
>> uint32_t fp_exceptions;
>>
>> 1. For each new fp operation we push it to the fp_cache.
>> 2. Once we read fp_exceptions, we re-compute fp_exceptions by 
>>    re-running the FpRecord sequence and clear fp_cache_length.
>> 3. If we clear fp_exceptions, then we set fp_cache_length to 0 
>>    and clear fp_exceptions.
>> 4. If the fp_cache is full, then we re-compute fp_exceptions by 
>>    re-running the FpRecord sequence.
>>
>> Would this be a general method to use hard-float?
>> The time consumed should be 2 * hard_float.
>> Considering that reads of fp_exceptions are rare, the amortized time 
>> complexity would be 1 * hard_float.
> It's hard to guess what the hit rate of such a cache would be, and if it's low 
> then managing the cache is probably more expensive than running with 
> softfloat. So to evaluate any proposed patch we also need some benchmarks 
> that we can experiment with to tell whether the results are good or not; 
> otherwise we're just guessing. Are there some existing tests and benchmarks 
> that we can use? Alex mentioned fp-bench, I think, and for evaluating the 
> correctness of the FP implementation I've seen this other conversation:
> https://lists.nongnu.org/archive/html/qemu-devel/2020-04/msg05107.html
> https://lists.nongnu.org/archive/html/qemu-devel/2020-04/msg05126.html
> Is that something we can use for PPC as well to check the correctness?
> So I think before implementing any potential solution that came up in this 
> brainstorming, the first step would be to get and compile (or write, if not 
> available) some tests and benchmarks:
> 1. tests of host behaviour for inexact, to compare across different archs;
> 2. some FP tests that can be used to compare results between QEMU and a real 
>    CPU to check correctness of emulation (if these check for inexact 
>    differences then they could be used instead of 1.);
> 3. some benchmarks to evaluate QEMU performance (these could be the same as 
>    the FP tests or some real-world FP-heavy applications).
> Then we can see if the proposed solution is faster and still correct.
> Regards,
> BALATON Zoltan
