From: Paulo César Pereira de Andrade
Subject: Re: [Lightning] About jit_roundr_d_i
Date: Fri, 10 Sep 2010 09:53:56 -0300

```On 10 September 2010 04:48, Paolo Bonzini <address@hidden> wrote:
> On 09/10/2010 12:21 AM, Paulo César Pereira de Andrade wrote:
>>
>>   The default implementation, assuming it follows the
>> "round" definition and the default (round to nearest) rounding mode,
>> is actually wrong because of ties. The problem is that on
>> ties, "round" should round away from zero, but there is
>> no such rounding mode, only the default, towards zero.
>
> True.  Let's just document that round is the same as the C function rint.
>  (Though, shouldn't it round to even?)

Yes, round-to-nearest mode rounds to even on ties. I am writing
a more complete test case that covers this...

>>   BTW, is this really correct?
>> extr_d_f  o1 o2<- convert float o2 to double o1
>> extr_i_d  o1 o2<- convert int o2 to double o1
>>
>> If extr_i_d means "int to double", then extr_d_f
>> should mean "double to float", and not "float to double"
>
> Which it does:
>
> #define jit_extr_d_f(rd, rs)    CVTSD2SSrr((rs), (rd))
>
> SD2SS = scalar double to scalar float.
>
> But for x87 they're all dummy.

On x86_64 I need to call jit_extr_d_f(JIT_FPR(x), JIT_FPR(y))
to pass a value as a printf argument if it is a float; otherwise,
nothing is required.

>> (there is only FISTTP, no FISTT, so, need to
>> FXCH, but it should still be cheap as it does,
>> correctly, rounding towards zero on ties)
>
> FISTTP is not on all processors though.  At this point, it's easier to use
> SSE(2) for 32-bit too.

I started to write some more code to handle x87. I took some
measurements with this test tool script:
-%<-
.code   256
prolog 0
movi_d %f0 -0.5
movi_i %r0 10000000
loop:
ceilr_d_i %r1 %f0
subi_i %r0 %r0 1
bgei_i loop %r0 0
ret
-%<-

replacing ceilr_d_i with the conversion being tested. My computer
is below average and, still, I noticed timings like these:

-  0.05-0.09: a new implementation that also uses FCOMI (P6 or newer),
which does the dirty job of loading the FPU status word and setting
the zero flag, carry flag or parity flag.
-  0.09-0.12: the current code, with conversion to an inline function
as the only change.
-  0.25-0.28: a fully safe, but very costly, version that loads the
control word, sets the desired rounding mode, and restores it at exit;
it should still be a lot faster than a function call to libm's version.

I have not pushed the change yet because I am still working on a
proper jit_roundr_x_y. At least for now, and only locally, I have
aliased a new jit_rintr_x_y to the previous jit_roundr_x_y.

Here is a cut&paste of a small chunk, to give an idea, which also shows
the start of the logic for runtime configurability:

-%<-
__jit_inline void
_i386_floorr_d_i(jit_gpr_t r0, int f0)
{
    jit_gpr_t   aux = r0 == _RDX ? _RAX : _RDX;

    PUSHLr(aux);                /* save the scratch register */
    FLDr(f0);
    SUBLir(8, _RSP);
    FISTLm(0, _RSP, 0, 0);      /* *esp = (int)*st */
    FILDLm(0, _RSP, 0, 0);      /* *--st = (float)(int)*esp */
    FSUBRPr(1);                 /* st = st - *st, ++st */
    FSTPSm(4, _RSP, 0, 0);      /* *(esp + 4) = (float)*st, ++st */
    POPLr(r0);                  /* r0 = rint(f0) */
    POPLr(aux);                 /* aux = f0 - rint(f0) */
    ADDLir(0x7FFFFFFF, aux);    /* carry if aux < 0 */
    SBBLir(0, r0);              /* subtract 1 if carry */
    POPLr(aux);                 /* restore the scratch register */
}

__jit_inline void
_i686_floorr_d_i(jit_gpr_t r0, int f0)
{
    /* make room for conversion */
    PUSHLr(_RAX);
    /* push value to x87 stack */
    FLDr(f0);
    /* round st(0) to nearest integer */
    FRNDINT_();
    /* compare st(0) to st(f0 + 1) and set eflags */
    FCOMIr(f0 + 1);
    /* store converted integer to stack and pop x87 one */
    FISTPLm(0, _RSP, 0, 0);
    /* pop converted integer */
    POPLr(r0);
    /* subtract 1 if carry */
    SBBLir(0, r0);
}

#define jit_floorr_d_i(r0, f0)          jit_floorr_d_i(r0, f0)
__jit_inline void
jit_floorr_d_i(jit_gpr_t r0, int f0)
{
    if (jit_always_round_to_nearest()) {
        if (jit_i686())
            _i686_floorr_d_i(r0, f0);
        else
            _i386_floorr_d_i(r0, f0);
    }
    else
        _safe_floor_d_i(r0, f0);
}
-%<-

_safe_floor_d_i is about the size of the entire chunk above,
so I am not cut&pasting it, to save space :-)

> Paolo

Paulo

```