Re: [Lightning] About jit_roundr_d_i

From: Paulo César Pereira de Andrade
Subject: Re: [Lightning] About jit_roundr_d_i
Date: Fri, 10 Sep 2010 09:53:56 -0300

On 10 September 2010 04:48, Paolo Bonzini <address@hidden> wrote:
> On 09/10/2010 12:21 AM, Paulo César Pereira de Andrade wrote:
>>   The default implementation, assuming it follows the
>> "round" definition and the default (round to nearest) rounding
>> mode, is actually wrong because of ties. The problem is that on
>> ties, "round" should round away from zero, but there is
>> no such rounding mode, only the default, towards zero.
> True.  Let's just document that round is the same as the C function rint.
>  (Though, shouldn't it round to even?)

  Yes, round-to-nearest mode rounds to even on ties. I am writing
a more complete test case covering this...
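The tie-breaking difference is easy to see in plain C. The sketch below avoids libm and models the three behaviours under discussion with hypothetical helper names: truncation (a C cast), round() semantics (ties away from zero), and rint() semantics in the default round-to-nearest mode (ties to even). It assumes |x| is well inside the int range.

```c
/* Hypothetical helpers contrasting three double -> int conversions.
 * All assume |x| is well inside the int range. */

/* Truncation toward zero: what a C (int) cast does. */
static int trunc_d_i(double x)
{
    return (int)x;
}

/* C99 round() semantics: halfway cases go away from zero. */
static int round_away_d_i(double x)
{
    int t = (int)x;
    double frac = x - t;        /* exact; zero or with the sign of x */
    if (frac >= 0.5)
        return t + 1;
    if (frac <= -0.5)
        return t - 1;
    return t;
}

/* rint() semantics in the default rounding mode: halfway cases go to
 * the even neighbour. */
static int round_even_d_i(double x)
{
    int t = (int)x;
    double frac = x - t;
    if (frac > 0.5 || (frac == 0.5 && t % 2 != 0))
        return t + 1;
    if (frac < -0.5 || (frac == -0.5 && t % 2 != 0))
        return t - 1;
    return t;
}
```

With these, 2.5 gives 3 under round_away_d_i but 2 under round_even_d_i, and -0.5 gives -1 and 0 respectively; a plain cast gives 2 and 0.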

>>   BTW, is this really correct?
>> extr_d_f  o1 o2 <- convert float o2 to double o1
>> extr_i_d  o1 o2 <- convert int o2 to double o1
>> If extr_i_d means "int to double", then extr_d_f
>> should mean "double to float", and not "float to double"
> Which it does:
> #define jit_extr_d_f(rd, rs)    CVTSD2SSrr((rs), (rd))
> SD2SS = scalar double to scalar float.
> But for x87 they're all no-ops.

  On x86_64, if the value is a float, I need to call
jit_extr_d_f(JIT_FPR(x), JIT_FPR(y)) to pass it as a printf argument;
otherwise nothing is required.
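The reason a float argument needs an explicit conversion is C's default argument promotions: a float passed through the "..." of a variadic function such as printf is promoted to double, and the callee reads it back as a double. A C compiler inserts that widening at the call site automatically; code emitted by a JIT has to do it itself. A minimal illustration, with a hypothetical function name:

```c
#include <stdarg.h>

/* Hypothetical variadic callee: reads its first variable argument the way
 * printf reads a %f conversion, i.e. as a double. */
static double first_vararg_as_double(int count, ...)
{
    va_list ap;
    double d = 0.0;

    va_start(ap, count);
    if (count > 0)
        d = va_arg(ap, double);  /* va_arg(ap, float) would be undefined */
    va_end(ap);
    return d;
}
```

Calling first_vararg_as_double(1, f) with float f = 1.5f returns 1.5 only because the compiler widened f at the call site; a call sequence built by hand has no compiler to do that.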

>> (there is only FISTTP, no FISTT, so the value must be
>> loaded and then popped instead of using FXCH, but it
>> should still be cheap, as it correctly rounds towards
>> zero)
> FISTTP is not on all processors though.  At this point, it's easier to use
> SSE(2) for 32-bit too.
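Using SSE2 amounts to relying on its scalar conversions: CVTTSD2SI always truncates toward zero, while CVTSD2SI rounds according to MXCSR (round to nearest, ties to even, unless changed). A sketch with the standard <emmintrin.h> intrinsics; the wrapper names are hypothetical, and the non-SSE2 fallback exists only so the sketch compiles on other targets.

```c
/* Hypothetical wrappers around the SSE2 scalar conversions. */
#if defined(__SSE2__)
#include <emmintrin.h>

/* CVTTSD2SI: always truncates toward zero; MXCSR is ignored. */
static int sse2_trunc_d_i(double x)
{
    return _mm_cvttsd_si32(_mm_set_sd(x));
}

/* CVTSD2SI: rounds using the MXCSR mode, which defaults to
 * round to nearest, ties to even. */
static int sse2_rint_d_i(double x)
{
    return _mm_cvtsd_si32(_mm_set_sd(x));
}
#else
/* Portable stand-ins; they assume the default rounding mode and
 * |x| well inside the int range. */
static int sse2_trunc_d_i(double x)
{
    return (int)x;
}

static int sse2_rint_d_i(double x)
{
    int t = (int)x;
    double frac = x - t;
    if (frac > 0.5 || (frac == 0.5 && t % 2 != 0))
        return t + 1;
    if (frac < -0.5 || (frac == -0.5 && t % 2 != 0))
        return t - 1;
    return t;
}
#endif
```

Neither instruction touches the x87 control word, which is what makes SSE2 attractive here; floor and ceil still need an adjust step (or SSE4.1's ROUNDSD).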

  I started to write some more code to handle x87, and took some
measurements with this test tool script:
.code   256
        prolog 0
        movi_d %f0 -0.5
        movi_i %r0 10000000
loop:
        ceilr_d_i %r1 %f0
        subi_i %r0 %r0 1
        bgei_i loop %r0 0

replacing ceilr_d_i with the conversion being tested. My computer
is below average, and still I measured timings like these:

-  0.05-0.09 a new implementation that also uses FCOMI (P6 or newer),
   which does the dirty work of loading the FPU status word into
   EFLAGS, setting the zero, parity or carry flag.
-  0.09-0.12 the current code converted to an inline function as only
-  0.25-0.28 a fully safe but much more costly version that saves the
   control word, sets the desired rounding mode, and restores it on
   exit; it should still be a lot faster than a call to libm's version.
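The "fully safe" variant changes the rounding-control (RC) field of the x87 control word, which occupies bits 10-11 (00 nearest/even, 01 down, 10 up, 11 toward zero). A portable C sketch of just that bit manipulation follows; the helper names are hypothetical, and in generated code the word would be read with FNSTCW and written back with FLDCW around the FIST conversion.

```c
#include <stdint.h>

/* x87 rounding-control (RC) field: bits 10-11 of the FPU control word. */
enum x87_rc {
    X87_RC_NEAREST = 0,  /* round to nearest, ties to even (default) */
    X87_RC_DOWN    = 1,  /* toward -infinity: what floor needs */
    X87_RC_UP      = 2,  /* toward +infinity: what ceil needs */
    X87_RC_ZERO    = 3   /* toward zero: what a C (int) cast needs */
};

#define X87_RC_SHIFT 10
#define X87_RC_MASK  (UINT16_C(3) << X87_RC_SHIFT)

/* Hypothetical helper: copy of control word cw with its RC field set to
 * rc. Generated code would save the old word (FNSTCW), load this one
 * (FLDCW), convert, then load the saved word back. */
static uint16_t x87_set_rc(uint16_t cw, enum x87_rc rc)
{
    return (uint16_t)((cw & ~X87_RC_MASK) | ((uint16_t)rc << X87_RC_SHIFT));
}

/* Hypothetical helper: extract the RC field from a control word. */
static enum x87_rc x87_get_rc(uint16_t cw)
{
    return (enum x87_rc)((cw & X87_RC_MASK) >> X87_RC_SHIFT);
}
```

Starting from 0x037F, the value FNINIT loads, selecting "down" yields 0x077F and "toward zero" yields 0x0F7F; the two FLDCW loads per conversion are what makes this path the slowest of the three.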

  I have not pushed the change yet because I am still working on a
proper jit_roundr_x_y; for now, and only locally, I have aliased a
new jit_rintr_x_y to the previous jit_roundr_x_y.

Here is a cut&paste of a small chunk, to give an idea; it also shows
the start of the logic for runtime configurability:

__jit_inline void
_i386_floorr_d_i(jit_gpr_t r0, int f0)
{
    jit_gpr_t   aux = r0 == _RDX ? _RAX : _RDX;

    SUBLir(8, _RSP);
    FISTLm(0, _RSP, 0, 0);      /* *esp = (int)*st */
    FILDLm(0, _RSP, 0, 0);      /* *--st = (double)(int)*esp */
    FSUBRPr(1);                 /* st[1] = st[1] - *st, ++st */
    FSTPSm(4, _RSP, 0, 0);      /* *(float *)(esp + 4) = *st, ++st */
    POPLr(r0);                  /* r0 = rint(f0) */
    POPLr(aux);                 /* aux = bits of (float)(f0 - rint(f0)) */
    ADDLir(0x7FFFFFFF, aux);    /* carry if aux < 0 */
    SBBLir(0, r0);              /* subtract 1 if carry */
}

__jit_inline void
_i686_floorr_d_i(jit_gpr_t r0, int f0)
{
    /* make room for conversion */
    /* push value to x87 stack */
    /* round st(0) to nearest integer */
    /* compare st(0) to st(f0 + 1) and set eflags */
    FCOMIr(f0 + 1);
    /* store converted integer to stack and pop x87 one */
    FISTPLm(0, _RSP, 0, 0);
    /* pop converted integer */
    /* subtract 1 if carry */
    SBBLir(0, r0);
}

#define jit_floorr_d_i(r0, f0)          jit_floorr_d_i(r0, f0)
__jit_inline void
jit_floorr_d_i(jit_gpr_t r0, int f0)
{
    if (jit_always_round_to_nearest()) {
        if (jit_i686())
            _i686_floorr_d_i(r0, f0);
        else
            _i386_floorr_d_i(r0, f0);
    }
    else
        _safe_floor_d_i(r0, f0);
}
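Both fast paths rest on the same idea: convert with the default round-to-nearest mode, then subtract 1 when the rounded value ended up above the input. The C sketch below models that intent, not the exact instruction sequence; nearest_even stands in for FIST in the default mode, and all names are hypothetical. The i386 model mirrors the single-precision store plus the ADD 0x7FFFFFFF carry trick; the i686 model uses a direct compare, as FCOMI allows.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for FIST in the default x87 mode: round to nearest, ties to
 * even. Assumes |x| is well inside the int range. */
static int nearest_even(double x)
{
    int t = (int)x;             /* truncates toward zero */
    double frac = x - t;        /* exact for |x| < 2^31 */
    if (frac > 0.5 || (frac == 0.5 && t % 2 != 0))
        return t + 1;
    if (frac < -0.5 || (frac == -0.5 && t % 2 != 0))
        return t - 1;
    return t;
}

/* i386-style model: store x - n as a single-precision float, then add
 * 0x7FFFFFFF to its bit pattern. The add carries out of bit 31 whenever
 * the sign bit is set and the pattern is not that of -0.0f; SBB then
 * subtracts that carry from n. */
static int floor_i386_model(double x)
{
    int n = nearest_even(x);
    float aux = (float)(x - n);
    uint32_t bits;
    uint32_t carry;

    memcpy(&bits, &aux, sizeof bits);
    carry = (uint32_t)(((uint64_t)bits + 0x7FFFFFFFu) >> 32);
    return n - (int)carry;
}

/* i686-style model: FCOMI compares in registers, so no memory round trip
 * is needed; subtract 1 when rounding to nearest went above x. */
static int floor_i686_model(double x)
{
    int n = nearest_even(x);
    return n - (n > x);
}
```

The single-precision store has a visible consequence in the model: for x = -1e-300 the difference underflows to -0.0f, no carry is produced, and floor_i386_model returns 0 where floor_i686_model returns -1.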

_safe_floor_d_i is about the size of the entire chunk above, so I am
not pasting it, to save space :-)
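The runtime configurability in jit_floorr_d_i boils down to a three-way choice. One way to keep that choice off the per-call path is to resolve it once when the JIT is initialised; the struct, enum and function names here are hypothetical stand-ins for the jit_i686() and jit_always_round_to_nearest() predicates in the chunk above.

```c
#include <stdbool.h>

/* Hypothetical configuration mirroring jit_i686() and
 * jit_always_round_to_nearest(). */
struct fpu_config {
    bool has_fcomi;             /* P6 (i686) or newer */
    bool always_round_nearest;  /* x87 RC field known to be "nearest" */
};

enum floor_strategy {
    FLOOR_I686_FCOMI,   /* round to nearest, FCOMI, SBB */
    FLOOR_I386_CARRY,   /* round to nearest, float sign-bit carry, SBB */
    FLOOR_SAFE_CW       /* save control word, set RC, convert, restore */
};

/* Same decision tree as jit_floorr_d_i, evaluated once so emitters
 * only consult a cached value. */
static enum floor_strategy pick_floor_strategy(struct fpu_config cfg)
{
    if (!cfg.always_round_nearest)
        return FLOOR_SAFE_CW;
    return cfg.has_fcomi ? FLOOR_I686_FCOMI : FLOOR_I386_CARRY;
}
```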

> Paolo

