freetype-devel

Re: [ft-devel] FT_MulFix assembly


From: James Cloos
Subject: Re: [ft-devel] FT_MulFix assembly
Date: Sat, 07 Aug 2010 12:36:27 -0400
User-agent: Gnus/5.110011 (No Gnus v0.11) Emacs/24.0.50 (gnu/linux)

My first cut at FT_MulFix_x86_64() is:

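static __inline__ FT_Int32
FT_MulFix_x86_64 (FT_Int32 a, FT_Int32 b) {
    register FT_Int32 r;
    __asm__ __volatile__ (
        "movslq %%edx, %%rdx\n"      /* sign-extend b to 64 bits */
        "cltq\n"                     /* sign-extend a: eax -> rax */
        "imulq %%rdx\n"              /* rdx:rax = rax * rdx */
        "addq  %%rdx, %%rax\n"       /* add 0 or -1 (the sign word) */
        "addq  $0x8000, %%rax\n"     /* add one half in 16.16 */
        "sarq  $16, %%rax\n"         /* drop the 16 fraction bits */
        : "=a"(r), "+d"(b)           /* rdx is overwritten by imulq */
        : "0"(a)
        : "cc");
    return r;
}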

It passes a Monte Carlo test comparing its results to the C code and to
the i386 assembly.
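
A test of that kind can be sketched in portable C as below.  This is
only an illustration, not the harness actually used: mulfix_ref,
mulfix_candidate, next_u32 and count_mismatches are hypothetical names,
the reference is a sign/magnitude routine written in the style of the
portable C path, and the candidate is a plain-C stand-in for the
assembly so that the sketch builds on any target.  It also assumes that
>> on a negative int64_t is an arithmetic shift, which holds for the
usual compilers but is implementation-defined in C.

```c
#include <stdint.h>

/* Reference: multiply magnitudes, round, then restore the sign
   (rounds half away from zero). */
static int32_t mulfix_ref(int32_t a, int32_t b)
{
    int sign = 1;
    int64_t ua = a, ub = b;
    if (ua < 0) { ua = -ua; sign = -sign; }
    if (ub < 0) { ub = -ub; sign = -sign; }
    int64_t c = (ua * ub + 0x8000) >> 16;
    return (int32_t)(sign > 0 ? c : -c);
}

/* Candidate: same arithmetic as the assembly, written in C.
   Assumes >> on a negative value is an arithmetic shift. */
static int32_t mulfix_candidate(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;
    p += (p < 0) ? -1 : 0;           /* the "addq %rdx, %rax" step */
    return (int32_t)((p + 0x8000) >> 16);
}

/* Small xorshift32 generator, so the run is reproducible. */
static uint32_t rng_state = 2463534242u;
static uint32_t next_u32(void)
{
    uint32_t x = rng_state;
    x ^= x << 13; x ^= x >> 17; x ^= x << 5;
    return rng_state = x;
}

/* Returns the number of mismatches over n random input pairs.
   Inputs are bounded so that the result always fits in 32 bits. */
static long count_mismatches(long n)
{
    long bad = 0;
    for (long i = 0; i < n; i++) {
        int32_t a = (int32_t)(next_u32() >> 2) - 0x20000000;
        int32_t b = (int32_t)(next_u32() % 0x20001u) - 0x10000;
        if (mulfix_ref(a, b) != mulfix_candidate(a, b))
            bad++;
    }
    return bad;
}
```

Calling count_mismatches() with a large n and checking for a zero
return gives the pass/fail answer; swapping the body of
mulfix_candidate for the inline assembly would test the real thing on
x86_64.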

The logic is simple.  The first two instructions sign-extend the two
values to 64 bits.  The multiply then puts the least significant 64 bits
of the product in rax and the most significant 64 bits in rdx.  Because
the values started out as 32 bits wide, the whole product fits in rax
and rdx is guaranteed to hold only sign bits: zero if the product is
>= 0, else -1.  Adding the resulting rdx to rax serves the same purpose
as the ecx value in the i386 version: it makes the rounding symmetric
around zero, just like the C code.
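
To make the effect of that extra add concrete, here is a small sketch
in plain C of the rounding step with and without the sign word.  The
names round_plain and round_symmetric are made up for the illustration,
and it again assumes that >> on a negative int64_t is an arithmetic
shift.

```c
#include <stdint.h>

/* Plain rounding: add one half (0x8000 in 16.16) and shift.
   Ties go toward +infinity, so -0.5 becomes 0. */
static int64_t round_plain(int64_t p)
{
    return (p + 0x8000) >> 16;
}

/* With the sign word added first, as in the assembly.
   hi plays the role of rdx: 0 for p >= 0, -1 for p < 0.
   Ties now go away from zero, so -0.5 becomes -1. */
static int64_t round_symmetric(int64_t p)
{
    int64_t hi = (p < 0) ? -1 : 0;
    return (p + hi + 0x8000) >> 16;
}
```

For a product of exactly one half, 0x8000, both functions return 1.
For -0x8000, round_plain returns 0 while round_symmetric returns -1,
the mirror image of the positive case.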

An alternative might be to cast the src values to (FT_Int64), but I
doubt that the compiler would generate any better code than the
explicit movslq and cltq.

I still have to finish the patch, but I thought I'd offer the algorithm
for review in case anyone wants to take a look.

-JimC
-- 
James Cloos <address@hidden>         OpenPGP: 1024D/ED7DAEA6


