Re: [ft-devel] FT_MulFix assembly


From: James Cloos
Subject: Re: [ft-devel] FT_MulFix assembly
Date: Tue, 07 Sep 2010 15:07:09 -0400
User-agent: Gnus/5.110011 (No Gnus v0.11) Emacs/24.0.50 (gnu/linux)

>>>>> "MB" == Miles Bader <address@hidden> writes:

MB> Hm, are you sure that's not backwards?  When I tried the git C version[*],
MB> as well as your most recent FT_MulFix_x86_64, it returned 0xFFFF8506...


Odd.  Adding your algo to my test app, I get:

  7AFA8000, FFFFFFFF, FFFF8505, FFFF8505, FFFF8506
 #    a   ,     b   ,    FT   ,    JC   ,    MB

I see that I have one small error in the C code in my app.

FT has:

    c = (FT_Long)( ( (FT_Int64)a * b + 0x8000L ) >> 16 );

whereas I used:

    c = (int32_t)(((int64_t)a*b + 0x8000L) >> 16);

But changing the int32_t to long does not change the results.

Yours still is always +1 compared to the C, whenever the first arg
represents a positive value with fractional part == 1/2.
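
The +1 comes down to how an exact half is rounded when the product is
negative. A minimal sketch of the two behaviours, assuming 16.16 inputs
held in int32_t; the names mulfix_plain and mulfix_away are made up for
this illustration and are not FreeType symbols:

```c
#include <stdint.h>

/* Plain rounding: add one half, then shift.  An exact negative half is
   rounded toward +infinity.  Assumes >> on a negative value is an
   arithmetic shift, as it is with gcc. */
static int32_t mulfix_plain(int32_t a, int32_t b)
{
    int64_t r = (int64_t)a * b;
    return (int32_t)((r + 0x8000) >> 16);
}

/* Round on the magnitude and reapply the sign afterwards, so an exact
   half is rounded away from zero. */
static int32_t mulfix_away(int32_t a, int32_t b)
{
    int64_t r = (int64_t)a * b;
    int64_t m = r < 0 ? -r : r;        /* |r| fits: |a*b| <= 2^62 */
    int64_t c = (m + 0x8000) >> 16;
    return (int32_t)(r < 0 ? -c : c);
}
```

With a = 0x7AFA8000 and b = -1 (0xFFFFFFFF), mulfix_plain gives
0xFFFF8506 and mulfix_away gives 0xFFFF8505, matching the MB and FT/JC
columns above.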

Oddly, though, gcc now refuses to compile my asm, even though it did so
before, complaining that it cannot guess what operand size to use for
the imul....  Weird.  (The existing executables prove that it used to.)
A simple way around that is to specify "D" and "S" as the constraints
for a and b.  (The rdi and rsi registers are where the x86_64 ABI puts
the first two args which are passed to a function.)
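
A sketch of what such an asm statement could look like with those
constraints. This is an illustration, not the exact code from the test
app: the name mulfix_asm, the early-clobber "=&a" output, the clobber
list, and the C fallback for non-x86_64 builds are all assumptions
added here.

```c
#include <stdint.h>

static int32_t mulfix_asm(int32_t a, int32_t b)
{
#if defined(__x86_64__) && defined(__GNUC__)
    int64_t wa = a, wb = b;   /* sign-extend to full 64-bit registers */
    int64_t r;

    __asm__ ("mov  %1, %0\n\t"       /* rax = a                          */
             "imul %2\n\t"           /* rdx:rax = rax * b; the register  */
                                     /* operand fixes the operand size   */
             "add  %%rdx, %0\n\t"    /* rdx is 0 or -1: bias negatives   */
             "add  $0x8000, %0\n\t"  /* add one half                     */
             "sar  $16, %0"          /* drop the 16 fraction bits        */
             : "=&a"(r)              /* result in rax, written early     */
             : "D"(wa), "S"(wb)      /* a in rdi, b in rsi               */
             : "rdx", "cc");
    return (int32_t)r;
#else
    /* Portable fallback with the same arithmetic (assumes an
       arithmetic right shift of negative values). */
    int64_t r = (int64_t)a * b;
    r += (r < 0) ? -1 : 0;
    return (int32_t)((r + 0x8000) >> 16);
#endif
}
```

Because both inputs are pinned to registers, the assembler sees a
register operand for the imul and no longer has to guess a size.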

The disassembly of the final version is:

00000000004006c0 <mf>:
  4006c0:       48 89 f8                mov    %rdi,%rax
  4006c3:       48 f7 ee                imul   %rsi
  4006c6:       48 01 d0                add    %rdx,%rax
  4006c9:       48 05 00 80 00 00       add    $0x8000,%rax
  4006cf:       48 c1 f8 10             sar    $0x10,%rax
  4006d3:       c3                      retq   

And I get this disassembly of yours:

0000000000400840 <miles>:
  400840:       48 63 c6                movslq %esi,%rax
  400843:       48 63 ff                movslq %edi,%rdi
  400846:       48 0f af c7             imul   %rdi,%rax
  40084a:       48 05 00 80 00 00       add    $0x8000,%rax
  400850:       48 c1 f8 10             sar    $0x10,%rax
  400854:       c3                      retq   

I also just added this version to my test app:

int another (int32_t a, int32_t b) {
    long r = (long)a * (long)b;
    long s = r >> 31;
    return (r + s + 0x8000) >> 16;
}

That results in:

0000000000400760 <another>:
  400760:       48 63 ff                movslq %edi,%rdi
  400763:       48 63 f6                movslq %esi,%rsi
  400766:       48 0f af f7             imul   %rdi,%rsi
  40076a:       48 89 f0                mov    %rsi,%rax
  40076d:       48 c1 f8 1f             sar    $0x1f,%rax
  400771:       48 8d 84 06 00 80 00    lea    0x8000(%rsi,%rax,1),%rax
  400778:       00 
  400779:       48 c1 f8 10             sar    $0x10,%rax
  40077d:       c3                      retq   

Since FT's C version uses longs, though, this:

int another (long a, long b) {
    long r = (long)a * (long)b;
    long s = r >> 31;
    return (r + s + 0x8000) >> 16;
}

gives:

0000000000400760 <another>:
  400760:       48 0f af f7             imul   %rdi,%rsi
  400764:       48 89 f0                mov    %rsi,%rax
  400767:       48 c1 f8 1f             sar    $0x1f,%rax
  40076b:       48 8d 84 06 00 80 00    lea    0x8000(%rsi,%rax,1),%rax
  400772:       00 
  400773:       48 c1 f8 10             sar    $0x10,%rax
  400777:       c3                      retq   

So it would seem that when compiling for any processor where FT_Long is
the same as int64_t and where that fits into a single register, then
that last bit of C might be optimal, yes?
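
For reference, a self-contained sketch of the same branch-free idea with
fixed-width types. The name mulfix_signbias is hypothetical, and the
sketch deliberately shifts by 63 rather than 31 so that s is exactly 0
or -1 for any 64-bit product; that choice is an assumption of this
sketch, not what the listings above were compiled from.

```c
#include <stdint.h>

/* 16.16 multiply, rounding an exact half away from zero without a
   branch.  Assumes >> on a negative value is an arithmetic shift, as
   it is with gcc. */
static int32_t mulfix_signbias(int32_t a, int32_t b)
{
    int64_t r = (int64_t)a * b;     /* full 64-bit product            */
    int64_t s = r >> 63;            /* 0 for r >= 0, -1 for r < 0     */
    return (int32_t)((r + s + 0x8000) >> 16);
}
```

With a shift of 31, s is 0 or -1 only while the product stays below
2^31 in magnitude. For example, 0x40000000 * 0x10000 gives s = 0x8000
and a result that is one too large, whereas the sketch above returns
0x40000000. For a = 0x7AFA8000, b = -1 it returns 0xFFFF8505.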

-JimC
-- 
James Cloos <address@hidden>         OpenPGP: 1024D/ED7DAEA6


