From: Rodger Combs
Subject: [ft-devel] fttrigon: Use standard floating-point functions for performance?
Date: Fri, 23 May 2014 23:49:28 -0500

Hello, freetype-devel; I'm Rodger Combs (rcombs on Foonetic), a member of the libass dev team. For any who are unaware, libass is a subtitle renderer, which uses FreeType 2 for a variety of font-related functions; since subtitles are rendered in real time, it requires high performance both internally and in the libraries it depends on, especially when processing high-complexity fonts and effects on low-powered computers.
While profiling libass on OS X, I've found that FreeType's stroker tends to come up as a very time-consuming routine, and that most of the time spent in the stroker is in the trigonometry routines; specifically, the atan2 and cosine functions seem rather slow. An example of profiler output for a subtitle track with heavy effects is shown in this screenshot:
Out of curiosity, I examined the functions in question, and tried replacing them with simple 16.16 fixed<->32-bit float conversions and standard C floating-point trigonometry calls, and found that the floating-point versions were about an order of magnitude faster; the trig functions weren't even apparent in the profiler output without filtering for them anymore, and there was no visible degradation in output quality (as I would expect).
As FT_Atan2 and FT_Cos were the only routines I'd noticed in profiler output, I only tried replacing those, but here are the versions of the functions I tested:

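#include <math.h>

  FT_EXPORT_DEF( FT_Fixed )
  FT_Cos( FT_Angle  angle )
  {
    /* 16.16 degrees -> radians, cosine, then back to 16.16 */
    return cosf(angle/65536.0*(M_PI/180.0)) * 65536.0;
  }

  FT_EXPORT_DEF( FT_Angle )
  FT_Atan2( FT_Fixed  dx,
            FT_Fixed  dy )
  {
    /* 16.16 -> float, arctangent in radians, then to 16.16 degrees */
    return atan2f(dy/65536.0, dx/65536.0) * (180.0 / M_PI) * 65536.0;
  }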

I've only performance-tested these two functions using single-precision floating-point replacements on Mac OS X, on a quad-core Haswell i7, but I'd expect similar performance gains across all x86 processors featuring an x87 FPU or later. I would expect somewhat lower gains using double-precision floating-point math (as x87 doesn't have native instructions for double-precision trigonometry), but I don't think double precision would really be warranted in this case.

While it most likely makes sense to leave the current fixed-point trigonometry in place for the sake of systems with slow (or no) FPU, it seems apparent to me that floating-point versions should be used when compiling for modern processors.

I hope to hear back from this list soon; I can be reached anytime by this email address, or under the nick "rcombs" on the Freenode IRC network; feel free to stop by #libass if you'd like to discuss the performance implications of this suggestion with the rest of the team.

--Rodger Combs

