|Subject:||[ft-devel] fttrigon: Use standard floating-point functions for performance?|
|Date:||Fri, 23 May 2014 23:49:28 -0500|
Hello, freetype-devel; I'm Rodger Combs (rcombs on Foonetic), a member of the libass dev team. For any who are unaware, libass is a subtitle renderer, which uses Freetype2 for a variety of font-related functions; since subtitles are rendered in real-time, it requires high performance both internally and in the libraries it depends on, especially when processing high-complexity fonts and effects on low-powered computers.
While profiling libass in OSX's Instruments.app, I've found that Freetype's stroker tends to come up as a very time-consuming routine, and that most of the time spent in the stroker is in the trigonometry routines; specifically, the atan2 and cosine functions seem rather slow. An example of profiler output in a subtitle track with heavy effects is shown in this screenshot:
Out of curiosity, I examined the functions in question, and tried replacing them with simple 16.16 fixed<->32-bit float conversions and standard C floating-point trigonometry calls, and found that the floating-point versions were about an order of magnitude faster; the trig functions weren't even apparent in the profiler output without filtering for them anymore, and there was no visible degradation in output quality (as I would expect).
As FT_Atan2 and FT_Cos were the only routines I'd noticed in profiler output, I only tried replacing those; here are the versions of the functions I tested:
FT_EXPORT_DEF( FT_Fixed )
FT_Cos( FT_Angle  angle )
{
  return cosf(angle/65536.0*(M_PI/180.0)) * 65536.0;
}

FT_EXPORT_DEF( FT_Angle )
FT_Atan2( FT_Fixed  dx,
          FT_Fixed  dy )
{
  return atan2f(dy/65536.0, dx/65536.0) * (180.0 / M_PI) * 65536.0;
}
I've only performance-tested these two functions using single-precision floating-point replacements on Mac OSX, using a quad-core i7 Haswell, but I'd expect similar performance gains across all x86 processors featuring an x87 FPU or later. I would expect somewhat lower gains using double-precision floating-point math (as x87 doesn't have native instructions for double-precision trigonometry), but I don't think double precision would really be warranted in this case.
While it most likely makes sense to leave the current fixed-point trigonometry in place for the sake of systems with slow (or no) FPU, it seems apparent to me that floating-point versions should be used when compiling for modern processors.
I hope to hear back from this list soon; I can be reached anytime by this email address, or under the nick "rcombs" on the Freenode IRC network; feel free to stop by #libass if you'd like to discuss the performance implications of this suggestion with the rest of the team.