From: Rodger Combs
Subject: [ft-devel] fttrigon: Use standard floating-point functions for performance?
Date: Fri, 23 May 2014 23:49:28 -0500

Hello, freetype-devel;

I'm Rodger Combs (rcombs on Foonetic), a member of the libass dev team. For anyone unaware, libass is a subtitle renderer that uses FreeType 2 for a variety of font-related functions. Since subtitles are rendered in real time, it requires high performance both internally and in the libraries it depends on, especially when processing high-complexity fonts and effects on low-powered computers.

While profiling libass in OS X's Instruments.app, I've found that FreeType's stroker tends to come up as a very time-consuming routine, and that most of the time spent in the stroker is in the trigonometry routines; specifically, the atan2 and cosine functions seem rather slow. An example of profiler output for a subtitle track with heavy effects is shown in this screenshot: [screenshot]

Out of curiosity, I examined the functions in question and tried replacing them with simple conversions between 16.16 fixed-point and 32-bit float plus standard C floating-point trigonometry calls. The floating-point versions were about an order of magnitude faster: the trig functions no longer even showed up in the profiler output unless I filtered for them, and there was no visible degradation in output quality (as I would expect).

As FT_Atan2 and FT_Cos were the only routines I'd noticed in profiler output, I only tried replacing those, but here are the versions of the functions I tested:

    #include <math.h>

    /* Angle is in 16.16 fixed-point degrees; result is a 16.16 cosine. */
    FT_EXPORT_DEF( FT_Fixed )
    FT_Cos( FT_Angle angle )
    {
      return cosf(angle/65536.0*(M_PI/180.0)) * 65536.0;
    }

    /* Inputs are 16.16 fixed-point; result is in 16.16 fixed-point degrees. */
    FT_EXPORT_DEF( FT_Angle )
    FT_Atan2( FT_Fixed dx, FT_Fixed dy )
    {
      return atan2f(dy/65536.0, dx/65536.0) * (180.0 / M_PI) * 65536.0;
    }

I've only performance-tested these two functions using single-precision floating-point replacements on Mac OS X, on a quad-core i7 Haswell, but I'd expect similar performance gains across all x86 processors featuring an x87 FPU or later.

I would expect somewhat lower gains using double-precision floating-point math (as x87 doesn't have native instructions for double-precision trigonometry), but I don't think double precision would really be warranted in this case.

While it most likely makes sense to leave the current fixed-point trigonometry in place for the sake of systems with slow (or no) FPU, it seems apparent to me that floating-point versions should be used when compiling for modern processors.

I hope to hear back from this list soon; I can be reached anytime by this email address, or under the nick "rcombs" on the Freenode IRC network; feel free to stop by #libass if you'd like to discuss the performance implications of this suggestion with the rest of the team.

Thanks,
--Rodger Combs