[ft-devel] fttrigon: Use standard floating-point functions for performance?

Hello, freetype-devel; I'm Rodger Combs (rcombs on Foonetic), a member of the libass dev team. For any who are unaware, libass is a subtitle renderer, which uses Freetype2 for a variety of font-related functions; since subtitles are rendered in real-time, it requires high performance both internally and in the libraries it depends on, especially when processing high-complexity fonts and effects on low-powered computers.

While profiling libass in OSX's Instruments.app, I found that Freetype's stroker tends to show up as a very time-consuming routine, and that most of the time spent in the stroker goes to the trigonometry routines; specifically, the atan2 and cosine functions seem rather slow. An example of profiler output for a subtitle track with heavy effects is shown in this screenshot:

Out of curiosity, I examined the functions in question and tried replacing them with simple 16.16 fixed <-> 32-bit float conversions plus calls to the standard C floating-point trigonometry functions. The floating-point versions were about an order of magnitude faster: the trig functions no longer even showed up in the profiler output unless I filtered for them, and there was no visible degradation in output quality (as I expected).
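FT_Fixed is a 16.16 fixed-point value and FT_Angle is an FT_Fixed holding degrees, so the conversions mentioned here amount to scaling by 2^16 (plus a degrees-to-radians factor for angles). A minimal sketch of such helpers, assuming those type definitions; the typedefs are stand-ins for FreeType's own, and fixed_to_float, float_to_fixed and angle_to_radians are hypothetical names, not FreeType API:

```c
#include <assert.h>

/* Stand-ins for FreeType's types (assumption: FT_Fixed is a signed long
   holding a 16.16 value, and FT_Angle is an FT_Fixed holding degrees). */
typedef signed long  FT_Fixed;
typedef FT_Fixed     FT_Angle;

/* Hypothetical helper: 16.16 -> float.  The low 16 bits are the fraction.
   Exact while |x| <= 2^24, i.e. for values up to 256.0 in magnitude. */
static float
fixed_to_float( FT_Fixed  x )
{
  return (float)x / 65536.0f;
}

/* Hypothetical helper: float -> 16.16, rounding to nearest rather than
   truncating toward zero as a plain cast would. */
static FT_Fixed
float_to_fixed( float  f )
{
  float  scaled;

  assert( f > -32768.0f && f < 32768.0f );  /* must fit in 16.16 */
  scaled = f * 65536.0f;
  return (FT_Fixed)( scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f );
}

/* Hypothetical helper: 16.16 degrees -> float radians, the unit that
   cosf() and atan2f() work in. */
static float
angle_to_radians( FT_Angle  angle )
{
  return fixed_to_float( angle ) * ( 3.14159265358979323846f / 180.0f );
}
```

Rounding instead of truncating keeps the float -> 16.16 step within about half a 16.16 unit; whether that matters for stroking is a judgment call.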

As FT_Atan2 and FT_Cos were the only routines I'd noticed in profiler output, I only tried replacing those; here are the versions of the functions I tested:

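#include <math.h>

#ifndef M_PI  /* M_PI comes from POSIX, not ISO C */
#define M_PI  3.14159265358979323846
#endif

/* FT_Angle is 16.16 fixed-point degrees; FT_Fixed is 16.16 fixed-point. */

FT_EXPORT_DEF( FT_Fixed )
FT_Cos( FT_Angle  angle )
{
  /* 16.16 degrees -> float radians -> cosf() -> 16.16 (truncated) */
  return (FT_Fixed)( cosf( (float)angle / 65536.0f * (float)( M_PI / 180.0 ) )
                     * 65536.0f );
}

FT_EXPORT_DEF( FT_Angle )
FT_Atan2( FT_Fixed  dx,
          FT_Fixed  dy )
{
  /* 16.16 -> float, atan2f() in radians, then back to 16.16 degrees */
  return (FT_Angle)( atan2f( (float)dy / 65536.0f, (float)dx / 65536.0f )
                     * (float)( 180.0 / M_PI ) * 65536.0f );
}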

I've only performance-tested these two functions, using single-precision floating-point replacements, on Mac OS X with a quad-core Haswell i7, but I'd expect similar gains on any x86 processor with an x87 FPU or later. I would expect somewhat smaller gains from double-precision math, since double-precision library routines generally have to do more work to reach their extra accuracy, but I don't think double precision is really warranted in this case.

While it most likely makes sense to leave the current fixed-point trigonometry in place for systems with a slow FPU (or none at all), it seems clear to me that the floating-point versions should be used when compiling for modern processors.

I hope to hear back from this list soon; I can be reached anytime at this email address, or under the nick "rcombs" on the Freenode IRC network; feel free to stop by #libass if you'd like to discuss the performance implications of this suggestion with the rest of the team.

Thanks,

--Rodger Combs

From:	Rodger Combs
Subject:	[ft-devel] fttrigon: Use standard floating-point functions for performance?
Date:	Fri, 23 May 2014 23:49:28 -0500