freetype-devel

Re: Overlap oversampling


From: David Turner
Subject: Re: Overlap oversampling
Date: Tue, 30 Jun 2020 00:58:14 +0200

So, I finally had a deep look at the patches here. They're pretty neat. I'd just recommend documenting the subtle computations in ft_smooth_slow_spans() a little better, and avoiding branches altogether by using bit twiddling to perform the saturated addition instead (removing branches from loops is always best for performance), i.e. something like the following:

  /* This function averages inflated spans in direct rendering mode.
   * It assumes that coverage spans are rendered in a SCALE*SCALE
   * inflated pixel space, and computes the contribution of each
   * span 'sub-pixel' to the target bitmap's pixel. I.e.:
   *
   *  If (x, y) are pixel coordinates in inflated space, then
   *  (xt := x/SCALE, yt := y/SCALE) are the pixel coordinates in the target
   *  bitmap, where '/' denotes integer division.
   *
   *  Let's define GRIDSIZE := SCALE * SCALE, then if `c` is the 8-bit coverage
   *  for (x, y) in inflated space, then its contribution to (xt, yt) would be
   *  ct := c // GRIDSIZE, where '//' denotes division of real numbers (i.e.
   *  without truncation to a lower fixed or floating point precision).
   *
   *  Since the sums can only be stored in 8-bit target bitmap pixels, there are
   *  at least two ways to approximate the sum:
   *
   *     1) Compute `ct := FLOOR(c // GRIDSIZE)`, which means that if all
   *        pixels in inflated space have full coverage (i.e. value 255), then
   *        their contribution sums will be GRIDSIZE * FLOOR(255 / GRIDSIZE),
   *        which will be 252 (for SCALE == 2), or 240 (for SCALE == 4).
   *
   *        A later pass is then needed to rescale the values to the 0..255
   *        range.
   *
   *     2) Compute `ct := ROUND(c // GRIDSIZE)`, in which case the total
   *        contribution sum may reach 256 for both `SCALE == 2` and
   *        `SCALE == 4`, which cannot be stored in an 8-bit pixel byte of the
   *        target bitmap. To deal with this, perform saturated arithmetic to
   *        ensure that the value never goes over 255. This avoids an
   *        additional rescaling step, and is implemented below.
   */
  static void
  ft_smooth_slow_spans( int             y,
                        int             count,
                        const FT_Span*  spans,
                        TOrigin*        target )
  {
    unsigned char*  dst = target->origin - ( y / SCALE ) * target->pitch;
    unsigned int    x;

    for ( ; count--; spans++ )
    {
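      /* Rounded contribution of one sub-pixel, see option 2) above. */
      unsigned int  coverage = ( spans->coverage + GRIDSIZE / 2 ) / GRIDSIZE;


      for ( x = 0; x < spans->len; x++ )
      {
        /* Saturated addition of d[0] + coverage: since sum <= 255 + 64,  */
        /* sum >> 8 is 0 or 1, so -( sum >> 8 ) is 0 or all bits set, and */
        /* OR-ing it clamps any overflowing sum to 255 without a branch.  */
        unsigned char*  d   = &dst[( spans->x + x ) / SCALE];
        unsigned int    sum = d[0] + coverage;


        d[0] = (FT_Byte)( sum | -( sum >> 8 ) );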
      }
    }
  }

Here's a Compiler Explorer link that compares the two implementations.


Can you tell me how to actually test that the code works as expected though?

Thanks

- David

On Tue, Jun 23, 2020 at 20:16, David Turner <david@freetype.org> wrote:


On Tue, Jun 23, 2020 at 05:42, Alexei Podtelezhnikov <apodtele@gmail.com> wrote:
Hi again,

The oversampling is implemented by inflating the outline and then
averaging the increased number of cells using the FT_RASTER_FLAG_DIRECT
mechanism. The first two patches set the stage by splitting the code
paths for LCD rendering out of the way and trying
FT_RASTER_FLAG_DIRECT for FT_RENDER_MODE_LCD. The third one implements
oversampling by replacing the normal rendering with oversampling if
SCALE is 2 or 4 (as opposed to 1). Again, the proposal is to eventually
expose it as FT_RENDER_MODE_SLOW. The slightly complicated averaging of
cells is needed because, with integer division,
255/4 + 255/4 + 255/4 + 255/4 = 252 instead of 255, so we have to round,
yet avoid overflowing.

Thanks, I'll take a look at your patches.

However, please don't call it FT_RENDER_MODE_SLOW; the fact that it is slow is an implementation detail, and we could very well replace this with a different algorithm in the future (maybe slow, maybe not). So something like FT_RENDER_MODE_OVERLAPPED_OUTLINES seems more appropriate, since it describes why you would want to use this mode, instead of what its performance profile is :-)

Comments?

Alexei
