groff
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Groff] Formatting algorithm, an experiment


From: Ulrich Lauther
Subject: Re: [Groff] Formatting algorithm, an experiment
Date: Wed, 28 May 2014 13:49:58 +0200
User-agent: Mutt/1.5.23 (2014-03-12)

On Mon, Mar 31, 2014 at 07:44:07PM -0400, Peter Schaffter wrote:
> Here's the bare bones version of the algorithm I was thinking of
> when I proposed improving line formatting by getting groff to
> shoulder the burden for some of the work we do manually.  It's
> written out in brute-force pseudo-code; should be pretty clear.
> 
To get a feeling for formating alternatives I implemented
a simplified version of the Knuth-Plass algorithm.

The idea is, to pack charakters as dense as possible
and to minimize the maximum gap at the end of any line.
In a second pass intra- and interword spacing could then be adjusted
along the lines of the heuristic proposed by Peter Schaffter.

To make tests realistic I implemented a primitive hyphenation algorithm
that sometimes works :-)
Character widths were taken from /usr/share/groff/1.21/font/devps/HR

For speed up, I first use a greedy algorithm and employ its result
as an upper bound for any gap in the dynamic programming solution.
However, the speed-up achieved that way seems to be minimal.

I tested with an English paragraph containing 775 words (before
hyphenation): "In olden times, when wishing still did some good,
there lived a king whose daughters were all beautiful, ...".

This is a stand-alone program, just for a feasibility test.
It reads a text (which is considered one paragraph), and runs the 
formatter.

Results:
--------

Lines of code 370 plus 184 for my linked list class.
The core algorithm takes just 95 lines of code.

Break-objects generated: 823 or 0.9 per hyphenated syllable.
After formatting, these objects go into a list of garbage and can
be reused for the next paragraph.

Running time under Ubuntu on Intel(R) Atom(TM) CPU N455 @ 1.66GHz:
Less than one millisecond when compiled with g++ -O4.

I used a line length of 100 times the width of the space-character.

The maximum gap at the end of any lines was:

greedy: 10.6 spaces
KP:      7.1 spaces

I think, the experiment shows that running time is no issue at all.

Kind regards,

     ulrich



reply via email to

[Prev in Thread] Current Thread [Next in Thread]