Re: PDF outline not capturing Cyrillic text

From: Ralph Corderoy
Subject: Re: PDF outline not capturing Cyrillic text
Date: Tue, 20 Sep 2022 11:38:48 +0100

Hi Branden,

> A shorter pole might be to establish a protocol for communication of
> Unicode code points within device control commands.  Portability isn't
> much of an issue here: as far as I know there has been no effort to
> achieve interoperation of device control escape sequences among
> troffs.
> That convention even _could_ be UTF-8, but my initial instinct is
> _not_ to go that way.  I like the 7-bit cleanliness of GNU troff
> output, and when I've mused about solving The Big Unicode Problem
> I have given strong consideration to preserving it, or enabling
> tricked-out UTF-8 "grout" only via an option for the kids who really
> like to watch their chrome rims spin.

Adding an option seems like more needless complexity.
I am not a kid and have never had chrome rims.

> I realize that Heirloom and neatroff can both boast of this

I expect they just think it mundane.

> but how many people _really_ look at device-independent troff output?
> A few curious people, and the poor saps who are stuck developing and
> debugging the implementations, like me.  For the latter community,
> a modest and well-behaved format saves a lot of time.

I read it, diff(1) it, etc.  Skipping the device-specific rendering
often simplifies the comparison and removes another layer of potential
mud and error.

There's nothing great about the device-independent format being ASCII.
I strongly suggest using UTF-8 encoding for the Unicode runes that need
passing through to the device driver.  This keeps the output easy to
read, grep, etc., and avoids inventing yet another encoding format on
the grounds that none of the existing ones is ‘good enough’.  The
device drivers will probably have UTF-8 parsing code to hand.
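
To give a feel for how little parsing code that is, here is a rough
sketch of a UTF-8 decoder of the sort a driver might carry.  It is an
illustration only, not code from groff or any driver: the function name
decode_utf8 and the `bookmark` payload are made up for the example, and
the decoder deliberately skips the stricter checks (overlong forms,
surrogates) that a production one would want.

```python
def decode_utf8(data):
    """Decode UTF-8 bytes into a list of Unicode code points.

    Minimal sketch: validates lead and continuation bytes only; it does
    not reject overlong encodings or surrogate code points.
    """
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte's high bits give the sequence length.
        if lead < 0x80:
            length, value = 1, lead
        elif lead >> 5 == 0b110:
            length, value = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:
            length, value = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:
            length, value = 4, lead & 0x07
        else:
            raise ValueError("invalid lead byte at offset %d" % i)
        if i + length > len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        # Each continuation byte is 10xxxxxx and adds six bits.
        for b in data[i + 1:i + length]:
            if b >> 6 != 0b10:
                raise ValueError("invalid continuation byte")
            value = (value << 6) | (b & 0x3F)
        code_points.append(value)
        i += length
    return code_points


# Hypothetical device control line; the "bookmark" payload is invented
# for this example and is not a real driver command.
line = "x X bookmark Привет".encode("utf-8")
payload = line.split(b" ", 3)[3]
runes = decode_utf8(payload)
```

The Cyrillic text stays legible in the intermediate output, so grep and
diff(1) work on it directly, and the driver recovers the code points
with a loop like the one above.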

If groff ever reaches ‘UTF-8 everywhere’, an ad-hoc encoding for this
one thing will appear to be an anachronism when it is really a poor
recent decision.

Cheers, Ralph.
