Re: [Groff] Re: man page encoding

From: Andries Brouwer
Subject: Re: [Groff] Re: man page encoding
Date: Thu, 7 Jul 2005 22:26:50 +0200
User-agent: Mutt/1.4i

On Thu, Jul 07, 2005 at 09:40:34PM +0200, Bruno Haible wrote:

> Andries Brouwer wrote:
> > If there is a pipeline, then earlier
> > stages in the pipeline already need the character set.
> > So, conversion may have to be done before the input reaches groff.

> Btw, if a program in the pipeline, before groff, actually needs the
> character set, it will be able to infer it from the "coding:" marker.
> In the past, without a marker, it could not know whether it was processing
> something in KOI8-R or ISO-8859-5.
>
> > And that also brings up a different point. If I have a file
> > whose top line contains -*- coding: EUC-JP -*- and I feed it to
> > a program like iconv, must that program change the top line?
>
> The "gpreconv" filter must be idempotent:
>          gpreconv | gpreconv  ==  gpreconv.
> Whether it achieves this by converting the input to UTF-8 and changing the
> marker to "coding: utf-8", or whether it converts the input to ASCII with
> lots of \[...] or \N[...] escape sequences and leaves the marker in place,
> is an unimportant detail.
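A minimal sketch of the first strategy, for illustration only. It is not gpreconv itself. It assumes an Emacs-style marker on one of the first two lines and assumes Latin-1 when no marker is present. The names `find_coding` and `preconv_sketch` are hypothetical. It decodes according to the marker, re-encodes as UTF-8 and rewrites the marker, so a second pass changes nothing:

```python
import re

# Emacs-style coding marker, e.g. '.\" -*- coding: EUC-JP -*-'.
# Assumption: only the first two lines are searched, and the encoding name
# ends at whitespace, ';' or any other character outside this class.
MARKER_RE = re.compile(r"(-\*-.*?coding:\s*)([A-Za-z0-9_.:-]+)")
MARKER_RE_BYTES = re.compile(MARKER_RE.pattern.encode("ascii"))


def find_coding(data: bytes, default: str = "latin-1") -> str:
    """Return the encoding named by the marker, or `default` (an assumption)."""
    for line in data.split(b"\n")[:2]:
        m = MARKER_RE_BYTES.search(line)
        if m:
            return m.group(2).decode("ascii")
    return default


def preconv_sketch(data: bytes, default: str = "latin-1") -> bytes:
    """Hypothetical converter: decode per the marker, re-encode as UTF-8,
    and make the marker say utf-8 so that a second pass is a no-op."""
    lines = data.decode(find_coding(data, default)).split("\n")
    for i in range(min(2, len(lines))):
        lines[i], n = MARKER_RE.subn(r"\g<1>utf-8", lines[i], count=1)
        if n:  # marker found and rewritten in place
            break
    else:
        # No marker: add one. Otherwise a second pass would decode the
        # UTF-8 output as `default` and garble it.
        lines.insert(0, '.\\" -*- coding: utf-8 -*-')
    return "\n".join(lines).encode("utf-8")
```

The idempotence comes entirely from rewriting (or adding) the marker. Without that step, preconv_sketch(preconv_sketch(x)) would differ from preconv_sketch(x) for any non-UTF-8 input.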

So now we get a new converter: not iconv, but a special-purpose gpreconv
filter that knows its output will later be fed to groff.
Pity. Where is my beautiful Unix?

This converter may change the sequence of symbols in the file, not only
the representation of these symbols. Ach.
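The second strategy is the one that changes the sequence of symbols. The following is a sketch, assuming groff's `\[uXXXX]` glyph names for Unicode code points. The function name is hypothetical, and a real converter might choose named glyphs instead. Each non-ASCII character becomes several ASCII characters, and a preprocessor that knows nothing about these escapes sees exactly those characters:

```python
def to_groff_escapes(text: str) -> str:
    """Replace each non-ASCII character with a groff \\[uXXXX] escape.

    Assumption: groff's Unicode glyph names (upper-case hex, at least
    four digits) are used for every non-ASCII code point.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        # ASCII passes through unchanged; anything else becomes e.g. \[u00E9].
        out.append(ch if cp < 0x80 else "\\[u%04X]" % cp)
    return "".join(out)


# One symbol in, eight symbols out.
print(to_groff_escapes("caf\u00e9"))  # caf\[u00E9]
```

Because the output is pure ASCII, running the function twice gives the same result. This strategy is therefore idempotent too, but only at the cost of changing what every later stage in the pipeline sees.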

It is not at all an unimportant detail whether it converts to UTF-8 or
to ASCII with escape sequences. My own preprocessors halfway along that
pipeline know about neither UTF-8 nor these escape sequences.
Yet I am told that compatibility mode should keep working.
Does gpreconv also know whether groff will be called with the -C option?

I foresee a complicated mess.

Is it not far simpler to document that groff must be called with a file
encoded in ASCII, Latin-1 or UTF-8?
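Telling these three encodings apart needs no marker at all. The following is a heuristic sketch, and the function name is hypothetical. Input that is 7-bit clean is treated as ASCII. Otherwise, input that decodes strictly as UTF-8 is treated as UTF-8, because Latin-1 text with accented letters rarely forms valid UTF-8 sequences. Everything else is treated as Latin-1, which accepts any byte sequence:

```python
def guess_encoding(data: bytes) -> str:
    """Heuristic sketch: choose ASCII, UTF-8 or Latin-1 without a marker."""
    if data.isascii():          # 7-bit clean (bytes.isascii needs Python 3.7+)
        return "ascii"
    try:
        data.decode("utf-8")    # strict decoding: any invalid sequence raises
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"        # every byte sequence is valid Latin-1


for sample in (b"plain", "caf\u00e9".encode("utf-8"), "caf\u00e9".encode("latin-1")):
    print(sample, guess_encoding(sample))
```

The heuristic can still misclassify Latin-1 text that happens to be valid UTF-8. For example, the bytes 0xC3 0xA9 read as "Ã©" in Latin-1 but decode as "é" in UTF-8. A KOI8-R file would also be silently read as Latin-1. That is the price of restricting the input to three encodings.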
