Re: Failure to render utf8 characters when sourced

From: G. Branden Robinson
Subject: Re: Failure to render utf8 characters when sourced
Date: Fri, 26 Aug 2022 11:01:19 -0500

[looping in groff list, since bug-groff isn't really for discussion]

Hi Pippo,

At 2022-08-26T12:07:17+0800, Pippo Carmona wrote:
> Greetings!
> Since the Refer csl cannot be changed,

I'm not precisely sure what you mean by the "csl" here, but I think I
grasp the contours of your problem.

> I used the .ds macro to supply my footnotes and bibliography with the
> formatted entries that fit my specification. However, if the .ds
> macros are sourced from a separate file using .so, some characters are
> rendered incorrectly. For example, é becomes Ã©. And when I set the
> macro in the same document, it is rendered correctly.
> I have used -k and preconv to try to solve the issue, but it just doesn't
> work.  Is there a workaround that I need to do, or is this a bug?

I think you are hitting a known limitation of preconv.  Here is some
language from the version of the man page in groff Git.

       preconv cannot perform any transformation on input that it cannot
       see.  Examples include files that are interpolated by
       preprocessors that run subsequently, including soelim(1); files
       included by troff itself through “so” and similar requests; and
       string definitions passed to troff through its -d command‐line
       option.

There are multiple workarounds.  Bjarni offered one.

At 2022-08-26T14:46:48+0000, Bjarni wrote:
> This looks like a case for bug #59442.  a) Use the option '-V' for
> "groff"  to see what the pipeline is  b) Reconstruct it to put
> "soelim" first.  Add the option '-e <encoding>' to the "preconv"
> command.
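
The reordering described in that quote can be sketched in a few lines of
Python.  This is only an illustration, not groff's own driver logic: the
function names, the file name "doc.ms", and the "-ms -Tpdf" arguments
are assumptions; only soelim, preconv's -e option, and groff itself come
from the discussion above.

```python
import subprocess


def build_pipeline(main_file, encoding="utf-8", groff_args=("-ms", "-Tpdf")):
    """Return the stages of: soelim FILE | preconv -e ENC | groff ARGS.

    soelim runs first so that preconv sees the text of every file
    pulled in with .so, not just the top-level document.
    groff_args is a hypothetical default; use your own macro package
    and output device.
    """
    return [
        ["soelim", main_file],
        ["preconv", "-e", encoding],
        ["groff", *groff_args],
    ]


def run_pipeline(stages):
    """Run each stage, feeding its standard output to the next stage."""
    data = None
    for argv in stages:
        result = subprocess.run(
            argv, input=data, stdout=subprocess.PIPE, check=True
        )
        data = result.stdout
    return data  # bytes written by the last stage (here, groff)
```

With groff installed, run_pipeline(build_pipeline("doc.ms")) would
return the formatted output as bytes.  Note that -k is deliberately
absent from the groff arguments, because preconv has already run.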

Another approach would be to convert the file you're sourcing to be
groff-friendly input on disk.

So instead of a UTF-8 encoded file like this:

.ds Gassee Jean-Louis Gassée\"

You might have:

.ds Gassee Jean-Louis Gass\['e]e\"

Some day I'd like to extend preconv(1) to accept options to produce
input that is more user-friendly and maintainable than the Unicode code
point escape sequences that it produces now, which look like this.

.ds Gassee Jean-Louis Gass\[u00E9]e\"
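
Until something like that exists, the on-disk conversion can be
approximated with a small script.  The following is a rough Python
sketch, not a replacement for preconv: the function name and the glyph
table are assumptions, and the table is a small hand-picked subset of
the special character names documented in groff_char(7).  Anything not
in the table falls back to a \[uXXXX] escape.

```python
import unicodedata

# Hand-picked subset of groff special character names (see groff_char(7)).
NAMED_GLYPHS = {
    "é": "\\['e]",
    "è": "\\[`e]",
    "ü": "\\[:u]",
    "ç": "\\[,c]",
    "ñ": "\\[~n]",
    "ß": "\\[ss]",
}


def to_groff_ascii(text, use_names=True):
    """Replace every non-ASCII character in text with a groff escape."""
    # Compose first so "e" + combining acute is treated as a single "é".
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # plain ASCII passes through untouched
        elif use_names and ch in NAMED_GLYPHS:
            out.append(NAMED_GLYPHS[ch])
        else:
            # At least four uppercase hex digits, more if needed.
            out.append(f"\\[u{ord(ch):04X}]")
    return "".join(out)


line = r'.ds Gassee Jean-Louis Gassée\"'
print(to_groff_ascii(line))                   # named glyph form
print(to_groff_ascii(line, use_names=False))  # code point form
```

To convert a whole file, read it with encoding="utf-8", pass the
contents through to_groff_ascii(), and write the result back out; the
converted file is pure ASCII and can be sourced with .so regardless of
what preconv sees.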

You can read more about these issues in the groff_char(7) man page; I
recommend the version from groff Git; it has been considerably
expanded and clarified since the 1.22.4 release.


