[Groff] Re: non-ASCII chars and grohtml
From: Gaius Mulley
Subject: [Groff] Re: non-ASCII chars and grohtml
Date: 24 Nov 2004 12:06:41 +0000
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2
Werner LEMBERG <address@hidden> writes:
> Gaius,
>
>
> if I say
>
> \X'html:ü'
>
> I get
>
> x X html:ü
>
> in the intermediate troff output file.  In other words, the \X
> escape passes `ü' unmodified. This is a problem, since grohtml
> expects ASCII input only. We have no possibility in GNU troff to
> convert `ü' to `\[:u]' in the `mouth' (to use TeX's terminology), so I
> suggest that you add a warning to grohtml, something like this:
>
> Charset `US-ASCII' doesn't contain character code 0xFC (`ü')
OK.
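
To make the proposed warning concrete, here is a minimal sketch in Python (not grohtml's actual C++ code) of the check: scan the payload of an `html:' device control for bytes >= 0x80 and report each one in the message format suggested above. The function name `ascii_warnings' and the `display_encoding' parameter are hypothetical, and showing the offending glyph as Latin-1 is an assumption about the input.

```python
def ascii_warnings(payload, display_encoding="latin-1"):
    """Return one warning per distinct non-ASCII byte in an html: payload.

    `payload` is the raw bytes following `x X ` in the intermediate output.
    `display_encoding` is only used to show the glyph in the message; treating
    the input as Latin-1 is an assumption of this sketch.
    """
    warnings = []
    seen = set()
    for byte in payload:
        # Plain ASCII is always fine; report each offending byte only once.
        if byte < 0x80 or byte in seen:
            continue
        seen.add(byte)
        glyph = bytes([byte]).decode(display_encoding, errors="replace")
        warnings.append(
            "Charset `US-ASCII' doesn't contain character code "
            "0x%02X (`%s')" % (byte, glyph)
        )
    return warnings


if __name__ == "__main__":
    for message in ascii_warnings("html:ü".encode("latin-1")):
        print(message)
```

Reporting each byte value once keeps the output short when the same character occurs many times in one tag.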
> Additionally, we need a new tag `html:charset' which sets the
> `charset' attribute in the <meta> command. Then a string
> `.input-encoding' (the leading dot shall indicate that this string is
> meant as read-only) should be added to the latinX.tmac files which can
> be used in www.tmac to set the tag automatically:
>
> .tag "html:charset \*[.input-encoding]
>
> The whole issue is a bit tricky; for example, I suggest allowing at
> most one call to `.tag html:charset...' for simplicity.  Another
> problem is how to determine the valid character ranges -- shall this
> be built into grohtml? Or shall my proposed html:charset tag look
> like this:
>
> html:charset <name> <start1> <end1> <start2> <end2> ...
>
> so that grohtml can be dumb, and the latinX.tmac define the proper
> ranges via \*[.input-encoding]?
This is certainly a good idea.  grohtml would still have to check the
ranges of legal characters, but that is easy, just not quite
as easy as testing for ch < 0x80 :-)
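
As a rough illustration of the range check, the sketch below (Python, not grohtml's C++) parses a tag of the proposed form `html:charset <name> <start1> <end1> ...' and tests a character code against the listed ranges. The thread does not fix how the bounds are written, so accepting both decimal and 0x-prefixed hex is an assumption, as are the function names `parse_charset_tag' and `is_legal' and the example ranges.

```python
def parse_charset_tag(tag):
    """Split `html:charset <name> <start1> <end1> ...` into (name, ranges).

    Bounds are parsed with int(text, 0), so `160` and `0xA0` both work;
    that notation is an assumption, not something the proposal specifies.
    """
    fields = tag.split()
    if len(fields) < 2 or fields[0] != "html:charset":
        raise ValueError("not an html:charset tag: %r" % tag)
    name, bounds = fields[1], fields[2:]
    if len(bounds) % 2 != 0:
        raise ValueError("range bounds must come in <start> <end> pairs")
    ranges = []
    for i in range(0, len(bounds), 2):
        start, end = int(bounds[i], 0), int(bounds[i + 1], 0)
        if start > end:
            raise ValueError("empty range: %s %s" % (bounds[i], bounds[i + 1]))
        ranges.append((start, end))
    return name, ranges


def is_legal(code, ranges):
    """True if the character code lies inside any inclusive (start, end) range."""
    return any(start <= code <= end for start, end in ranges)


if __name__ == "__main__":
    # Illustrative ranges: the printable parts of ISO-8859-1.
    name, ranges = parse_charset_tag(
        "html:charset ISO-8859-1 0x20 0x7E 0xA0 0xFF")
    print(name, is_legal(0xFC, ranges), is_legal(0x85, ranges))
```

With the ranges supplied by the macro file, the checker stays ignorant of any particular encoding: it only compares integers, which is the "dumb grohtml" variant discussed above.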
> Of course, the simplest solution is to disallow characters >= 0x80
> completely in the `html:...' tag, but a user may wonder why she can
> use `ü' everywhere in the document except in .URL and friends (and
> switching to UTF8 in the future needs additional changes).
Yes, I think your html:charset method outlined above is the way to
go.
Gaius