[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

How to create a derived encoding?

From: David Kastrup
Subject: How to create a derived encoding?
Date: Tue, 12 Oct 2004 02:10:00 +0200
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/21.3.50 (gnu/linux)

After considerable thinking about the problem, I have arrived at the
conclusion that for efficiency's sake I'd like to have an encoding
like tex-utf-8 which is derived from the normal utf-8 except that
sequences like ^^8a and similar are converted into a corresponding
byte before combining Unicode characters.  It would be a bonus if such
sequences staid unchanged in case that this sort of composition does
not lead to a valid Unicode character, but that's just a bonus.

The problem is that TeX has no clue about _characters_, but works on
byte streams, and it has the habit of transliterating some byte codes
in the above manner.  Treating the output of TeX sensibly means
converting those transliteration back into bytes _before_ assembling
Unicode characters.

The same problem occurs with unibyte non-ASCII encodings by Latin-1.
I already have one (rather inefficient) hack to deal with that in
preview-latex, but it does not extend easily to multibyte.

So if there was a tolerably working way to derive a special encoding
(which will be used as a process output encoding) that reconverts
control sequences like the above before composing unicode characters
from the resulting utf-8 stream, this would appear to be by far the
fastest and convenient way to go about this problem.

Any hints how to derive a suitably augmented encoding from an existing

David Kastrup, Kriemhildstr. 15, 44793 Bochum

reply via email to

[Prev in Thread] Current Thread [Next in Thread]