Re: 23.0.60; [nxml] BOM and utf-8

From: Stephen J. Turnbull
Subject: Re: 23.0.60; [nxml] BOM and utf-8
Date: Tue, 20 May 2008 05:34:45 +0900

David Kastrup writes:
 > "Stephen J. Turnbull" <address@hidden> writes:

 > > In any case, maintaining faithfulness of representation is simply not
 > > possible, as you point out
 > With some coding systems.  But the latin-* and utf-* can maintain the
 > binary stream since their coding is required to be canonical in the
 > standard.

latin-* will do so because of their extremely limited range.  It's
unfortunate that programmer intuitions about text have been
Americanized (== drastically limited) by these encodings.
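For concreteness, a small Python sketch of that trade-off. Python codec names such as "latin-1" stand in for Emacs coding systems, and `roundtrips` and `encodable` are hypothetical helpers. Under Latin-1, every one of the 256 byte values decodes to a character and back, which arbitrary bytes cannot do under UTF-8. The price of that limited range is that many letters outside Western Europe cannot be represented at all.

```python
# Python codec names stand in for Emacs coding systems here.

def roundtrips(data: bytes, codec: str) -> bool:
    """True if decoding and re-encoding reproduces data byte for byte."""
    try:
        return data.decode(codec).encode(codec) == data
    except UnicodeError:
        return False


def encodable(text: str, codec: str) -> bool:
    """True if every character of text has a representation in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


all_bytes = bytes(range(256))
print(roundtrips(all_bytes, "latin-1"))  # True: every byte value is a character
print(roundtrips(all_bytes, "utf-8"))    # False: lone 0x80-0xFF bytes are invalid UTF-8

print(encodable("na\u00efve", "latin-1"))             # True
print(encodable("\u0141\u00f3d\u017a", "latin-1"))    # False: no slot for Ł or ź
```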

utf-* can maintain representation in the very limited sense you have
in mind, and I know that is very useful to you in dealing with non-
conforming applications like TeX.  However, you still run into the
problem that faithfulness of representation is not a goal of Unicode.
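The "very limited sense" can be sketched in Python. The example word and the choice of NFC as the normalizing step are assumptions made for illustration. Valid UTF-8 round-trips byte for byte and its encoding is canonical. Even so, Unicode regards the NFD and NFC spellings as the same text, so any application that normalizes will write different bytes.

```python
import unicodedata

# "résumé" spelled with combining acute accents (NFD form).
nfd_bytes = "re\u0301sume\u0301".encode("utf-8")

# Valid UTF-8 survives a decode/encode round trip unchanged.
text = nfd_bytes.decode("utf-8")
print(text.encode("utf-8") == nfd_bytes)   # True

# UTF-8 is canonical: each code point has exactly one valid byte
# sequence, so the decoder rejects the overlong form of "/" (C0 AF).
try:
    b"\xc0\xaf".decode("utf-8")
    overlong_rejected = False
except UnicodeDecodeError:
    overlong_rejected = True
print(overlong_rejected)                   # True

# The NFC spelling is canonically equivalent text, but an application
# that normalizes before writing produces different bytes.
nfc_bytes = unicodedata.normalize("NFC", text).encode("utf-8")
print(len(nfd_bytes), len(nfc_bytes))      # 10 8
print(nfd_bytes == nfc_bytes)              # False
```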

 > > It's also not at all obvious that that is a very
 > > useful requirement when dealing with a character-oriented standard
 > > like Unicode or XML, since you can expect many applications to
 > > canonicalize the text "behind your back".
 > That's not an issue.

What do you mean by "that's not an issue"?  How can you know when I
haven't named the application?

 > Also you can load, edit and save a text file in collaborative
 > environments, and the diffs/patches will be just in the edited areas
 > (this will supposedly work better with Emacs-23 than Emacs-22).  Those
 > are quite important features.

Sure, and Emacs must provide coding systems that preserve them, and
generally use those coding systems by default.  Did anybody say
otherwise?

 > > Users should get used to it, and we should document how to force Emacs
 > > to error rather than do anything behind your back for those who need
 > > binary faithfulness rather than text faithfulness.
 > Since binary faithfulness implies text faithfulness, there is no reason
 > not to do the right thing instead of erroring out.

"There is no reason"?  How arrogant of you!  Rather, "David Kastrup
lacks the knowledge of the reasons."  Here are three examples:

Binary faithfulness may imply breaking text programs.  For example,
`forward-char' and `replace-string' will give surprising results in a
buffer using Unicode internally that contains text in NFD
normalization (and these anomalies will be noticeable in all Western
European languages excluding English).

Binary faithfulness may imply inefficiency.  For example, files need
not be normalized, so preserving their bytes while the buffer holds
normalized text would imply keeping a copy of the whole file and doing
a Unicode diff to determine which parts of the file need to be saved
from the buffer and which parts from the saved copy.

Binary faithfulness may be incompatible with other user demands, for
example if a user introduces Latin-2 characters into a Latin-9 text:
Latin-9 has no code points for many Latin-2 letters (such as ő or ł),
so the text can no longer be saved in its original encoding.
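The three examples can be sketched in Python. The sample strings are illustrative. Python's code-point indexing stands in for `forward-char', and a line-level `difflib` comparison stands in for a real Unicode diff. `faithful_save` is a hypothetical helper, not anything in Emacs.

```python
import difflib
import unicodedata

# Example 1: text programs and NFD.  In NFD, "é" is "e" plus U+0301
# COMBINING ACUTE ACCENT, so code-point-based operations can stop
# between a letter and its accent.
word = unicodedata.normalize("NFD", "caf\u00e9")
print(len(word))          # 5, not 4
print(word[:4])           # cafe  (the accent has been cut off)
print("\u00e9" in word)   # False: a search for precomposed é misses it


# Example 2: inefficiency.  If the buffer holds NFC text but the file on
# disk was NFD, a binary-faithful save needs the original bytes plus a
# diff to decide which lines can be copied verbatim.
def faithful_save(original: bytes, buffer_lines: list,
                  codec: str = "utf-8") -> bytes:
    """Hypothetical helper: reuse original bytes for unchanged lines."""
    orig_lines = original.decode(codec).splitlines(keepends=True)
    # Compare in the buffer's normalized form.
    norm_lines = [unicodedata.normalize("NFC", s) for s in orig_lines]
    matcher = difflib.SequenceMatcher(a=norm_lines, b=buffer_lines,
                                      autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged region: write the original bytes back untouched.
            out.extend(s.encode(codec) for s in orig_lines[i1:i2])
        else:
            # Edited region: encode the buffer's text.
            out.extend(s.encode(codec) for s in buffer_lines[j1:j2])
    return b"".join(out)


disk = "cafe\u0301\nline two\n".encode("utf-8")
edited = ["caf\u00e9\n", "line 2\n"]   # NFC buffer after editing line 2
print(faithful_save(disk, edited))     # b'cafe\xcc\x81\nline 2\n'


# Example 3: Latin-2 text in a Latin-9 file.  ISO 8859-15 has the euro
# sign but no slot for "ő", so the original encoding cannot be kept.
mixed = "Prix: 5 \u20ac (Gy\u0151r)"
try:
    mixed.encode("iso8859_15")
    unencodable = None
except UnicodeEncodeError as err:
    unencodable = err.object[err.start:err.end]
print(unencodable)  # ő
```

Even in this toy form, `faithful_save` has to keep the entire original file and diff it on every save, which is the cost the second example points to.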
