Re: 23.0.60; [nxml] BOM and utf-8

emacs-devel

From:	David Kastrup
Subject:	Re: 23.0.60; [nxml] BOM and utf-8
Date:	Tue, 20 May 2008 09:13:10 +0200
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (gnu/linux)

"Stephen J. Turnbull" <address@hidden> writes:

> David Kastrup writes:
>
>  > I am not interested in the "goal of Unicode" but in that of Emacs.
>  > Unicode is about text files.  But Emacs communicates via byte streams
>  > and those are not necessarily text, or necessarily all text.
>
> Some Emacs files *are* text, and getting them to behave correctly will
> require understanding "the goals of Unicode".  Since Unicode is now
> the underlying representation of multibyte buffers, you don't have a
> choice about this.  Cf. Thomas Morgan's recent post on "disappearing
> cursor".

Sigh.  Bugs are there to be fixed, not to be used as an excuse for more
bugs.  The interpretation of Unicode is a matter of the display engine,
not of the byte stream encoders/decoders.
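A minimal Python sketch of the difference, by analogy only. Python's `utf-8` and `utf-8-sig` codecs stand in for Emacs coding systems here; this is not Emacs code. A decoder that keeps the BOM as U+FEFF in the text round-trips byte for byte. A decoder that swallows it does not.

```python
# A UTF-8 BOM (EF BB BF) followed by an XML declaration.
raw = b"\xef\xbb\xbf<?xml version='1.0'?>"

# Plain "utf-8" keeps the BOM as U+FEFF in the decoded string,
# so re-encoding reproduces the original bytes exactly.
faithful = raw.decode("utf-8").encode("utf-8")

# "utf-8-sig" strips a leading BOM on decoding; re-encoding with
# plain "utf-8" then silently drops those three bytes.
stripped = raw.decode("utf-8-sig").encode("utf-8")

print(faithful == raw)            # True
print(len(raw) - len(stripped))   # 3
```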

>  > > Sure, and Emacs must provide coding systems that preserve them,
>  > > and generally use those coding systems by default.  Did anybody
>  > > say otherwise?
>  > 
>  > So what was your point supposed to be?
>
> That Miles could use a BOM-swallowing encoding on input and a non-BOM-
> producing encoding on output to enforce his view of Microsoft
> conventions on others.

I suppose you underestimate Miles here.

>  > So forward-char and replace-string should be made to work as
>  > expected on non-normalized texts.
>
> Good luck.  I don't know how to do that, and doubt that it is
> possible.

We have similar issues with case-folding replacements.  Anyway: one
problem is not an excuse to introduce unrelated bugs elsewhere.  Moving
character unification to a place where it does more damage does not
magically make the problem different.

> I do not think that "as expected" can be well defined, because for
> purposes like computing storage requirements composing characters
> should be considered characters, while for others like computing the
> number of columns occupied by a line they should not.

Again, you are being destructive.  Problems don't present an excuse for
being sloppy.  If one can see a problem that cannot be fixed in
principle, then one should try to confine it to those operations where
it is inherent instead of spreading its effects all around and making
everything unpredictable.

Yes, in the presence of composing characters there are open questions
about what forward-char, replace-string and overwrite-mode should do.
One reasonable approach is to treat a composed Unicode glyph as an
inseparable unit with regard to user commands.  It is basically Emacs
20.2 all over.  But composed Unicode glyphs have no single code point.
They are sequences (vectors) of code points.  As long as the
representation of characters as scalar integers remains valid, Unicode
code points are all that we can work with.
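To make the scalar-versus-sequence point concrete, here is a small Python sketch using the standard `unicodedata` module. Python is only a stand-in; Emacs's own composition machinery works differently. It also touches the storage-versus-columns distinction from the quoted text. Counting non-combining code points is a crude, assumed approximation of display width, not what Emacs actually computes.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: one visible glyph, two code points.
decomposed = "e\u0301"

# NFC composes this pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# "q" + U+0301 has no precomposed form, so NFC leaves two code points:
# this glyph cannot be represented as one scalar at all.
no_precomposed = unicodedata.normalize("NFC", "q\u0301")

# Storage view: every code point counts (and costs UTF-8 bytes).
print(len(decomposed), len(composed), len(no_precomposed))       # 2 1 2
print(len(decomposed.encode("utf-8")), len(composed.encode("utf-8")))  # 3 2

# Rough column view (assumption): combining marks occupy no column of their own.
columns = sum(1 for ch in decomposed if not unicodedata.combining(ch))
print(columns)  # 1
```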

>  > > Binary faithfulness may be incompatible with other user demands,
>  > > for example if a user introduces Latin-2 characters into a
>  > > Latin-9 text.
>  > 
>  > Why do you think we switched to utf-8 internally and got rid of
>  > latin unification?
>
> David, don't you realize that is not a response to what I wrote?
>
> I think it's time to stop this thread until you address the issues
> instead of me.

Whatever.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum