Re: Fwd: Re: Inadequate documentation of silly characters on screen.

emacs-devel

From:	Alan Mackenzie
Subject:	Re: Fwd: Re: Inadequate documentation of silly characters on screen.
Date:	Thu, 19 Nov 2009 21:57:07 +0000
User-agent:	Mutt/1.5.9i

Hi, Eli!

On Thu, Nov 19, 2009 at 09:43:29PM +0200, Eli Zaretskii wrote:
> > Date: Thu, 19 Nov 2009 15:58:48 +0000
> > From: Alan Mackenzie <address@hidden>
> > Cc: address@hidden, Andreas Schwab <address@hidden>,
> >     Jason Rumney <address@hidden>

> > > No: the string does not contain any characters, only bytes, because
> > > it's a unibyte string.

> > I'm thinking from the lisp viewpoint.  The string is a data structure
> > which contains characters.  I really don't want to have to think
> > about the difference between "chars" and "bytes" when I'm hacking
> > lisp.  If I do, then the abstraction "string" is broken.

> No, it isn't.  Emacs supports unibyte strings and multibyte strings.
> The latter hold characters, but the former hold raw bytes.  See
> "(elisp) Text Representations".

The abstraction is broken.  It is broken because it isn't abstract - its
users have to think about the way characters are represented.  In an
effective abstraction, a user could just write "ñ" or ?ñ and rely on the
underlying mechanisms to work.

Instead of the abstraction "string", we have two grossly inferior
abstractions, "unibyte string" and "multibyte string".

Please suggest to me the correct elisp to "replace the zeroth character
of an existing string with Spanish n-twiddle".  If this is impossible to
write, or it's grossly larger than the buggy "(aset nl 0 ?ñ)", that's a
demonstration of the breakage.
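
One non-destructive way to get the effect asked for, as a minimal sketch: build a new string from the replacement character and the tail of the old one.  The helper name `my-replace-first-char' is hypothetical and not part of Emacs.  This sidesteps the in-place `aset' case rather than fixing it, since it returns a fresh string instead of modifying the existing one.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-replace-first-char (str char)
  "Return a copy of STR with its zeroth character replaced by CHAR.
STR must be non-empty; it is not modified."
  ;; `string' builds a one-character string from CHAR, and `concat'
  ;; joins it with everything after the zeroth character of STR.
  (concat (string char) (substring str 1)))

;; The three-line example from this thread, rewritten without `aset'.
(setq nl (my-replace-first-char "\n" ?ñ))
```

With this, `(insert nl)' should put "ñ" in the buffer, because `concat' returns a multibyte string once any of its arguments contains a non-ASCII character.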

> > > The byte 241 can be inserted in multibyte strings and buffers
> > > because it is also a char of code 4194289 (which gets displayed as
> > > \361).

> > Hang on a mo'!  How can the byte 241 "be" a char of code 4194289?  This
> > is some strange usage of the word "be" that I wasn't previously aware
> > of.  ;-)

> That's how Emacs 23 represents raw bytes in multibyte buffers and
> strings.

Why is it necessary to distinguish between 'A' and 65?  Surely they're
both just 0x41?  I'm missing something here.
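
The numbers under discussion can be inspected directly.  This is a sketch assuming Emacs 23 or later, with `unibyte-char-to-multibyte' and `multibyte-char-to-unibyte' behaving as the Elisp manual describes; the variable names are made up for illustration.  It suggests where the distinction comes from: an ASCII value such as 65 maps to itself, but 241 can mean either the character ñ or a raw byte, and multibyte text gives the raw byte a separate code.

```elisp
;; Illustrative names only; none of these variables exist in Emacs.
(defvar my-raw-byte-char (unibyte-char-to-multibyte 241)
  "Character code used for raw byte 241 in multibyte text.")

(defvar my-n-tilde ?ñ
  "Character code of ñ (U+00F1).")

(defvar my-ascii-a (unibyte-char-to-multibyte 65)
  "ASCII bytes map to the same character code.")

;; Raw byte 241 as a character, ñ, ASCII A, and the raw byte mapped back.
(list my-raw-byte-char
      my-n-tilde
      my-ascii-a
      (multibyte-char-to-unibyte my-raw-byte-char))
;; Going by the figures quoted above: (4194289 241 65 241)
```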

> > At this point, would you please just agree with me that when I do

> >    (setq nl "\n")
> >    (aset nl 0 ?ñ)
> >    (insert nl)

> > , what should appear on the screen should be "ñ", NOT "\361"?

> No, I don't agree.  If you want to get a human-readable text string,
> don't use aset; use string operations instead.

There aren't any.  `store-substring' will fail if the byte representation
of the new text differs in size from that of the text it replaces, so it
surely isn't any better than `aset'.  At least `aset' tries to convert to
multibyte.

I don't imagine anybody here would hold that the current state of strings
is ideal.  I'm still trying to piece together what the essence of the
problem is.

-- 
Alan Mackenzie (Nuremberg, Germany).
