Re: Emacs 23 character code space

From: Eli Zaretskii
Subject: Re: Emacs 23 character code space
Date: Sun, 23 Nov 2008 06:22:45 -0500

> From: Stefan Monnier <address@hidden>
> Date: Sat, 22 Nov 2008 23:16:49 -0500
> Cc: address@hidden, Kenichi Handa <address@hidden>
> I think we should state somewhere that unibyte strings and buffers
> contain bytes only.  And that multibyte strings and buffers contain
> chars.  And that bytes are a subset of chars.

Please take a look at the current version of nonascii.texi in CVS; I
have already stated this there.  Specific suggestions for improvement
are welcome, of course.

(The text I was quoting was the original one written by Handa-san, not
the one I put into the manual.)

> >     @defun string-to-multibyte string
> >     This function returns a multibyte string containing the same sequence
> >     of characters as @var{string}.  If @var{string} is a multibyte string,
> >     it is returned unchanged.
> >     @end defun
> > I'm not sure I understand the effect of this function.
> It returns a string containing the same bytes (in the sense of
> ASCII+eight-bit, not in the sense of the underlying internal
> representation, which we should as much as possible not mention
> anywhere) but in a multibyte string instead.  I.e. the output is
> a multibyte string of the same length whose chars are bytes.

So you are in effect saying that the result of this function is only
well defined for a string that holds ASCII characters and raw 8-bit
bytes?
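
For illustration, here is a minimal sketch of the behavior described in
the quoted explanation, assuming an Emacs that provides `unibyte-string'
(Emacs 23 or later).  The `demo-' names are hypothetical and exist only
for this example.

```elisp
;; Hypothetical example data: a unibyte string holding the ASCII
;; character ?a followed by the raw byte #xC0.
(defvar demo-raw (unibyte-string ?a #xC0))

;; The same sequence of bytes, but carried in a multibyte string.
(defvar demo-multi (string-to-multibyte demo-raw))

(multibyte-string-p demo-raw)     ; unibyte input
(multibyte-string-p demo-multi)   ; multibyte output
(length demo-multi)               ; same length as `demo-raw'
(aref demo-multi 0)               ; still the ASCII character ?a
(aref demo-multi 1)               ; the raw 8-bit byte as a character
```

The last form returns a character code outside the Unicode range: the
byte #xC0 is represented as a raw 8-bit byte character, not as U+00C0.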

> >     @defun string-to-unibyte string
> >     This function returns a unibyte string containing the same sequence of
> >     characters as @var{string}.  It signals an error if @var{string}
> >     contains a non-ASCII character.  If @var{string} is a
> >     unibyte string, it is returned unchanged.
> >     @end defun
> > Since this function handles any non-ASCII characters lossily, when
> > would it be useful?
> I think the "non-ASCII" part is incorrect.  It probably should say
> "non-byte char" instead.

"Non-ASCII characters" here does not mean "anything but ASCII
characters", it means "any character except ASCII and raw 8-bit
bytes" (assuming I understand the text correctly).  I will make sure
this tricky distinction is clear in the manual.
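
The distinction can be seen in a short sketch, assuming Emacs 23 or
later.  The helper `demo-try-unibyte' is hypothetical; it only wraps
`string-to-unibyte' so that the error case is visible as a return
value.

```elisp
(defun demo-try-unibyte (string)
  "Return (string-to-unibyte STRING), or the symbol `failed' on error.
Hypothetical helper for illustration only."
  (condition-case nil
      (string-to-unibyte string)
    (error 'failed)))

;; ASCII plus a raw 8-bit byte: the conversion is lossless, and the
;; round trip gives back the original unibyte string.
(defvar demo-bytes (unibyte-string ?a #xC0))
(demo-try-unibyte (string-to-multibyte demo-bytes))

;; A real non-ASCII character (U+00E9) is neither ASCII nor a raw
;; 8-bit byte, so the conversion signals an error.
(demo-try-unibyte "\u00e9")
```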

> In 99% (actually 99.99999% for the `as' case) of the cases you shouldn't
> use string-{as/make/to}-{uni/multi}byte.  Instead you should use
> {en/de}code-coding-string.

This specific section is not about en/decoding text, it's about
converting between unibyte and multibyte.  Unless we want to remove
any mention of these capabilities (and leave Lisp programmers without
any documentation on how to handle binary data and/or byte streams of
undecoded text), I don't think we can remove the description of these
functions from the manual.
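
To make the difference between the two kinds of operation concrete,
here is a small sketch, assuming Emacs 23 or later and the `utf-8'
coding system.  The `demo-' variable names are hypothetical.

```elisp
;; Two bytes that happen to be the UTF-8 encoding of U+00E9.
(defvar demo-utf8-bytes (unibyte-string #xC3 #xA9))

;; Decoding interprets the bytes as text: one character results.
(defvar demo-decoded (decode-coding-string demo-utf8-bytes 'utf-8))

;; Unibyte-to-multibyte conversion does not interpret anything: the
;; result still holds two raw 8-bit bytes.
(defvar demo-converted (string-to-multibyte demo-utf8-bytes))

(length demo-decoded)      ; one character
(length demo-converted)    ; two raw bytes

;; Encoding the decoded text gives back the original byte sequence.
(equal (encode-coding-string demo-decoded 'utf-8) demo-utf8-bytes)
```

The conversion functions fit binary data and undecoded byte streams,
where the bytes must stay as they are; the coding functions fit text.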
