Re: eight-bit char handling in emacs-unicode

From: Stefan Monnier
Subject: Re: eight-bit char handling in emacs-unicode
Date: 25 Nov 2003 10:43:05 -0500
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50

> It seems that you keep saying that "A does B, thus it's
> nonsense".  But I'm arguing that "A does C".

Well, the thing is: I still don't understand what is C.
From what I understand, you say that C is "a conversion from multibyte
to a sequence of code-points", but since the output is a unibyte string,
that restricts it to cases where the code-points can be encoded in 8 bits,
thus it doesn't sound very generic and I don't see any application for it
(nor do I see any practical difference with using encode-coding-string
since the output AFAIK would be the same).

> It doesn't make sense because you treat the result as "a
> unibyte string encoded in Latin-1".

> It makes sense if you treat the result as "a unibyte string
> in which each byte represents a sequence of Unicode
> code-points", doesn't it?

But each byte can only represent the 0-255 subset of Unicode code-points, in
which case this is equivalent (practically speaking) to latin-1, isn't it?
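
To make the equivalence concrete, here is a minimal sketch in Python
rather than Emacs Lisp. The helper name codepoints_to_bytes is made up
for illustration and is not an Emacs function; it only mimics the idea
of storing one code-point per byte.

```python
def codepoints_to_bytes(text):
    """Store each character's code-point in a single byte.

    Hypothetical helper: raises ValueError for any code-point above 255,
    because such a value does not fit in one byte.
    """
    return bytes(ord(ch) for ch in text)


# Every code-point in 0..255 gives the same byte as the latin-1 encoder.
all_8bit = "".join(chr(i) for i in range(256))
assert codepoints_to_bytes(all_8bit) == all_8bit.encode("latin-1")

# A character outside that range cannot be represented at all.
try:
    codepoints_to_bytes("\u20ac")  # EURO SIGN, U+20AC
    fits = True
except ValueError:
    fits = False
assert fits is False
```

Under that assumption, the "sequence of code-points" reading and a plain
latin-1 encoding produce identical bytes wherever either one is defined.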

>> It'd make sense if the environment said "latin-1 when you can,
>> utf-8 otherwise" or something like that, but then we would use
>> encode-coding-string anyway.

> It's itself nonsense to have such a coding system.

I was not thinking of a coding-system, but just some encoding job,
such as what is done when saving a buffer (where my .emacs does exactly
that: try latin-1 first and utf-8 if that fails).
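
The idea of such an encoding job can be sketched as follows, in Python
rather than in the actual .emacs code (which is not shown here). The
function name encode_for_saving is an assumption made for illustration.

```python
def encode_for_saving(text):
    """Try latin-1 first and fall back to utf-8 if that fails.

    Returns the name of the encoding that was used together with the
    encoded bytes. Hypothetical helper, not part of Emacs.
    """
    try:
        return "latin-1", text.encode("latin-1")
    except UnicodeEncodeError:
        # At least one character has no latin-1 representation.
        return "utf-8", text.encode("utf-8")


name, data = encode_for_saving("caf\u00e9")
assert name == "latin-1" and data == b"caf\xe9"

name, data = encode_for_saving("\u20ac")  # EURO SIGN is not in latin-1
assert name == "utf-8" and data == b"\xe2\x82\xac"
```

Note that this is a policy applied around two ordinary encoders, not a
third coding system in its own right.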

> Do you agree with having string-make-unibyte if it signals an error on
> non-Latin-1 characters?

Of course: that's pretty much what I suggested: string-make-unibyte only
accepts multibyte chars that correspond to "bytes".

>> I just don't know of a concrete case where it makes sense to use
>> string-make-unibyte.

> I'll paraphrase my previous example as this:

>   It is perfectly possible to live in such an environment
>   where only the characters U+0000..U+00FF of Unicode are
>   used but only the coding system utf-8 is used.

> But, I don't claim that the above is a realistic case.

> Another non-realistic but concrete case is:

>   Use only the charset iso-8859-5 and the encoding CTEXT.

I don't see any use of string-make-unibyte in your two examples.
And "having string-make-unibyte if it signals an error on non-Latin-1
characters" means that the second example can't be used any more.
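
A rough Python analogy of that last point, assuming the strict
conversion behaves like a latin-1 encoder that raises on anything else
(strict_unibyte is a made-up name, not an Emacs function):

```python
def strict_unibyte(text):
    """Convert to bytes, signalling an error on non-Latin-1 characters.

    Hypothetical stand-in for a conversion that only accepts characters
    corresponding to single bytes.
    """
    return text.encode("latin-1")  # raises UnicodeEncodeError otherwise


cyrillic = "\u041f\u0440\u0438\u0432\u0435\u0442"  # six Cyrillic letters

# The text is representable in the iso-8859-5 charset, one byte per letter...
encoded = cyrillic.encode("iso-8859-5")
assert len(encoded) == 6
assert encoded.decode("iso-8859-5") == cyrillic

# ...but the strict Latin-1-only conversion rejects it.
try:
    strict_unibyte(cyrillic)
    accepted = True
except UnicodeEncodeError:
    accepted = False
assert accepted is False
```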

        Stefan "still in the dark"
