

Re: eight-bit char handling in emacs-unicode

From: Kenichi Handa
Subject: Re: eight-bit char handling in emacs-unicode
Date: Tue, 2 Dec 2003 22:07:43 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <jwvd6b8ttfj.fsf-monnier+emacs/address@hidden>, Stefan Monnier 
<address@hidden> writes:
>>  It is used to avoid losing information about text even
>>  when you kill text in a multibyte buffer and paste it in
>>  a unibyte buffer.

> That's the kind of concrete case I needed, thank you.
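A rough Python model of that kill-and-paste case follows. It is an illustration under stated assumptions, not Emacs code: a multibyte string is modelled as a list of integer character codes, and a raw byte B in 0x80..0xFF is kept as the "eight-bit" character B + 0x3FFF00 (the #x3FFF80..#x3FFFFF range the Unicode-based Emacs uses for raw bytes). The names `bytes_to_multibyte`, `multibyte_to_bytes` and `is_eight_bit_char` are hypothetical.

```python
# Hypothetical model, not Emacs source: multibyte text is a list of
# integer character codes; a raw byte B (0x80..0xFF) is stored as the
# "eight-bit" character B + 0x3FFF00.
RAW_BYTE_OFFSET = 0x3FFF00


def is_eight_bit_char(c: int) -> bool:
    # Eight-bit characters occupy the top 128 codes, #x3FFF80..#x3FFFFF.
    return 0x3FFF80 <= c <= 0x3FFFFF


def bytes_to_multibyte(data: bytes) -> list:
    # ASCII bytes stay as they are; every other byte becomes an
    # eight-bit character instead of being decoded.
    return [b if b < 0x80 else b + RAW_BYTE_OFFSET for b in data]


def multibyte_to_bytes(chars: list) -> bytes:
    # Each eight-bit character turns back into exactly the byte it came from.
    out = bytearray()
    for c in chars:
        if c < 0x80:
            out.append(c)
        elif is_eight_bit_char(c):
            out.append(c - RAW_BYTE_OFFSET)
        else:
            raise ValueError(f"character {c:#x} has no byte form in this model")
    return bytes(out)


# "Kill" part of a multibyte buffer and "paste" it into a unibyte one.
buffer_text = bytes_to_multibyte(b"caf\xe9 \xff\x80")
killed = buffer_text[3:]
pasted = multibyte_to_bytes(killed)
print(pasted)  # b'\xe9 \xff\x80'
```

Because each eight-bit character maps back to exactly one byte, copying any slice of such text into a unibyte context loses nothing in this model.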

I'm very glad that now we can start to argue on the same

> Now I'll have to go back and reread the thread to understand things
> better.


>  Are there other cases like that?

For instance, when searching for a multibyte string in a
unibyte buffer.  But if we are searching for a regular
expression that contains a character range (e.g. [a-z]),
the current simple multibyte->unibyte conversion doesn't
work in many cases.  I fixed it in the unicode branch.
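A small Python illustration of how a simple conversion can break a character range. It assumes, hypothetically, that the conversion keeps only the low 8 bits of each character code; that is a stand-in for "simple multibyte->unibyte conversion", not the actual Emacs routine, and `naive_to_unibyte` is a made-up name.

```python
import re


def naive_to_unibyte(text: str) -> bytes:
    # Hypothetical "simple" conversion: keep only the low 8 bits of each
    # character code, converting every character independently.
    return bytes(ord(ch) & 0xFF for ch in text)


# A range covering Latin Extended-A (U+0100..U+017E), which includes
# Latin-2 letters such as 'č' and 'ž' but no ASCII letters.
pattern = "[\u0100-\u017e]"
print(re.search(pattern, "abc"))  # None

# Converted endpoint by endpoint, the range collapses to bytes
# 0x00..0x7E, i.e. plain ASCII, so it now matches 'a'.
byte_pattern = naive_to_unibyte(pattern)
print(byte_pattern)                      # b'[\x00-~]'
print(re.search(byte_pattern, b"abc"))   # a match for b'a'
```

Once the endpoints are converted separately, the range no longer describes the same set of characters, which is why a pattern cannot simply be converted character by character.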

> Also, should we really allow such a thing?

I myself tend to agree with dropping this kind of unibyte
support, but that should be decided by Richard.

> I mean, it's a dangerous operation since it only works if the user
> is lucky enough to use just the right subset of
> characters.

But we can expect such luck in many situations, where
people mostly use only characters belonging to their
primary charset.

> So we should at least signal an error if the conversion is
> unsafe (in that make-string-multibyte will not recover the
> original string).

Shall we test it with HEAD to check how often such an error occurs?
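The proposed check (signal an error unless converting back recovers the original string) could be sketched as below. All of the following are assumptions for illustration: text is a Python `str`, the user's primary charset is a Python codec name ("latin-1" by default), characters outside that charset are truncated to their low 8 bits, and the names `to_unibyte`, `to_multibyte`, `to_unibyte_checked` and `UnsafeConversionError` are invented for this sketch.

```python
class UnsafeConversionError(ValueError):
    """Hypothetical error: a multibyte->unibyte conversion would lose data."""


def to_unibyte(text: str, charset: str = "latin-1") -> bytes:
    # Encode each character in the primary charset; anything outside it
    # is truncated to its low 8 bits, which is where data is silently lost.
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            out.append(ord(ch) & 0xFF)
    return bytes(out)


def to_multibyte(data: bytes, charset: str = "latin-1") -> str:
    # Interpret every byte as a character of the primary charset.
    return data.decode(charset)


def to_unibyte_checked(text: str, charset: str = "latin-1") -> bytes:
    # Signal an error unless the reverse conversion recovers the original.
    data = to_unibyte(text, charset)
    if to_multibyte(data, charset) != text:
        raise UnsafeConversionError(
            f"{text!r} cannot be converted to {charset} without loss")
    return data


print(to_unibyte_checked("café"))              # b'caf\xe9'
print(to_unibyte_checked("čaj", "iso8859-2"))  # b'\xe8aj'
try:
    to_unibyte_checked("čaj")  # 'č' is not in Latin-1
except UnsafeConversionError as err:
    print("error:", err)
```

With Latin-1 as the primary charset "café" converts cleanly while "čaj" is rejected; with ISO-8859-2 as the primary charset "čaj" is safe, but Japanese text is rejected under either.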

> BTW, in which kind of circumstances is the user presented with both
> a multibyte buffer and a unibyte buffer ?

Even if one starts Emacs with --unibyte, Emacs sometimes
makes a multibyte buffer (e.g. C-h h).  And even if one
starts Emacs with --multibyte, he may have a file that
contains, for instance, Latin-1 characters and raw-byte
data, and he may want to read such a file with the coding
system raw-text (then C-x = always shows \000..\377).

>>  Are you talking about the actual Emacs Lisp code that
>>  explicitly calls make-string-unibyte?  I've been talking
>>  about the functionality of make-string-unibyte itself,
>>  especially about the implicit call to the C function
>>  copy_text that does the same thing as make-string-unibyte.
>>  Is that why it seems that we are talking at cross
>>  purposes?

> I'm talking about both.

> I agree on the signalling, of course; I just want to push it further
> and signal even when pasting latin-2 multibyte text into a unibyte buffer.
> After all, why should Slovak users be able to do that but Japanese users
> not?  In my view, every time we use this kind of thing, we're taking
> a temporary shortcut that is "good enough for 8bit users" but not for the
> rest of the world.

The fact that something doesn't work for double-byte charset
users can't be a reason strong enough for dropping it for
single-byte charset users.

> AFAIK, unibyte buffers should only be used internally and never presented
> to the user.  This is because unibyte buffers contain bytes (in my view)
> whereas the user wants to see characters.

I agree that it is a very clean view, and I myself have
expressed the same thing several times.  But it seems that
Richard doesn't want to drop the current way of unibyte support.

Ken'ichi HANDA

