Re: setenv -> locale-coding-system cannot handle ASCII?!

From: Kenichi Handa
Subject: Re: setenv -> locale-coding-system cannot handle ASCII?!
Date: Wed, 26 Feb 2003 16:49:15 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.2.92 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <address@hidden>, "Stefan Monnier" <monnier+gnu/address@hidden> writes:
>>  Why is it not needed?  Strings and buffers are not that
>>  different, both are containers of characters.

> They are used differently.  Operations on strings generally apply to the
> whole string: you can only encode/decode a whole string at a time.

That's because of the limitation of the current
implementation, not because of the nature of strings.
There's no reason for keeping that limitation.  Actually, as
we have changed the type Lisp_String in 21.1, it's not
difficult to make strings change length.

>>  If we get a unibyte string from a unibyte buffer by buffer-substring,
>>  how should we treat that string?

> Like any other unibyte string: as a sequence of raw bytes.
> If you want to treat it as a sequence of characters, then
> you need to pass it through `string-as-multibyte'.

If we regard that limitation as inherent in the nature of
strings, your idea is worth considering.  It seems that we
can at least construct a consistent explanation of their
behaviour based on your idea too.

What a character in a unibyte buffer represents depends on
the context.  It may be a character represented by a single
byte, or a raw byte not yet decoded, or a byte constituting
part of the multibyte form of a different character.

On the other hand, a character in a unibyte string always
represents a raw byte.  Emacs coerces it into a character
represented by that single byte when a unibyte string is
concatenated with a multibyte string, or it is inserted in a
multibyte buffer.
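
To illustrate the coercion described above, here is a
minimal sketch.  The function name is hypothetical, and
which character the raw byte is turned into depends on the
Emacs version, so the sketch only checks the multibyteness
and the length of the result.

```elisp
;; Hypothetical helper, for illustration only.
(defun sketch-unibyte-coercion ()
  "Return a list describing how `concat' treats a raw byte."
  (let* ((raw "\300")                ; unibyte: one raw byte
         (multi "\u3042")            ; multibyte: one character
         (mixed (concat raw multi))) ; raw byte coerced to a character
    (list (multibyte-string-p raw)   ; nil
          (multibyte-string-p multi) ; t
          (multibyte-string-p mixed) ; t
          (length mixed))))          ; 2 characters
```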

But, I'm not sure such a change is really necessary.  Are
you sure that the change doesn't break the current usage of
unibyte strings?

>>  The latter yields multibyte, but I think it's a bug.  I found
>>  that "(format "%s" 1)" is implemented by using
>>  prin1-to-string, and prin1-to-string prints an object to a
>>  temporary buffer and gets that buffer string.  So, in a
>>  multibyte session "(format "%s" 1)" yields a multibyte
>>  string.  :-(
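
The behaviour quoted above can be checked in a given session
with a sketch like the following.  The function name is
hypothetical, and the result is expected to differ between
Emacs versions, so no particular value is claimed here.

```elisp
;; Hypothetical helper, for illustration only.
(defun sketch-format-multibyte-p ()
  "Return non-nil if (format \"%s\" 1) yields a multibyte string."
  (multibyte-string-p (format "%s" 1)))
```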

> I know: I bumped into it yesterday while playing around with tar-mode.
> How about the attached patch ?

Please see the comments below.

>>  So, do you mean that you want this?
>>      If a unibyte buffer has \201\300 in the region FROM and TO,
>>      (encode-coding-string (buffer-substring FROM TO) 'iso-latin-1)
>>      => "\201\300"
>>      (encode-coding-region FROM TO 'iso-latin-1) changes the
>>      region to \300.

> Yes, I guess I'd be happy with it.

>>  Isn't it more confusing?

> Not to me.

What do other people think about it?

> PS: I wish there was a way to swap two buffers' contents so that
>     tar-mode could swap the (potentially very large) data to
>     a helper buffer (without needing to copy this large data)
>     and then use multibyte for the display and unibyte for
>     the helper buffer.

I don't understand what you mean, especially the usage of
the helper buffer.

I think tar-mode should use multiple buffers: one unibyte
buffer for the tar file itself, one multibyte buffer for the
table of contents, and other multibyte buffers (created on
demand) for viewing/editing the files contained in the tar
file.  Then tar-mode works almost the same way as dired.  We
can see multibyte files in different buffers.  We can use
the same method in arc-mode and also in RMAIL.
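
As a rough sketch of that arrangement (not tar-mode's actual
code): the function name, the buffer name, and the idea of
passing byte positions and a coding system as arguments are
all assumptions made for illustration.

```elisp
;; Hypothetical helper, for illustration only.
(defun sketch-view-member (data-buffer start end coding)
  "Decode bytes START..END of unibyte DATA-BUFFER with CODING.
Return a new multibyte buffer holding the decoded text."
  (let ((bytes (with-current-buffer data-buffer
                 ;; In a unibyte buffer this yields a unibyte string.
                 (buffer-substring-no-properties start end)))
        (view (generate-new-buffer "*member*")))
    (with-current-buffer view
      (set-buffer-multibyte t)
      (insert (decode-coding-string bytes coding)))
    view))
```

The raw data stays untouched in the unibyte buffer; only the
requested member is decoded into its own buffer.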

Is that different from what you mean?

Ken'ichi HANDA
