Re: setenv -> locale-coding-system cannot handle ASCII?!

From: Miles Bader
Subject: Re: setenv -> locale-coding-system cannot handle ASCII?!
Date: 04 Mar 2003 11:48:57 +0900

Richard Stallman <address@hidden> writes:
>     a buffer/string should have an associated `unibyte encoding'
>     attribute, which would allow it to be encoded using the
>     straightforward and efficient `unibyte representation' but appear
>     to lisp/whoever as being a multibyte buffer/string (all of whose
>     characters happen to have the same charset).
> This is more or less what a unibyte buffer is now, except that there
> is only one possibility for which character sets can be stored in it:
> it holds the character codes from 0 to 0377.

Yeah, but I'm saying that emacs should be able to use this efficient
representation for other character sets as well -- I think it's far more
common to have buffers storing non-raw 8-bit characters than raw
characters, so why is the uncommon case optimized?

> If we wanted to hide from the user the distinction between unibyte and
> multibyte buffers, we would have to change the buffer's representation
> automatically when inserting characters that don't fit unibyte.  That
> seems like a bad idea.

Well, I agree that it would be annoying if your 10-megabyte raw-bytes buffer
suddenly got converted because you accidentally inserted a Chinese
character. :-)

However I think that in many cases such a conversion would be OK, and
since 99% of the time, people _don't_ mix character sets, it would
probably be a win on average.
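The automatic conversion being discussed could be sketched like this in Python (class and field names invented; not Emacs code): the buffer stays in the cheap unibyte form until an insertion actually forces promotion.

```python
# Sketch of automatic promotion: start unibyte, and quietly convert to a
# full character representation only when inserted text doesn't fit the
# buffer's charset -- so the common single-charset case stays cheap.
class AutoBuffer:
    def __init__(self, charset: str = "latin-1"):
        self.charset = charset
        self.unibyte = True
        self.data = bytearray()   # compact form while unibyte
        self.chars = None         # character form after promotion

    def insert(self, text: str):
        if self.unibyte:
            try:
                self.data += text.encode(self.charset)
                return
            except UnicodeEncodeError:
                # promote: decode the existing bytes once,
                # then switch to storing characters
                self.chars = list(self.data.decode(self.charset))
                self.unibyte = False
        self.chars += list(text)

buf = AutoBuffer()
buf.insert("café")
print(buf.unibyte)     # True -- still the efficient representation
buf.insert("漢")
print(buf.unibyte)     # False -- promoted only when it had to be
```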

Maybe there could be a buffer-local variable that `locks' the buffer's
character set, and would cause an error to be signalled if some code
attempts to insert incompatible text (instead of converting the
buffer)?  This might catch coding errors better than the current
`just insert the raw codes' unibyte buffers do (if you _really_ want to
insert the raw codes, you can of course do so explicitly).
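The `locked charset' idea might look like this Python sketch (again, names are invented and this is not Emacs code): incompatible insertions signal an error rather than silently converting the buffer.

```python
# Sketch of a charset-locked buffer: contents are kept in a single-byte
# encoding, and inserting text that can't be represented raises an error
# instead of converting the whole buffer.
class LockedBuffer:
    def __init__(self, charset: str = "latin-1"):
        self.charset = charset
        self.data = bytearray()

    def insert(self, text: str):
        try:
            self.data += text.encode(self.charset)
        except UnicodeEncodeError:
            raise ValueError(
                f"text not representable in locked charset {self.charset}")

buf = LockedBuffer("latin-1")
buf.insert("naïve")            # fits in Latin-1, stored as 5 bytes
print(len(buf.data))           # 5
try:
    buf.insert("漢")           # incompatible: signalled, not converted
except ValueError as e:
    print("error:", e)
```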

> The advantage of unibyte mode for some European Latin-N users is that
> they don't have to deal with encoding and decoding, so they never have
> to specify a coding system.  It is possible that today we could get
> the same results using multibyte buffers and forcing use of a specific
> Latin-N coding system.  People could try experimenting with this and
> seeing if it provides results that are just like what European users
> now get with unibyte mode.

Perhaps the same advantages could be had, without making a special case,
by having a `uninterpreted' character set, which would effectively be
treated by the display code as `just send whatever code raw to the terminal.'
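A Python sketch of that display path (function name invented; not how the Emacs display code is actually written): an `uninterpreted' pseudo-charset bypasses the usual decode/re-encode round trip entirely.

```python
# Sketch of the `uninterpreted' pseudo-charset: such text goes to the
# terminal byte-for-byte, while other charsets take the normal path of
# decoding from the buffer's charset and re-encoding for the terminal.
def bytes_for_terminal(data: bytes, charset: str,
                       terminal_coding: str = "utf-8") -> bytes:
    if charset == "uninterpreted":
        return data  # raw pass-through: send whatever code as-is
    return data.decode(charset).encode(terminal_coding)

print(bytes_for_terminal(b"\xe9", "uninterpreted"))  # b'\xe9' untouched
print(bytes_for_terminal(b"\xe9", "latin-1"))        # b'\xc3\xa9' (UTF-8)
```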

> As for the idea that efficiency should never be a factor in deciding
> what to do here, I am skeptical of that.

I'm not saying that efficiency isn't an issue, I'm saying that lisp
programmers shouldn't have to worry about it as much.  They should be
able to just use `normal' coding methods (which currently means
multibyte by default), and expect that emacs would optimize this in
certain common cases; currently, if a lisp programmer wants extra
efficiency, he's got to use special and more dangerous operations.

I realize that what I'm suggesting is a bit much, at least for the near
future, but I also think the current design is somewhat broken, and
makes it too easy for programmers to do the wrong thing.

I am a virus.  Join in and copy me into your .signature.
