[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: decode-coding-string gone awry?

From: Kenichi Handa
Subject: Re: decode-coding-string gone awry?
Date: Thu, 17 Feb 2005 21:08:02 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3.50 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <address@hidden>, Stefan Monnier <address@hidden> writes:

>>  Is it reasonable to operate with decode-coding-string on a multibyte
>>  string?  If that is nonsense, maybe we should make it get an error,
>>  to help people debug such problems.

> I think it would indeed make sense to signal errors when decoding
> a multibyte string or when encoding a unibyte string.

>>  If there are some few cases where decode-coding-string makes sense on
>>  a multibyte string, maybe we can make it get an error except in those
>>  few cases.

> The problem I suspect is that it's pretty common for ASCII-only strings to
> be arbitrarily marked unibyte or multibyte depending on the circumstance.
> So we would have to check for the case where the string is ASCII-only before
> signalling an error.

> I'm actually running right now with an Emacs that does signal such errors.
> I've changed the notion of "multibyte/unibyte" string by saying:
> - [same as now] if size_byte < 0, it's UNIBYTE.
> - [same as now] if size_byte > size, it's MULTIBYTE.
> - [changed]     if size_byte == size, it's neither/both (ASCII-only).

> Then I've changed several parts of the C code to try and set size_byte==size
> whenever possible (instead of marking the string as unibyte).

Even if size_byte == size, it may contain eight-bit-graphic
characters, and decoding such a string is a valid operation.
And even if size_byte > size, it may contain only ASCII,
eight-bit-graphic, and eight-bit-control charactes.  It's
also a valid operation to decode it.

It's not a trivial work to change the current code (in
coding.c) to signal an error safely while doing a code
conversion.  So, to check if decoding is valid or not, we
have to check all characters in a string in advance, which,
I think, slows down the operation considerably.

Ken'ichi HANDA

reply via email to

[Prev in Thread] Current Thread [Next in Thread]