From: Stefan Monnier
Subject: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
Date: Tue, 03 Apr 2012 00:22:32 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.94 (gnu/linux)

>> > Usually, yes.  But as long as there is a code space in the
>> > high area for a CJK charset, it is unavoidable to have a
>> > buffer/string that contains a character represented by a
>> > byte sequence in that high area, as in the test case of
>> > Bug#11073.  And, as "unification" means treating such a
>> > character the same way as the unified character, I thought
>> > they would both have the same character code.

>> Since there are two internal byte-sequence representations, I don't see
>> any good reason why we shouldn't have 2 internal int representations.
>> I.e. if unification failed for the byte-sequence (which might be the
>> result of a bug, for all I know), we may as well keep them non-unified
>> in the int representation.

> Please note that not all characters in the code-space of a
> CJK charset are unified.  For instance, Big5 has its own
> PUA (private use area), and characters in the PUA are not
> unified by default.  So, if Emacs reads a Big5 file that
> contains PUA chars, those chars stay in the high area.  Then,
> one can provide one's own unification map that also maps PUA
> chars to some Unicode chars, like this:
>   (unify-charset 'big5 "MyBig5.map")
> After this, I thought that previously read PUA chars staying
> in the high area should be treated as the corresponding
> Unicode chars (in display, search, etc).

But again, this unification takes place during decoding, whereas what
I'm talking about takes place when reading the internal utf-8
representation, which should already be unified.
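
To make the distinction concrete, here is a toy model in Python.  It is
only a sketch and not Emacs code: the function names, the map contents,
and the HIGH_AREA_BASE value are all assumptions chosen for
illustration.  It shows the map being consulted once, at decoding time,
while reading the already-decoded internal value involves no lookup.

```python
# Toy model only; every name and constant here is hypothetical.
HIGH_AREA_BASE = 0x110000  # assumed start of the non-unified "high area"

# Illustrative unification map: charset code -> Unicode code point.
UNIFY_MAP = {0xA440: 0x4E00}


def decode_char(charset_code, unify_map):
    """Decoding step: the unification map is consulted here."""
    if charset_code in unify_map:
        return unify_map[charset_code]
    # No mapping known at decode time: the char stays in the high area.
    return HIGH_AREA_BASE + charset_code


def read_internal(stored_code):
    """Reading the internal representation: no map lookup at all."""
    return stored_code


# A buffer decoded with the default map: one unified char, one PUA char.
buffer = [decode_char(0xA440, UNIFY_MAP), decode_char(0xFA40, UNIFY_MAP)]

# Extending the map afterwards (as unify-charset would) does not touch
# what is already stored in the buffer.
extended_map = dict(UNIFY_MAP)
extended_map[0xFA40] = 0xE000

after_extension = [read_internal(c) for c in buffer]
freshly_decoded = decode_char(0xFA40, extended_map)
```

In this model the PUA char read before the map was extended is still a
high-area value when read back, while decoding the same code again with
the extended map yields the Unicode value.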

