bug-gnu-emacs

bug#15984: 24.3; Problem with combining characters in attachment filenam


From: Niels Möller
Subject: bug#15984: 24.3; Problem with combining characters in attachment filename
Date: Sat, 30 Nov 2013 09:53:48 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (usg-unix-v)

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> What I think is the right thing, is to allow a sequence of unicode
>> values, e.g., "A" + combining character, or "A" + any random sequence of
>> combining characters, intern this string, and treat this as a single
>> "character".
>
> For the Lisp-level notion of "character", I think this would require too
> many deep changes.

I can understand that. I'm actually impressed by the move from MULE
encodings to Unicode, which to a user appeared to be very smooth.

But I still think that type of "character" abstraction is the right
thing for Unicode text processing in general.
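
A minimal sketch of that abstraction, in Python rather than Emacs Lisp
and only as an approximation: a base code point plus any following
code points with a nonzero canonical combining class is treated as one
"character". The name `clusters` is hypothetical, and the full Unicode
segmentation rules cover more cases than this simple test does.

```python
import unicodedata

def clusters(text):
    """Split text into base + combining-mark sequences (simplified)."""
    result = []
    for cp in text:
        # A nonzero combining class marks a combining character; attach
        # it to the preceding base, if there is one.
        if result and unicodedata.combining(cp) != 0:
            result[-1] += cp
        else:
            result.append(cp)
    return result

# "A" + ring, plain "b", "a" + diaeresis + acute
sample = "A\u030a" + "b" + "a\u0308\u0301"
print(len(sample))            # counts code points
print(len(clusters(sample)))  # counts "characters" in the sense above
```

Each element of the returned list could then be interned and handled
as a single unit by motion, deletion, and matching commands.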

> For forward-char, we do try to fake that behavior (e.g. a `forward-char'
> command will skip over the whole A+ring combo) but not faithfully
> (e.g. `C-u 2 forward-char' will also just skip that combo, and not the
> subsequent char).  It's not perfect, but it seems "close enough" that it
> hasn't proved problematic.

I didn't know that; it's a bit weird. I just tried, as Eli suggested,
editing text with "ä" represented as "a" plus a combining character. In
emacs-23.4, pressing DEL after the "ä" deletes the dots only. I now
understand why, but it's not what I had expected, and I think deleting
the entire A + dots would be preferable. Plain C-x = on the "a" shows
just "Char: a (97, #o141, #x61) point=443 of 455 (97%) column=1", but
C-u C-x = also shows the combining char.
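
To make the two DEL behaviours concrete, here is a rough Python sketch
(not Emacs code). `delete_last_code_point` mimics the emacs-23.4
behaviour described above, and `delete_last_character` removes the
whole base + marks combination. Both function names are made up for
illustration, and the combining-class test is a simplification.

```python
import unicodedata

def delete_last_code_point(text):
    """Remove only the final code point (e.g. just the dots)."""
    return text[:-1]

def delete_last_character(text):
    """Remove trailing combining marks together with their base."""
    end = len(text)
    # Skip back over any trailing combining marks...
    while end > 0 and unicodedata.combining(text[end - 1]) != 0:
        end -= 1
    # ...then drop the base code point they attach to, if any.
    return text[:max(end - 1, 0)]

decomposed = "xa\u0308"  # "x", then "a" (97) + COMBINING DIAERESIS
print([hex(ord(c)) for c in decomposed])
print(delete_last_code_point(decomposed) == "xa")  # dots removed only
print(delete_last_character(decomposed) == "x")    # whole a + dots removed
```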

However, emacs-24.3 behaves differently: the 'a' and the '"' get
displayed differently, and are not combined at all for display.
The buffer shows 'a"', and according to C-u C-x = the '"' is a
"COMBINING DIAERESIS". These tests were done in an X11 frame, so maybe
the two versions are just picking up different fonts?

>> E.g, there could be a mode which makes each and every unicode value a
>> single character, which will then be displayed as separate glyphs,
>> separate characters for regexp matching, etc.
>
> I think we wouldn't want to use different modes (too coarse) but
> different commands instead.

I didn't mean an emacs major or minor mode. It would be more like a
special coding system, applied when reading the text from file.

> In any case, a first step would be to find a name for that notion of "multi
> character character".  "Grapheme cluster" doesn't sound too good if we
> want to expose the concept to the end user.

I think "character" is the right word; the main source of confusion is
that Unicode code points are often referred to as "characters".
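
A small Python illustration of that confusion, using only the standard
`unicodedata` module: the same user-visible "ä" can be one code point
or two, and the two spellings compare unequal until they are
normalized to a common form.

```python
import unicodedata

composed = "\u00e4"     # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"  # "a" + COMBINING DIAERESIS

# One "character" to the reader, but a different number of code points.
print(len(composed), len(decomposed))

# A plain comparison works on code points, so the spellings differ.
print(composed == decomposed)

# Normalizing both to the same form makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)
print(unicodedata.normalize("NFD", composed) == decomposed)
```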

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.




