From: Niels Möller
Subject: bug#15984: 24.3; Problem with combining characters in attachment filename
Date: Sat, 30 Nov 2013 09:53:48 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (usg-unix-v)
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> What I think is the right thing, is to allow a sequence of unicode
>> values, e.g., "A" + combining character, or "A" + any random sequence of
>> combining characters, intern this string, and treat this as a single
>> "character".
>
> For the Lisp-level notion of "character", I think this would require too
> many deep changes.
I can understand that. I'm actually impressed by the move from MULE
encodings to unicode, which to a user appeared to be very smooth.
But I still think that this type of "character" abstraction is the right
thing for unicode text processing in general.
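To make the "base + combining marks as one character" idea concrete, here is a minimal Python sketch (Python is used only for illustration; `simple_clusters` is a hypothetical helper, not an Emacs function). It assumes a cluster is simply one code point followed by any code points with a nonzero canonical combining class. That is a rough approximation: real grapheme clusters follow the Unicode UAX #29 segmentation rules, which this does not implement, and marks with combining class 0 are not grouped.

```python
import unicodedata


def simple_clusters(text: str) -> list[str]:
    """Group each base code point with the combining marks that follow it.

    Hypothetical helper: an approximation based only on the canonical
    combining class, not the full UAX #29 grapheme cluster rules.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            # Nonzero combining class: attach the mark to the previous cluster.
            clusters[-1] += ch
        else:
            # Anything else starts a new cluster.
            clusters.append(ch)
    return clusters


s = "a\u0308b"  # "a" + COMBINING DIAERESIS + "b"
print(len(s))                   # 3 code points
print(len(simple_clusters(s)))  # 2 "characters" in the sense argued for above
```

The gap between the two counts is exactly the confusion between "code point" and "character" discussed in this thread.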
> For forward-char, we do try to fake that behavior (e.g. a `forward-char'
> command will skip over the whole A+ring combo) but not faithfully
> (e.g. `C-u 2 forward-char' will also just skip that combo, and not the
> subsequent char). It's not perfect, but it seems "close enough" that it
> hasn't proved problematic.
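A repeat count that is applied per cluster, rather than per code point, would avoid the `C-u 2 forward-char` quirk described above. The following is a minimal Python sketch under the same simplifying assumption (a cluster is a base code point plus trailing marks with nonzero combining class). `forward_clusters` is hypothetical and is not how Emacs implements `forward-char`.

```python
import unicodedata


def forward_clusters(text: str, pos: int, n: int = 1) -> int:
    """Return the index reached by moving forward over n base+mark clusters.

    Hypothetical helper; clusters are approximated by combining class only.
    """
    for _ in range(n):
        if pos >= len(text):
            break  # already at end of text
        pos += 1  # step over the base character
        # ...then over any combining marks attached to it.
        while pos < len(text) and unicodedata.combining(text[pos]):
            pos += 1
    return pos


s = "a\u030ab"  # "a" + COMBINING RING ABOVE + "b"
print(forward_clusters(s, 0, 1))  # 2: skips the A+ring combo
print(forward_clusters(s, 0, 2))  # 3: also skips the following "b"
```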
I didn't know that; it's a bit weird. I just tried, as Eli suggested,
editing text with "ä" represented as "a" followed by a combining
character. In emacs-23.4, pressing DEL after the "ä" deletes only the
dots. I now understand why, but it's not what I had expected, and I
think deleting the entire A + dots would be preferable. Plain C-x = on
the "a" shows just "Char: a (97, #o141, #x61) point=443 of 455 (97%)
column=1", but C-u C-x = also shows the combining char.
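The difference between the observed DEL behavior and the one suggested here can be sketched in Python. Both functions are hypothetical illustrations, not Emacs code, and the cluster version again assumes that "combining" means a nonzero canonical combining class.

```python
import unicodedata


def delete_code_point(text: str) -> str:
    """Remove only the last code point (deletes just the dots in "a" + U+0308)."""
    return text[:-1]


def delete_cluster(text: str) -> str:
    """Remove trailing combining marks together with the base they attach to."""
    end = len(text)
    # Walk back over combining marks (nonzero canonical combining class).
    while end > 0 and unicodedata.combining(text[end - 1]):
        end -= 1
    # Then drop the base character itself, if there is one.
    return text[:max(end - 1, 0)]


s = "xa\u0308"  # "x", then "a" + COMBINING DIAERESIS
print(repr(delete_code_point(s)))  # 'xa'
print(repr(delete_cluster(s)))     # 'x'
```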
However, emacs-24.3 behaves differently: the 'a' and the '"' are
displayed as separate glyphs and are not combined at all for display.
The buffer shows 'a"', and according to C-u C-x = the '"' is a
"COMBINING DIAERESIS". These tests were done in an X11 frame, so maybe
the two versions are just picking up different fonts?
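For reference, the same "ä" can be stored either precomposed (U+00E4) or decomposed ("a" + U+0308). Only the decomposed form depends on the display engine and font to combine the dots with the "a". The Python snippet below uses the standard `unicodedata` module to show the character name reported above and the NFC/NFD conversions between the two forms. Normalization is offered only as an illustration of the two representations; it is not a claim about what Emacs does.

```python
import unicodedata

decomposed = "a\u0308"  # "a" followed by COMBINING DIAERESIS
composed = unicodedata.normalize("NFC", decomposed)  # compose where possible

print(unicodedata.name(decomposed[1]))  # COMBINING DIAERESIS
print(len(decomposed), len(composed))   # 2 1
print(unicodedata.name(composed))       # LATIN SMALL LETTER A WITH DIAERESIS
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```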
>> E.g, there could be a mode which makes each and every unicode value a
>> single character, which will then be displayed as separate glyphs,
>> separate characters for regexp matching, etc.
>
> I think we wouldn't want to use different modes (too coarse) but
> different commands instead.
I didn't mean an emacs major or minor mode. It would be more like a
special coding system, applied when reading the text from file.
> In any case, a first step would be to find a name for that notion of "multi
> character character". "Grapheme cluster" doesn't sound too good if we
> want to expose the concept to the end user.
I think "character" is the right word; the main source of confusion is
that unicode code points are often referred to as "characters".
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.