bug#15984: 24.3; Problem with combining characters in attachment filenam

From: Eli Zaretskii
Subject: bug#15984: 24.3; Problem with combining characters in attachment filename
Date: Fri, 29 Nov 2013 13:26:50 +0200

> From: address@hidden (Niels Möller)
> Cc: address@hidden
> Date: Fri, 29 Nov 2013 11:43:45 +0100
> Eli Zaretskii <address@hidden> writes:
> >> Good! I thought emacs used a simpler mapping character <-> a single
> >> unicode value.
> >
> > Maybe I misunderstood you: what's the difference between those two
> > alternatives?
> What I think is the right thing, is to allow a sequence of unicode
> values, e.g., "A" + combining character, or "A" + any random sequence of
> combining characters, intern this string, and treat this as a single
> "character".

That's not how Emacs represents and treats characters.  The
composition happens only at display time, and normalization, as it's
currently implemented, happens when text is read into a buffer.
Thereafter, each Unicode character is a single character, and there's
no combining of them for any purpose except display.
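
To illustrate the distinction outside of Emacs, here is a minimal
sketch in Python (not Emacs code, and not how Emacs implements any of
this; the variable names are made up for the example).  It only shows
that "A" followed by a combining character is two Unicode characters
until a normalization step such as NFC replaces the pair with a
precomposed one:

```python
import unicodedata

# "A" followed by U+030A COMBINING RING ABOVE: two separate code points.
decomposed = "A\u030a"

# NFC normalization replaces the pair with the precomposed U+00C5.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))                           # 2
print([unicodedata.name(c) for c in decomposed])
print(len(composed))                             # 1
print(unicodedata.name(composed))
```

A display engine may well draw both strings identically; the
difference is only in how many characters the text contains.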

> The idea is that this character object should correspond to what the
> user thinks of as a single character. E.g, one glyph per character, and
> treated as a unit by forward-char, and regexp matching with "." and
> character sets.

What gets displayed as a single unit is a "grapheme cluster", not a
single glyph.  Whether a grapheme cluster that corresponds to "A" +
any random sequence of combining characters maps to a single glyph
depends on the font being used, which is something the user should not
need to worry about.  However, we do want to give the user a way to
delete only one or more of the combining characters, so forcing the
entire combination to be a single indivisible entity would not be the
right thing for users.  Cursor motion does consider the entire thing as
a single entity and moves across all of it, but that requires special
code.
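
As a rough sketch of why both views are useful, here is a simplified
example in Python (again, not Emacs code).  The helper
`simple_clusters` is hypothetical and only approximates grapheme
clusters by attaching combining marks to the preceding base character;
real segmentation rules are more involved.  Because the text remains a
sequence of separate characters, a single combining mark can still be
deleted, while motion can step over a whole cluster:

```python
import unicodedata

def simple_clusters(text):
    """Group each base character with the combining marks that follow it.

    A simplification for illustration only: it looks solely at the
    canonical combining class and ignores the full segmentation rules.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch      # attach the mark to the current cluster
        else:
            clusters.append(ch)     # start a new cluster
    return clusters

# "A" + COMBINING RING ABOVE + COMBINING ACUTE ACCENT, then "b".
text = "A\u030a\u0301b"
print(len(text))                    # 4 characters
print(len(simple_clusters(text)))   # 2 clusters for cursor motion

# Delete only the last combining mark; the cluster itself survives.
shorter = text[:2] + text[3:]
print(len(shorter))                     # 3 characters
print(len(simple_clusters(shorter)))    # still 2 clusters
```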

IOW, things are not that simple, and I think the design you are
suggesting is problematic in that it will remove several important
features, or make them harder to implement.

> When reading text files, the character boundaries may be configurable.

The important question is what to do by default, as many users will
not be happy if asked too many questions or requested to specify too
many parameters for reading text.  Compare this with the need to
specify the encoding in too many cases in the early days of
multilingual Emacs -- there was a user outcry about that.

> E.g, there could be a mode which makes each and every unicode value a
> single character, which will then be displayed as separate glyphs,
> separate characters for regexp matching, etc.

You are mixing display issues with editing issues and with how
characters are represented internally in an Emacs buffer.  These all
are separate, and do not necessarily need to handle characters in the
same rigid way.

> Move away any gnus-related configuration files (~/.gnus, ~/.newsrc*).
> Create a spool-like directory, e.g, "~/tmp/mail". Copy the file to
> "~/tmp/mail/1". Start emacs -Q -nw -f gnus-no-server. In the *Group* buffer,
> press G d to create a directory group, enter ~/tmp/mail. You should now
> be able to enter that group, and select the message in the *Summary*
> buffer.
> To mimic my setup, do this in an xterm running in a latin-1 locale. (I
> have to send this off now, I'll try later to really see if this recipe
> reproduces the problem for me).

Thanks, I will try that.
