

bug#15984: 24.3; Problem with combining characters in attachment filenam

From: Niels Möller
Subject: bug#15984: 24.3; Problem with combining characters in attachment filename
Date: Fri, 29 Nov 2013 13:41:01 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (usg-unix-v)

Eli Zaretskii <address@hidden> writes:

> However, we do want to give the user a way to
> delete only one or more of the combining characters, so forcing the
> entire combination to be a single indivisible entity would not be TRT
> for users.

Good question, how to handle this.

Today, to remove the dots from an "ä" character, I'll have to delete the
complete "ä" character and insert a new "a" character. Or similarly for
the reverse edit. I think this "atomic" handling is the desired
behaviour in many cases. And I don't think it should behave differently
depending on the representation of "ä" in the original file. But if you
have a complex sequence of unicode combining characters, I agree there's
some need to be able to edit it. Maybe put point on the character and
invoke edit-char to go in some special mode which explodes the usually
"atomic" character into smaller pieces.
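
As a rough illustration of the two representations of "ä" mentioned above, here is a minimal sketch in Python (chosen only as a neutral stand-in; it says nothing about how Emacs stores text). The variable names are illustrative assumptions.

```python
import unicodedata

precomposed = "\u00e4"   # "ä" as a single code point
decomposed = "a\u0308"   # "a" followed by COMBINING DIAERESIS

# The two strings display the same but differ in length.
print(len(precomposed), len(decomposed))

# Normalization converts between the two representations.
print(unicodedata.normalize("NFD", precomposed) == decomposed)
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

An editor that wants the same "atomic" behaviour for both inputs has to treat these two strings as one unit, either by normalizing or by grouping.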

And such a character edit mode might be useful for more things than
Unicode composing characters, e.g., manipulating the different sub-parts
of a Chinese character. Anyway, this user interface is not intimately
tied to the internal character representation; its overall effect on the
buffer will be the same as replacing any substring.

>> When reading text files, the character boundaries may be configurable.
> The important question is what to do by default,

I'm pretty sure the default should be that a sequence of one Unicode
base char and all following Unicode combining chars is interned as a
single "emacs character". (I think the detailed rules for this are
spelled out in the Unicode book.) With some arbitrary limit to prevent a
GByte file containing only Unicode combining characters from being read
as a single emacs character; say at most 10 combining characters.
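
The grouping rule proposed above can be sketched as follows. This is a simplified illustration in Python, not Emacs code: the function name `split_clusters` and the constant `MAX_COMBINING` are hypothetical, and testing for a nonzero canonical combining class is only an approximation of the full grapheme cluster rules in Unicode Standard Annex #29.

```python
import unicodedata

MAX_COMBINING = 10  # the arbitrary cap suggested in the text

def split_clusters(text, max_combining=MAX_COMBINING):
    """Group each character with the combining characters that follow it.

    A character counts as combining when its canonical combining class
    is nonzero. Once a cluster holds max_combining such characters, the
    next combining character starts a new cluster.
    """
    clusters = []
    marks = 0  # combining characters attached to the current cluster
    for ch in text:
        if clusters and unicodedata.combining(ch) and marks < max_combining:
            clusters[-1] += ch
            marks += 1
        else:
            clusters.append(ch)
            marks = 0
    return clusters

print(split_clusters("a\u0308b"))
print([len(c) for c in split_clusters("a" + "\u0308" * 12)])
```

The cap keeps any single cluster bounded, so a file consisting only of combining characters is split into many small units instead of one huge one.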

> You are mixing display issues with editing issues and with how
> characters are represented internally in an Emacs buffer.

I think it's confusing for users if the units of text which forward-char
skips over, do not correspond to the units matched by "." in

My suggested internal representation seems to be a natural way to get
this correspondence right, at the cost of some memory (or lots of
complexity in reducing memory usage). I'm sure there are other ways, and
maybe also a lot better ways, to implement the same thing.
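
To make the mismatch concrete, here is a small sketch using Python's `re` module purely as a stand-in for a regexp engine that works on code points. It does not model Emacs's regexp engine or forward-char; the variable names are illustrative.

```python
import re

decomposed = "a\u0308"   # displays as "ä": base letter plus COMBINING DIAERESIS

# "." consumes one code point, so it matches only the bare "a".
first = re.match(".", decomposed).group()
print(first == "a")

# Stepping by code points takes two moves to cross what displays as one letter.
steps = len(decomposed)
print(steps)

# A single "." therefore cannot match the whole displayed character.
print(re.fullmatch(".", decomposed) is None)
```

If cursor movement treats the pair as one unit while "." treats it as two, the user sees exactly the inconsistency described above.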

> Thanks, I will try that.

Now I've also reproduced it on the same machine, without my normal Gnus
setup getting in the way. I start emacs with

  $ rm -rf ~/tmp/home/ && mkdir ~/tmp/home/ && HOME=$HOME/tmp/home emacs -nw -Q -l bug.el

where bug.el contains

  (setq gnus-init-file nil)
  (setq gnus-nntp-server nil)

Then create the group with G d, pointing out the spool-like directory,
enter the group (RET), view the message (RET), try to write out the
attachment ("o" on the attachment button). Still crashes for me.


Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

