bug-gnu-emacs

bug#23086: 25.1.50; Emacs ignores Unicode line and paragraph separator c


From: Eli Zaretskii
Subject: bug#23086: 25.1.50; Emacs ignores Unicode line and paragraph separator characters
Date: Tue, 22 Mar 2016 18:13:15 +0200

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Tue, 22 Mar 2016 11:42:46 +0100
> 
> Type some characters
> C-x 8 RET LINE SEPARATOR (or PARAGRAPH SEPARATOR)
> Type some more characters
> M-q
> 
> Expected behavior: Emacs treats these characters as line and paragraph
> separators: they are displayed as line breaks, M-q doesn't remove them,
> and forward-paragraph etc. treat the paragraph separator as paragraph
> end.
> 
> Actual behavior: These characters are displayed as one-pixel horizontal
> whitespace and otherwise ignored.
> 
> Also discussed in
> https://lists.gnu.org/archive/html/emacs-devel/2015-08/msg01043.html.
> https://www.emacswiki.org/emacs/unicode-whitespace.el supposedly adds
> support for these characters, but I think proper treatment of Unicode
> separators should be part of Emacs.

It is not clear to me what exactly is the requested feature.  Can you
propose a detailed list of requirements?

I'm asking because these characters come in Unicode with non-trivial
baggage that goes far beyond just breaking the line; see

  http://unicode.org/reports/tr14/
  http://unicode.org/reports/tr29/

There are also implications on the bidirectional display (it is
sensitive to where the line and the paragraph begin and end).

If we want to support these two characters, we should think about
which parts of the relevant functionality we want to see in Emacs,
because users will expect that.  In addition, there are other
white-space characters defined by Unicode, and it would make sense to
treat them all alike.  I'm not sure it makes sense to support just the
line-breaking and paragraph-separator parts of only these two
characters.
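
To make the "baggage" concrete, here is a minimal sketch in Python (not
Emacs Lisp, and not a proposal for how Emacs should do it) that looks up
the Unicode properties of the two separators next to newline and space.
The helper name `describe' is made up for illustration; the lookups use
Python's standard unicodedata module, so the values reflect whatever
Unicode version that Python ships with.

```python
import unicodedata

def describe(ch):
    """Return (name, general category, bidi class) for one character."""
    # Control characters such as newline have no Unicode name, so
    # supply a default instead of letting name() raise ValueError.
    return (unicodedata.name(ch, "<control>"),
            unicodedata.category(ch),
            unicodedata.bidirectional(ch))

# The two separators from the bug report, plus newline and space.
for ch in ("\u2028", "\u2029", "\n", " "):
    print("U+%04X" % ord(ch), *describe(ch))
```

Note that the two separators differ in bidi class: U+2029 is a
paragraph separator (B) like newline, while U+2028 is plain whitespace
(WS), which is one reason they cannot simply be treated alike.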

Then there are Emacs-specific issues, for example:

 . do we treat U+2028 and U+2029 as literal characters, or as a form
   of EOL encoding?
 . if the former, how do we distinguish them from newlines on display?
 . should Isearch find these when looking for "\n"? how about regexp
   search for "$"?
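
For comparison only, the search questions above already have
inconsistent answers in other environments.  The following sketch shows
how Python (an assumption chosen for illustration, nothing to do with
Emacs internals) handles a string containing both separators: one
standard method treats them as line boundaries, while splitting on
"\n" and the regexp "$" do not.

```python
import re

# One LINE SEPARATOR, one PARAGRAPH SEPARATOR, one ordinary newline.
text = "one\u2028two\u2029three\nfour"

# Splitting on a literal newline ignores U+2028 and U+2029.
by_newline = text.split("\n")

# str.splitlines() follows the Unicode notion of a line boundary
# and breaks at both separators as well as at the newline.
by_unicode = text.splitlines()

# With re.MULTILINE, "$" matches only before "\n" and at end of string.
dollar_hits = [m.start() for m in re.finditer(r"$", text, re.MULTILINE)]

print(len(by_newline), len(by_unicode), dollar_hits)
```

So even within a single language the answer depends on which primitive
is asked, which is why a detailed proposal would have to settle each
case explicitly.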

There are probably more implications; these are just the ones that popped
into my mind in 5 sec.  IOW, I think Someone™ should think this over and
present a detailed proposal.

Thanks.
