bug-gnu-emacs


bug#28339: 25.2; Emacs shows ZWNJ character (Zero Width non-Joiner) as Space


From: Eli Zaretskii
Subject: bug#28339: 25.2; Emacs shows ZWNJ character (Zero Width non-Joiner) as Space
Date: Sat, 16 Sep 2017 10:24:06 +0300

> From: handa <address@hidden>
> Cc: address@hidden, address@hidden
> Date: Sat, 16 Sep 2017 10:32:57 +0900
> 
> In article <address@hidden>, Eli Zaretskii <address@hidden> writes:
> 
> > > Each Arabic character constitutes a grapheme cluster.  Then, for the
> > > sequence "0646 0645 06CC 200C 0634 0648 062F", to which neighboring
> > > cluster should 200C belong?  Does Unicode define it?
> 
> > I don't think Unicode defines that, but I thought the shaping engine
> > gives us back glyphs that don't include ZWNJ itself.  Evidently,
> > that's not true, which I find strange.
> 
> If ZWNJ is WITHIN a grapheme cluster (i.e. not at the edges
> of the cluster), the m17n lib does not return ZWNJ glyph.
> 
> > > Anyway, is it convenient or inconvenient to be able to edit ZWNJ directly?
> 
> > It's convenient.  But we already support deletion of composed
> > characters, so I didn't think it mattered.
> 
> If Unicode does not have a rule for ZWNJ handling, how does a user
> know which key to type to delete ZWNJ: C-d or BS?

Above, you asked whether Unicode defines which grapheme cluster ZWNJ
should belong to.  On that, I said I didn't think there's any Unicode
ruling (although to be sure, we should probably ask on the Unicode
mailing list).

But here, you are talking about deleting a ZWNJ from display, and on
that Unicode does have a clear rule; see Section 23.2 of the Unicode
Standard.  A pertinent quote (Implementation Notes, p.849):

  As with all other alternate format characters, fonts should use an
  invisible zero-width glyph for representation of both ZWJ and ZWNJ.

This is phrased as a requirement for fonts, but it does convey how
Unicode expects ZWNJ to be displayed.

Emacs generally tries to display such control characters, because
hiding them from users is un-Emacsy.  But in this case, it seems like
users expect us to hide it.
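
For reference, the code points in the sequence quoted above can be
inspected with a short script.  This is only an illustrative sketch
using Python's standard unicodedata module; it says nothing about how
Emacs or the m17n library treat the character.  The helper name
describe is made up for this example.  It shows that U+200C is the
only character in the sequence whose general category is Cf (format),
while the rest are ordinary Arabic letters (Lo).

```python
import unicodedata

# Code points of the sequence quoted earlier in this thread.
SEQUENCE = "0646 0645 06CC 200C 0634 0648 062F"


def describe(hex_codepoints):
    """Return (code point, Unicode name, general category) per character.

    `describe` is a hypothetical helper for this example only.
    """
    rows = []
    for cp in hex_codepoints.split():
        ch = chr(int(cp, 16))
        rows.append((cp, unicodedata.name(ch), unicodedata.category(ch)))
    return rows


for cp, name, category in describe(SEQUENCE):
    # Category "Cf" marks a format character; "Lo" marks a letter.
    print(f"U+{cp}  {category}  {name}")
```

Only the U+200C line is reported with category Cf, under the name
ZERO WIDTH NON-JOINER.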




