bug-gnu-emacs

bug#61726: [PATCH] Eglot: Support positionEncoding capability


From: Eli Zaretskii
Subject: bug#61726: [PATCH] Eglot: Support positionEncoding capability
Date: Thu, 23 Feb 2023 17:04:08 +0200

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: 61726@debbugs.gnu.org,  joaotavora@gmail.com
> Date: Thu, 23 Feb 2023 14:31:52 +0100
> 
> On Thu, 23 Feb 2023 at 14:54, Eli Zaretskii wrote:
> 
> >> But just to confirm: position-bytes and byte-to-position are always with
> >> respect to Emacs's internal extended UTF-8 representation and have
> >> nothing to do with the buffer file encoding, right?
> >
> > Yes.  See bufferpos-to-filepos to get an idea of what hoops we need to
> > jump through to get it right, even just with UTF-8.
> 
> Okay, then we're on the same page.  Just to emphasize, the buffer file
> is totally irrelevant for Eglot's purposes.  The only thing that matters
> is the representation of the buffer text when it's serialized as a
> UTF-8-encoded string inside a JSON object.

The buffer's file is not important for the issue at hand, only its
encoding is.  And in the case of Eglot, the encoding is still there,
even though there's no file involved.  So the code in
bufferpos-to-filepos is still very relevant, as it shows what has to
be done for such conversions.
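
To make the point concrete, here is a minimal sketch in Python (not
Emacs code; the helper name `charpos_to_encoded_offset' is made up for
this illustration) of how one character position maps to different
offsets depending on the encoding used to serialize the text:

```python
def charpos_to_encoded_offset(text, charpos, encoding):
    """Return the offset of CHARPOS in TEXT, counted in code units of ENCODING.

    Hypothetical helper for illustration only.  It encodes the prefix
    before CHARPOS and divides the byte length by the code-unit size.
    """
    unit_size = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}[encoding]
    return len(text[:charpos].encode(encoding)) // unit_size


line = "x\u21d2\U0001F603y"   # x, RIGHTWARDS DOUBLE ARROW, SMILING FACE, y
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    # Offset of "y", which sits at character position 3.
    print(enc, charpos_to_encoded_offset(line, 3, enc))
```

The same character position comes out as 8, 4 and 3 code units
respectively.  The sketch deliberately leaves out complications such
as end-of-line conversion and byte-order marks.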

> >> `eglot-move-to-column' is supposed to count Unicode codepoints, so
> >> e.g. x, ⇒ and 😃 all contribute 1 unit.
> >
> > But if the resulting column is then used in move-to-column etc., it
> > might go to the wrong column, because in Emacs each column is not
> > necessarily a single codepoint.  The simplest example is a TAB
> > character, but there are more examples, some of which are quite
> > complicated (see below).
> 
> There's only one function that uses `move-to-column'.  It's very old and
> I didn't touch it.

Then why does Eglot want to know the column at all?

> >> On the other hand, the Emoji
> >> 🧛‍♀️ contributes 4 units. This is independent of the screen display.
> >
> > Not in Emacs.
> 
> Sorry, I don't understand what you mean.  Emacs has no say as to how
> Emoji are represented as sequences of codepoints.  The female vampire
> Emoji is 4 codepoints, if I'm counting it right.

What I meant is that the number of columns a given sequence of
codepoints will take on display is not equal to the number of
codepoints in the sequence.  This is so for Emoji sequences as well.
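
For reference, the different ways of counting the examples mentioned
in this thread can be checked with a short Python sketch (the function
name `unit_counts' is an assumption made for this illustration, and
the vampire Emoji is assumed to be the sequence U+1F9DB U+200D U+2640
U+FE0F):

```python
def unit_counts(s):
    """Return (codepoints, UTF-16 code units, UTF-8 bytes) for S."""
    return (len(s),
            len(s.encode("utf-16-le")) // 2,
            len(s.encode("utf-8")))


samples = {
    "x": "x",
    "arrow": "\u21d2",                          # RIGHTWARDS DOUBLE ARROW
    "smiley": "\U0001F603",                     # SMILING FACE WITH OPEN MOUTH
    "vampire": "\U0001F9DB\u200D\u2640\uFE0F",  # ZWJ sequence, 4 codepoints
}
for name, s in samples.items():
    print(name, unit_counts(s))
```

None of the three numbers is a display width: how many columns the
sequence occupies depends on the font and on the display code, not on
any of these counts.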

> > If that is what you see, it could be a bug.  Does current-column agree
> > with what you see in the mode line?
> 
> Yes.

Then at least it's not a grave bug.  current-column and friends
don't support all the quirks of our display code which can change
how many columns some sequence of codepoints can take on display.  It
does support quite a few of them, though.

> If you look carefully at the Eglot code, you will see that
> `move-to-column' only appears in the code pertaining to the “UTF-16 way of
> counting offsets”, which
> 
> 1. is old and I didn't touch in this patch,
> 2. seems to work correctly, despite looking suspicious, and
> 3. will not be used anymore when both Eglot and the LSP server support
>    the positionEncodings capability.

Positions do not necessarily transform to columns easily.  So,
depending on how Eglot uses this information, we may or may not have
problems.

In general, reporting coordinates in columns between programs is
problematic.  We see this in many cases, starting with spellers.




