bug-gnu-emacs
bug#61726: [PATCH] Eglot: Support positionEncoding capability


From: Augusto Stoffel
Subject: bug#61726: [PATCH] Eglot: Support positionEncoding capability
Date: Fri, 24 Feb 2023 10:15:48 +0100
User-agent: Gnus/5.13 (Gnus v5.13)

On Fri, 24 Feb 2023 at 10:38, Eli Zaretskii wrote:

>> From: Augusto Stoffel <arstoffel@gmail.com>
>> Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
>> Date: Fri, 24 Feb 2023 08:18:30 +0100
>> 
>> On Fri, 24 Feb 2023 at 08:43, Eli Zaretskii wrote:
>> 
>> > It does? then please humor me by walking me through the code and the
>> > patch to show how that would work after applying the patch.
>> 
>> +            :general
>> +            (list
>> +             :positionEncodings ["utf-32" "utf-8" "utf-16"])
>>              :experimental eglot--{})))
>
> Is "UTF-32" an LSP thing and terminology?  Because I'd prefer a
> different name if we can.  At least for our internal nomenclature,
> let's use "codepoint" or "character" instead.

Yes, this is how the LSP spec refers to the three offset-counting methods.
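
To make the difference concrete, here is a minimal sketch that measures one string under each of the three encodings. The function name `demo-lsp-offsets' is hypothetical and not part of Eglot; the sketch assumes that `utf-16le' encodes without a BOM, so dividing the byte length by two gives the number of 16-bit code units.

```elisp
;; Illustration only: `demo-lsp-offsets' is a hypothetical helper,
;; not something Eglot defines.
(defun demo-lsp-offsets (string)
  "Return a plist with the length of STRING under each LSP position encoding."
  (list
   ;; "utf-32": one unit per Unicode codepoint, i.e. per Emacs character.
   :utf-32 (length string)
   ;; "utf-8": one unit per byte of the UTF-8 encoded text.
   :utf-8 (length (encode-coding-string string 'utf-8-unix t))
   ;; "utf-16": one unit per 16-bit code unit; `utf-16le' writes no BOM,
   ;; so every code unit is exactly two bytes of output.
   :utf-16 (/ (length (encode-coding-string string 'utf-16le t)) 2)))

;; "a", U+1F600 (outside the BMP), "b"
(demo-lsp-offsets (string ?a #x1F600 ?b))
;; ⇒ (:utf-32 3 :utf-8 6 :utf-16 4)
```

The three results only diverge once the line contains non-ASCII text, which is why a mismatch between client and server can go unnoticed for a long time.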

>> -(defun eglot-current-column () (- (point) (line-beginning-position)))
>> +(defun eglot-current-column ()
>> +  "Calculate current column, counting Unicode codepoints."
>> +  (- (point) (line-beginning-position)))
>
> Can we please take this opportunity to get rid of the confusing
> "column" terminology?  As became evident from this discussion, we are
> not talking columns here, we are talking offsets in characters from
> BOL.  So something like "pos" or "linepos" or "line-offset" should be
> better.
>
> João, are you okay with such a sweeping change in all of eglot.el?

I like linepos, provided João is fine with making more than the absolute
minimal amount of changes to the code.

>> +(defun eglot--current-column-utf-8 ()
>> +  "Calculate current column, counting bytes."
>> +  (- (position-bytes (point)) (position-bytes (line-beginning-position))))
>
> As discussed, position-bytes is incorrect.  You should instead do
> something like
>
>   (length (encode-coding-string
>            (buffer-substring-no-properties (point)
>                                            (line-beginning-position))
>            'utf-8-unix t))

But it is incorrect only if the buffer contains characters outside of
the Unicode range, right?  If that happens, we have already lost, because
a few steps later we will serialize the buffer text as JSON to send it to
the server:

    (progn
      (insert ?x (max-char) ?y)
      (json-serialize (buffer-substring-no-properties (pos-bol) (pos-eol))))

    ⇒ Debugger entered--Lisp error: (wrong-type-argument utf-8-string-p " 
(json-serialize (buffer-substring-no-properties (...")

> Also, for 100% reliable results, we should bind
> inhibit-field-text-motion to t when calling line-beginning-position.

We should rather be using pos-bol, no?  But how do we keep compatibility
with older Emacsen?
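
One possible way to handle both points, as a sketch only: use `pos-bol' where it exists (it was added in Emacs 29) and otherwise fall back to `line-beginning-position' with `inhibit-field-text-motion' bound. The names `demo-bol' and `demo-linepos-utf-8' are hypothetical and not part of the patch; the second function counts bytes with `encode-coding-string', as suggested above, rather than with `position-bytes'.

```elisp
;; Hypothetical helpers for illustration; neither name exists in Eglot.
(defun demo-bol ()
  "Return the position of the beginning of the current line, ignoring fields."
  (if (fboundp 'pos-bol)
      (pos-bol)                         ; Emacs 29 and later
    ;; Older Emacsen: keep fields from constraining the result.
    (let ((inhibit-field-text-motion t))
      (line-beginning-position))))

(defun demo-linepos-utf-8 ()
  "Return the offset of point from the beginning of the line, in UTF-8 bytes."
  (length (encode-coding-string
           (buffer-substring-no-properties (demo-bol) (point))
           'utf-8-unix t)))

(with-temp-buffer
  (insert "x\na" #x1F600 "b")           ; second line is a, U+1F600, b
  (demo-linepos-utf-8))
;; ⇒ 6
```

Encoding the substring costs an allocation per call, so whether this is acceptable on hot paths is a separate question.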

>> +(defun eglot--move-to-column-utf-8 (column)
>> +  "Move to COLUMN, regarded as a byte offset."
>> +  (goto-char (min (byte-to-position
>> +                   (+ (position-bytes (line-beginning-position)) column))
>> +                  (line-end-position))))
>
> Likewise here.
>
>> @@ -1515,14 +1536,20 @@ eglot--lsp-position-to-point
>>        (forward-line (min most-positive-fixnum
>>                           (plist-get pos-plist :line)))
>>        (unless (eobp) ;; if line was excessive leave point at eob
>> -        (let ((tab-width 1)
>> +        (let ((movefn (or eglot-move-to-column-function
>> +                          (pcase (plist-get (eglot--capabilities (eglot-current-server))
>> +                                            :positionEncoding)
>> +                            ("utf-32" #'eglot-move-to-column)
>> +                            ("utf-8" #'eglot--move-to-column-utf-8)
>> +                            (_ #'eglot-move-to-lsp-abiding-column))))
>> +              (tab-width 1)
>                   ^^^^^^^^^^^
> This last part shouldn't be necessary: we should move by characters,
> not by columns.  Why is it necessary?

Maybe João can clarify, but I'm pretty sure this is there to support the
UTF-16 way of counting offsets, so this ideally should move to
eglot-move-to-lsp-abiding-column.
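
For comparison, a UTF-16 offset can also be reached by stepping over characters and counting code units, which does not depend on `tab-width' at all. This is a sketch under that assumption, not what Eglot currently does, and `demo-move-to-linepos-utf-16' is a hypothetical name.

```elisp
;; Hypothetical sketch; not the implementation of
;; `eglot-move-to-lsp-abiding-column'.
(defun demo-move-to-linepos-utf-16 (offset)
  "Move point to OFFSET UTF-16 code units from the beginning of the line.
Stop at the end of the line if OFFSET is too large.  If OFFSET falls
inside a surrogate pair, point ends up after that character.
Return the new value of point."
  (let ((inhibit-field-text-motion t)
        (units 0))
    (goto-char (line-beginning-position))
    (let ((eol (line-end-position)))
      (while (and (< units offset) (< (point) eol))
        ;; Characters above U+FFFF take a surrogate pair: 2 code units.
        (setq units (+ units (if (> (char-after) #xFFFF) 2 1)))
        (forward-char 1)))
    (point)))

(with-temp-buffer
  (insert "a" #x1F600 "b")
  (demo-move-to-linepos-utf-16 3))
;; ⇒ 3  (point is now between U+1F600 and "b")
```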




