bug-gnu-emacs

bug#61726: [PATCH] Eglot: Support positionEncoding capability


From: Eli Zaretskii
Subject: bug#61726: [PATCH] Eglot: Support positionEncoding capability
Date: Thu, 23 Feb 2023 12:39:09 +0200

> Cc: João Távora <joaotavora@gmail.com>
> From: Augusto Stoffel <arstoffel@gmail.com>
> Date: Thu, 23 Feb 2023 09:05:35 +0100
> 
> There is a new LSP capability allowing the server and client to agree on
> a way to count character offsets.  What do you think of the attached
> patch?
> 
> It expresses Eglot's preferences as counting character offsets, then
> byte offsets, then the UTF-16 nonsense, in that order.
> 
> I would also suggest preparing the stage to eventually make
> `eglot-current-column-function' and `eglot-move-to-column-function'
> obsolete.  For that, I suggest renaming
> 
> - eglot-current-column -> eglot--current-column-utf-32
> - eglot-lsp-abiding-column -> eglot--current-columns-utf-16
> - eglot-move-to-column -> eglot--move-to-columns-utf-32
> - eglot-move-to-lsp-abiding-column -> eglot--move-to-columns-utf-16
> 
> and then making the old names obsolete aliases of the new names.

Please tell more about this, as I don't think I have a clear enough
idea of the issues and the implications for Emacs.
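
For reference, the renaming proposed in the quoted text would presumably
rely on `define-obsolete-function-alias'.  A minimal sketch, using
hypothetical `demo-' names in place of the real Eglot symbols, and with
the version string "29.1" as a placeholder assumption:

```elisp
;; Hypothetical names; an actual patch would use the eglot-* symbols.
(defun demo--current-column-utf-32 ()
  "Return the current column, counting Unicode codepoints."
  (- (point) (line-beginning-position)))

;; Keep the old public name callable, but let the byte compiler warn
;; about it.  The version string below is only a placeholder.
(define-obsolete-function-alias 'demo-current-column
  #'demo--current-column-utf-32 "29.1")
```

Existing user code that calls the old name keeps working; only
byte-compilation of such code produces an obsolescence warning.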

> +(defun eglot--current-column-utf-8 ()
> +  "Calculate current column, counting bytes."
> +  (- (position-bytes (point)) (position-bytes (line-beginning-position))))

This is subtly incorrect: position-bytes doesn't count UTF-8 bytes; it
counts the bytes in the internal representation Emacs uses for buffer
and string text.  The differences are minor and subtle, but not
negligible.
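
To illustrate: the internal representation is a superset of UTF-8, in
which a raw byte, for instance, occupies two bytes.  One way to count
actual UTF-8 bytes is to encode the text explicitly.  A rough sketch,
where `demo-current-column-utf-8' is a hypothetical name and not part
of the patch:

```elisp
(defun demo-current-column-utf-8 ()
  "Return the current column, counting bytes of the UTF-8 encoding.
Unlike `position-bytes', this encodes the text explicitly instead
of relying on the internal representation of buffer text."
  (length (encode-coding-string
           (buffer-substring-no-properties (line-beginning-position)
                                           (point))
           'utf-8-unix)))

;; Example: "naïve" has 5 characters, one of which takes 2 bytes.
(with-temp-buffer
  (insert "na\u00efve")
  (demo-current-column-utf-8))
```

Encoding on every call is slower than subtracting two `position-bytes'
values, so this trades speed for correctness on unusual text.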

>  (defun eglot-move-to-column (column)
> -  "Move to COLUMN without closely following the LSP spec."
> +  "Move to COLUMN, counting Unicode codepoints."
>    ;; We cannot use `move-to-column' here, because it moves to *visual*
>    ;; columns, which can be different from LSP columns in case of
>    ;; `whitespace-mode', `prettify-symbols-mode', etc.  (github#296,
> @@ -1490,8 +1505,14 @@ eglot-move-to-column
>    (goto-char (min (+ (line-beginning-position) column)
>                    (line-end-position))))
>  
> +(defun eglot--move-to-column-utf-8 (column)
> +  "Move to COLUMN, regarded as a byte offset."
> +  (goto-char (min (byte-to-position
> +                   (+ (position-bytes (line-beginning-position)) column))
> +                  (line-end-position))))
> +
>  (defun eglot-move-to-lsp-abiding-column (column)
> -  "Move to COLUMN abiding by the LSP spec."
> +  "Move to COLUMN, counting UTF-16 code units as in the original LSP spec."
>    (save-restriction
>      (cl-loop
>       with lbp = (line-beginning-position)
> @@ -1515,14 +1536,20 @@ eglot--lsp-position-to-point
>        (forward-line (min most-positive-fixnum
>                           (plist-get pos-plist :line)))
>        (unless (eobp) ;; if line was excessive leave point at eob
> -        (let ((tab-width 1)
> +        (let ((movefn (or eglot-move-to-column-function
> +                          (pcase (plist-get (eglot--capabilities (eglot-current-server))
> +                                            :positionEncoding)
> +                            ("utf-32" #'eglot-move-to-column)
> +                            ("utf-8" #'eglot--move-to-column-utf-8)
> +                            (_ #'eglot-move-to-lsp-abiding-column))))
> +              (tab-width 1)
>                (col (plist-get pos-plist :character)))
>            (unless (wholenump col)
>              (eglot--warn
>               "Caution: LSP server sent invalid character position %s. Using 0 instead."
>               col)
>              (setq col 0))
> -          (funcall eglot-move-to-column-function col)))
> +          (funcall movefn col)))
>        (if marker (copy-marker (point-marker)) (point)))))

What does this stuff do with double-width or zero-width characters?
Emacs takes character-width into consideration when it counts columns,
but it is unclear to me what LSP servers do in those cases.
Likewise with characters that are composed on display.
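
As far as I can tell from the LSP specification, the `character' field
of a position is an offset in code units of the negotiated encoding,
so display width and composition should not enter into it.  A small
sketch of how the three encodings count the same text, next to the
display width Emacs reports; `demo-column-counts' is a hypothetical
helper, not part of Eglot:

```elisp
(defun demo-column-counts (string)
  "Return a plist with the length of STRING under several measures."
  (list :utf-32 (length string)         ; codepoints
        ;; `utf-16le' emits no byte-order mark; 2 bytes per code unit.
        :utf-16 (/ (length (encode-coding-string string 'utf-16le)) 2)
        :utf-8  (length (encode-coding-string string 'utf-8-unix))
        :width  (string-width string))) ; display columns

;; Two CJK characters followed by one character outside the BMP.
(demo-column-counts "\u65e5\u672c\U0001F600")
```

The three encoding-based counts differ from one another for this
string, and none of them is derived from the display width.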

So I think this mess needs to be carefully and elaborately discussed
before we decide how to implement it correctly.

Thanks.
