[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#31837: 26.1; replace-buffer-contents doesn't work if buffer has mult
From: |
Eli Zaretskii |
Subject: |
bug#31837: 26.1; replace-buffer-contents doesn't work if buffer has multibyte characters |
Date: |
Fri, 15 Jun 2018 11:34:08 +0300 |
> From: mstanojevic@janestreet.com (Milan Stanojević)
> Date: Thu, 14 Jun 2018 17:34:27 -0400
>
> Here is a small recipe that illustrates the bug.
>
> recipe.el
> ---------
> (setq use-multibyte (< 0 (length argv)))
>
> (switch-to-buffer "file1")
> (when use-multibyte (insert-char (char-from-name "SMILE")))
> (insert "1234")
>
> (switch-to-buffer "file2")
> (when use-multibyte (insert-char (char-from-name "SMILE")))
> (insert "5678")
>
> (replace-buffer-contents "file1")
>
> (princ (buffer-substring-no-properties (point-min) (point-max)))
> (princ "\n")
> -------------
>
> Running the recipe as script
>
> $ emacs -Q --script /tmp/recipe.el
> 1234
>
> $ emacs -Q --script /tmp/recipe.el multibyte
> ⌣5234
>
> In the first run, with just ascii characters, everything works as
> expected.
>
> In the second run, with multibyte characters, the function didn't
> replace '5' with '1' as expected.
Right, thanks for catching this blunder.
> I looked at the code and it looks to me like there is a very obvious bug
> in function buffer_chars_equal in editfns.c. It calls
> BUF_FETCH_CHAR_AS_MULTIBYTE passing *character* positions, but the
> macro expects *byte* positions. (it would be nice if these char vs byte
> positions could be distinguished with types, but I'm not sure it is
> possible in C).
>
> The simple fix is to replace BUF_FETCH_CHAR_AS_MULTIBYTE with
> STRING_CHAR (BUF_CHAR_ADDRESS (buf, pos)) and this seems to work.
Thanks, I installed a slightly different fix on the emacs-26 branch,
perhaps you could try that.
> I'm not sure about performance of the above fix, though, because
> accessing random character position in a buffer is not constant. If
> diffing algorithm is accessing buffer positions in more or less
> localized manner, maybe it makes sense to move the point inside
> buffer_chars_equal so the char position to byte position conversion is
> fast. It probably doesn't matter for small files.
The conversion of character positions to byte positions should be
quite fast. It got even faster on the master branch, so I think we
are good. If there are reports about slow operation of this function,
we could in the future try to optimize it more.