bug#31837: 26.1; replace-buffer-contents doesn't work if buffer has mult

bug-gnu-emacs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#31837: 26.1; replace-buffer-contents doesn't work if buffer has mult

From:	Eli Zaretskii
Subject:	bug#31837: 26.1; replace-buffer-contents doesn't work if buffer has multibyte characters
Date:	Fri, 15 Jun 2018 11:34:08 +0300

> From: mstanojevic@janestreet.com (Milan Stanojević)
> Date: Thu, 14 Jun 2018 17:34:27 -0400
> 
> Here is a small recipe that illustrates the bug.
> 
> recipe.el
> ---------
> (setq use-multibyte (< 0 (length argv)))
> 
> (switch-to-buffer "file1")
> (when use-multibyte (insert-char (char-from-name "SMILE")))
> (insert "1234")
> 
> (switch-to-buffer "file2")
> (when use-multibyte (insert-char (char-from-name "SMILE")))
> (insert "5678")
> 
> (replace-buffer-contents "file1")
> 
> (princ (buffer-substring-no-properties (point-min) (point-max)))
> (princ "\n")
> -------------
> 
> Running the recipe as script
> 
> $ emacs -Q --script /tmp/recipe.el
> 1234
> 
> $ emacs -Q --script /tmp/recipe.el multibyte
> ⌣5234
> 
> In the first run, with just ascii characters, everything works as
>  expected. 
> 
> In the second run, with multibyte characters, the function didn't
>  replace '5' with '1' as expected.

Right, thanks for catching this blunder.

> I looked at the code and it looks to me like there is a very obvious bug
>  in function buffer_chars_equal in editfns.c. It calls
>  BUF_FETCH_CHAR_AS_MULTIBYTE passing *character* positions, but the
>  macro expects *byte* positions. (it would be nice if these char vs byte
>  positions could be distinguished with types, but I'm not sure it is
>  possible in C).
> 
> The simple fix is to replace BUF_FETCH_CHAR_AS_MULTIBYTE with
>  STRING_CHAR (BUF_CHAR_ADDRESS (buf, pos)) and this seems to work.

Thanks, I installed a slightly different fix on the emacs-26 branch,
perhaps you could try that.

> I'm not sure about performance of the above fix, though, because
>  accessing random character position in a buffer is not constant. If
>  diffing algorithm is accessing buffer positions in more or less
>  localized manner, maybe it makes sense to move the point inside
>  buffer_chars_equal so the char position to byte position conversion is
>  fast. It probably doesn't matter for small files.

The conversion of character positions to byte positions should be
quite fast.  It got even faster on the master branch, so I think we
are good.  If there are reports about slow operation of this function,
we could in the future try to optimize it more.

[Prev in Thread]

Current Thread

[Next in Thread]

bug#31837: 26.1; replace-buffer-contents doesn't work if buffer has multibyte characters, Milan Stanojević, 2018/06/14
- bug#31837: 26.1; replace-buffer-contents doesn't work if buffer has multibyte characters, Eli Zaretskii <=
  - bug#31837: 26.1; replace-buffer-contents doesn't work if buffer has multibyte characters, Milan Stanojević, 2018/06/18
    - bug#31837: 26.1; replace-buffer-contents doesn't work if buffer has multibyte characters, Eli Zaretskii, 2018/06/18
    - bug#31837: 26.1; replace-buffer-contents doesn't work if buffer has multibyte characters, Milan Stanojević, 2018/06/18
    - bug#31837: 26.1; replace-buffer-contents doesn't work if buffer has multibyte characters, Eli Zaretskii, 2018/06/18

Prev by Date: bug#31750: simplify and tune Emacs stack-related allocation
Next by Date: bug#31737: 26.1.50; Gnus does not use mailutils movemail
Previous by thread: bug#31837: 26.1; replace-buffer-contents doesn't work if buffer has multibyte characters
Next by thread: bug#31837: 26.1; replace-buffer-contents doesn't work if buffer has multibyte characters
Index(es):
- Date
- Thread