bug#23814: 24.5; bug of hz coding-system
From: ynyaaa
Subject: bug#23814: 24.5; bug of hz coding-system
Date: Fri, 29 Jul 2016 10:05:14 +0900
handa <handa@gnu.org> writes:
> In article <87twffigzv.fsf@gmail.com>, ynyaaa@gmail.com writes:
>
>> But I found other bugs about decodings of "~" escape.
>> "~~" and "~{!!~}" should be encoded and decoded as below.
>> "~~" -> "~~~~" -> "~~"
>> "~{!!~}" -> "~~{!!~~}" -> "~{!!~}"
>
>> In really they are encoded properly, but decoded in wrong way.
>> (decode-coding-string (encode-coding-string "~~" 'hz) 'hz)
>>>> "~"
>> (decode-coding-string (encode-coding-string "~{!!~}" 'hz) 'hz)
>>>> #("\x3000" 0 1 (charset chinese-gb2312))
>
> Thank you for finding those bugs. Could you please try the attached
> patch instead?
>
> ---
> K. Handa
> handa@gnu.org
If the text contains unencodable characters, the encodable characters
around them may be corrupted. In this example, the second ?\x4E00
character disappears.
(set-language-environment 'Chinese-GB)
(decode-coding-string (encode-coding-string "\x4E00\x00B7\x4E00" 'hz) 'hz)
=> "\x4E00\e\x3048\x6070\x70B3\x11213D\300\273"
There are several possible ways to avoid this behavior.
(a) While decoding, replace "~{...~}" with "\e$A...\e(B"
and decode with iso-2022-7bit.
(b) As in (a), replace "~{...~}" with "\e$A...\e(B" while decoding,
insert "\e$)A" at the beginning of the temp buffer,
and decode with iso-2022-8bit-ss2.
(8-bit data is then decoded as euc-cn.)
(c) While encoding, use euc-cn instead of iso-2022-7bit
and translate each run of consecutive 8-bit data to 7-bit data
preceded by "~{" and followed by "~}".
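A minimal sketch of option (a), to make the idea concrete. The name
my-hz-decode-string is hypothetical and is not part of china-util.el.
The sketch assumes well-formed, unibyte HZ input, and it ignores the
"zW" form that decode-hz-region also accepts.

```elisp
;; Hypothetical helper illustrating option (a); not the code in
;; china-util.el.  Assumes STRING is well-formed unibyte HZ text.
(defun my-hz-decode-string (string)
  "Decode HZ-encoded STRING by rewriting it as ISO-2022-7bit."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert string)
    (goto-char (point-min))
    (while (search-forward "~" nil t)
      (let ((ch (following-char)))
        (cond
         ;; "~~" -> "~": drop the second tilde and move on.
         ((= ch ?~) (delete-char 1))
         ;; "~\n" is a line continuation: drop both characters.
         ((= ch ?\n) (delete-char -1) (delete-char 1))
         ;; "~{...~}" -> "\e$A...\e(B"
         ((= ch ?{)
          (delete-char -1)
          (delete-char 1)
          (insert "\e$A")
          (if (search-forward "~}" nil 'move)
              (replace-match "\e(B" t t)
            ;; Unterminated GB run: switch back to ASCII at the end.
            (insert "\e(B"))))))
    (decode-coding-string (buffer-string) 'iso-2022-7bit)))
```

With this sketch, (my-hz-decode-string "~~~~") gives "~~" and
(my-hz-decode-string "~~{!!~~}") gives "~{!!~}", because an escaped
"~" is skipped before the following "{" is examined.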
By the way, RFC 1843 says:
The escape sequence '~\n' is a line-continuation marker to be
consumed with no output produced.
So this form should return "AB", but the newline is kept:
(decode-coding-string "A~\nB" 'hz)
=> "A\nB"
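A minimal sketch of the line-continuation rule on its own. The name
my-hz-drop-line-continuations is hypothetical. It assumes the input is
in ASCII mode only (no "~{...~}" runs), and it matches "~~" together
with "~\n" so that an escaped tilde followed by a newline is not
mistaken for a continuation.

```elisp
;; Hypothetical helper; handles only the ASCII-mode escapes "~~" and "~\n".
(defun my-hz-drop-line-continuations (string)
  "Remove HZ line-continuation markers (\"~\\n\") from STRING."
  (replace-regexp-in-string
   "~[~\n]"
   (lambda (m)
     ;; Keep "~~" untouched here; only "~\n" is consumed.
     (if (string= m "~\n") "" m))
   string t t))
```

Here (my-hz-drop-line-continuations "A~\nB") gives "AB", while
"A~~\nB" is returned unchanged.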
> diff --git a/lisp/language/china-util.el b/lisp/language/china-util.el
> index e531640..9abdae1 100644
> --- a/lisp/language/china-util.el
> +++ b/lisp/language/china-util.el
> @@ -95,7 +95,12 @@ decode-hz-region
> (goto-char (point-min))
> (while (search-forward "~" nil t)
> (setq ch (following-char))
> - (if (or (= ch ?\n) (= ch ?~)) (delete-char -1)))
> + (if (= ch ?{)
> + (search-forward "~}" nil 'move)
> + (when (or (= ch ?\n) (= ch ?~))
> + (delete-char -1)
> + (put-text-property (point) (1+ (point)) 'hz-decoded t)
> + (forward-char 1))))
>
> ;; "^zW...\n" -> Chinese GB2312
> ;; "~{...~}" -> Chinese GB2312
> @@ -104,6 +109,8 @@ decode-hz-region
> (while (re-search-forward hz/zw-start-gb nil t)
> (setq pos (match-beginning 0)
> ch (char-after pos))
> + (if (and (= ch ?~) (get-text-property pos 'hz-decoded))
> + (forward-char 1)
> ;; Record the first position to start conversion.
> (or beg (setq beg pos))
> (end-of-line)
> @@ -122,9 +129,10 @@ decode-hz-region
> t)
> (delete-char -2))
> (setq end (point))
> - (translate-region pos (point) hz-set-msb-table))))
> + (translate-region pos (point) hz-set-msb-table)))))
> (if beg
> (decode-coding-region beg end 'euc-china)))
> + (remove-text-properties (point-min) (point-max) '(hz-decoded nil))
> (- (point-max) (point-min)))))
>
> ;;;###autoload
> @@ -142,6 +150,7 @@ encode-hz-region
> (save-restriction
> (narrow-to-region beg end)
>
> + (put-text-property beg end 'charset 'chinese-gb2312)
> ;; "~" -> "~~"
> (goto-char (point-min))
> (while (search-forward "~" nil t) (insert ?~))