Re: Japanese '者' (U+8005) is replaced with \350\200

From: OKUMURA, Akira
Subject: Re: Japanese '者' (U+8005) is replaced with \350\200
Date: Fri, 8 Jan 2021 15:08:59 +0900

Dear Karl,

Thank you. I am attaching the output result.

$ wdiff input1.txt input2.txt
16848   1月2日    40代男性           豊橋市     [-陽性者と接触-]      {+知人が陽性?+}?     豊橋市発表445

Here is the output copied and pasted from my terminal. The \350\200 bytes,
which are seen in Emacs, correspond to the characters shown as "?+}?" above.
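
For reference, here is a minimal sketch of why \350\200 on its own cannot be displayed. It assumes the files are UTF-8 encoded. The `fragment` value is a made-up illustration of a character cut after its second byte, not the actual wdiff output.

```python
# U+8005 is encoded as three bytes in UTF-8.
encoded = "\u8005".encode("utf-8")
octal_bytes = [format(b, "03o") for b in encoded]
print(octal_bytes)  # ['350', '200', '205']

# Hypothetical fragment: only the first two bytes of the character,
# followed by ASCII text. This is an assumption for illustration only.
fragment = encoded[:2] + b"+}"
try:
    fragment.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    # The decoder rejects the incomplete sequence.
    decode_error = exc
    print(exc.start, exc.end, exc.reason)
```

The first two octal values match the \350\200 that Emacs shows, which suggests that the third byte (\205) has been separated from them.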

I am sure that it is not an Emacs issue.

$ wdiff input1.txt input2.txt > wdiff.txt
$ python3
Python 3.8.5 (default, Jul 21 2020, 10:48:26) 
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> f = open('wdiff.txt')
>>> line = f.readlines()[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
 line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 78-79: invalid continuation byte
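
To see exactly which bytes are rejected, the file can be read in binary mode instead. The following is a rough sketch, not a definitive tool: `find_invalid_utf8` is a hypothetical helper name, and the sample bytes are invented for illustration rather than taken from wdiff.txt.

```python
def find_invalid_utf8(data: bytes):
    """Return (start, end, raw_bytes) for each span that is not valid UTF-8."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # Try to decode everything that has not been checked yet.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are relative to the slice, so shift them.
            start, end = pos + exc.start, pos + exc.end
            problems.append((start, end, data[start:end]))
            pos = end
    return problems


# Invented sample: two ASCII bytes, a truncated three-byte sequence, then ASCII.
sample = b"ab\xe8\x80+}"
print(find_invalid_utf8(sample))  # [(2, 4, b'\xe8\x80')]
```

With the real file, the call would be `find_invalid_utf8(open('wdiff.txt', 'rb').read())`. A two-byte span in the result would be consistent with "position 78-79" in the traceback above.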

$ wdiff --version
wdiff (GNU wdiff) 1.2.2

OKUMURA, Akira oxon@mac.com / oxon@nagoya-u.jp
⌘ Junior Associate Professor at
- Institute for Space–Earth Environmental Research (ISEE)
- Kobayashi–Maskawa Institute for the Origin of Particles and the Universe (KMI)
Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan
Office/Lab/Fax: +81 (0)52-789-4320/4324/4313

Attachment: wdiff.txt
Description: Text document

> On Jan 8, 2021, at 7:27, Karl Berry <karl@freefriends.org> wrote:
>    generates a result with a broken word, in which a Japanese
>    character, '者', (Unicode U+8005) is replaced with
>    \350\200 when opening the result in Emacs.
> Sorry, this probably isn't very helpful, but ...  are you sure it's not
> an Emacs issue? As far as I can tell, wdiff is just outputting the bytes
> it sees.
> Running wdiff | od -c on your input files, I see the three bytes (in
> octal) 0350 0200 0205 in order in the output. I'm using LC_ALL=C to avoid
> locale interpretations getting in the way. --best, karl.
