From: Uwe Brauer
Subject: bug#23595: 25.1.50; file with chinese/japanse chars, vc-diff fails (HG, Git, RCS)
Date: Mon, 23 May 2016 17:00:53 +0000
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux)

>>> "Eli" == Eli Zaretskii <address@hidden> writes:

   >> From: Dmitry Gutov <address@hidden>
   >> Date: Mon, 23 May 2016 14:52:03 +0300
   >> > The resulting diff contains either rubbish or fails to run.
   >> > Files attached.

   > I don't see any rubbish in the Git output.  With RCS, the command
   > signals an error, so more digging is needed to find out what's wrong
   > (although it could be that rcsdiff exits with non-zero status when it
   > sees what looks like binary files).

   >> It seems, to an extent, to be caused by our setting
   >> coding-system-for-read inside vc-diff-internal (to
   >> utf-16be-with-signature-unix, which is also the value of
   >> buffer-file-coding-system).
   >> Without that, the result of vc-diff (at least with Git) is "Binary
   >> files a/test-chin-jap.tex and b/test-chin-jap.tex differ". Emacs
   >> 24.5 does the same.

   > Setting coding-system-for-read is correct, because the important use
   > case is when the diffs are actually output.  The problem is that
   > UTF-16 is not ASCII-compatible, and so text output by Git itself will
   > be mishandled.  Another problem is that Git doesn't show the diffs at
   > all.
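
To illustrate the first problem, here is a small sketch in Python (not
Emacs Lisp, and not what vc does internally) of what happens when the
ASCII bytes of a Git header line are decoded as UTF-16BE. The header
text is just an example of the kind of line Git emits on its own.

```python
# Bytes of a header line that Git itself writes, always plain ASCII.
header = b"diff --git"

# An ASCII-compatible coding system decodes these bytes unchanged.
as_utf8 = header.decode("utf-8")

# UTF-16BE pairs the bytes up, so ten ASCII bytes become five
# unrelated characters ("d" + "i" -> U+6469, and so on).
as_utf16 = header.decode("utf-16-be")

print(as_utf8)
print(len(as_utf16), [hex(ord(c)) for c in as_utf16])
```

This is why one coding system cannot be applied blindly to the whole
diff output when the file itself is UTF-16.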

   >> Which is weird, considering that vc-diff-internal and
   >> vc-coding-system-for-diff have both been virtually untouched for the
   >> last couple of years.

   > Not sure what you see as weird.

   >> But even if we figure out why this happens, you (Uwe) probably want
   >> Git, Hg, etc., to treat this file as text, and not binary. Only then
   >> will you be able to get meaningful diffs. I don't have any specific
   >> advice on that.

   > Why can't we invoke "git diff --text"?  That should fix the second
   > problem, I think.
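
For reference, a minimal sketch of such an invocation, written in Python
rather than as a change to vc. The helper names build_diff_command and
run_text_diff are hypothetical; --text is the git diff option that makes
Git treat all files as text.

```python
import subprocess


def build_diff_command(path):
    """Return the argument list for a text-mode diff of PATH."""
    # "--" separates options from the file name.
    return ["git", "diff", "--text", "--", path]


def run_text_diff(path, repo_dir="."):
    """Run the diff in REPO_DIR and return Git's raw, undecoded output."""
    result = subprocess.run(
        build_diff_command(path),
        cwd=repo_dir,
        capture_output=True,
        check=False,
    )
    return result.stdout
```

The output is returned as bytes on purpose: for a UTF-16 file it mixes
ASCII header lines with UTF-16 file data, so it still needs careful
decoding afterwards.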

I thought the problem was caused by the fact that I did not enter those
characters myself, but rather copied them from some tex.stackexchange
site, but I see that was not the reason.

What about Mercurial?[1]

   > As for the first problem, we should probably refrain from binding
   > coding-system-for-read to a CODING-SYSTEM for which

   >    (coding-system-get CODING-SYSTEM :ascii-compatible-p)

   > returns nil.  We should instead bind it to no-conversion and decode
   > the file data parts by hand, skipping the parts that Git itself
   > outputs (yes, this is messy).  Patches to that effect are welcome.
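
A rough sketch of that "decode by hand" idea, in Python rather than
Emacs Lisp and under loud assumptions: the file is UTF-16BE, the diff
was produced in text mode, and no character in the file contains the
byte 0x0A other than the newline itself. The function name decode_diff
is hypothetical, and this is not the patch being asked for.

```python
def decode_diff(raw, coding="utf-16-be"):
    """Decode raw diff bytes: Git's own lines as ASCII, file data as CODING."""
    decoded = []
    in_hunk = False
    for line in raw.split(b"\n"):
        if line.startswith(b"diff "):
            in_hunk = False          # a new file header begins
        elif line.startswith(b"@@"):
            in_hunk = True           # hunk header, still Git's own text
        if in_hunk and line[:1] in (b"+", b"-", b" "):
            marker, payload = line[:1], line[1:]
            # Splitting on 0x0A leaves the 0x00 high byte of the
            # UTF-16BE newline at the end of the payload; drop it.
            if len(payload) % 2 == 1 and payload.endswith(b"\x00"):
                payload = payload[:-1]
            decoded.append(marker.decode("ascii")
                           + payload.decode(coding, errors="replace"))
        else:
            # Header lines and other text written by Git itself.
            decoded.append(line.decode("ascii", errors="replace"))
    return "\n".join(decoded)
```

The messiness shows even here: a character whose encoding contains the
byte 0x0A would split a line in the wrong place, and a byte order mark
on the first line is passed through undisturbed.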

   > Bottom line: users who put UTF-16 encoded files into VCS are playing
   > with fire, and are best advised not to do that!

Right, I see. That was just 2 characters in a document which otherwise
contained Latin-1 or UTF-8. So Chinese and Japanese programmers are at a
disadvantage, no?

[1]   I don't care so much about RCS in that context.
