From: Eli Zaretskii
Subject: bug#23595: 25.1.50; file with chinese/japanse chars, vc-diff fails (HG, Git, RCS)
Date: Mon, 23 May 2016 19:48:50 +0300

> From: Dmitry Gutov <address@hidden>
> Date: Mon, 23 May 2016 14:52:03 +0300
> > The resulting diff contains either rubbish or fails to run.
> > Files attached.

I don't see any rubbish in the Git output.  With RCS, the command
signals an error, so more digging is needed to find out what's wrong
(although it could be that rcsdiff exits with non-zero status when it
sees what looks like binary files).

> It seems, to an extent, to be caused by our setting coding-system-for-read 
> inside vc-diff-internal (to utf-16be-with-signature-unix, which is also the 
> value of buffer-file-coding-system).
> Without that, the result of vc-diff (at least with Git) is "Binary files 
> a/test-chin-jap.tex and b/test-chin-jap.tex differ". Emacs 24.5 does the same.

Setting coding-system-for-read is correct, because the important use
case is when the diffs are actually output.  The problem is that
UTF-16 is not ASCII-compatible, and so text output by Git itself will
be mishandled.  Another problem is that Git doesn't show the diffs at
all, because it treats the file as binary.
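
To make the ASCII-compatibility point concrete, here is a small illustration in Python rather than Emacs Lisp. The helper `is_ascii_compatible` is hypothetical, a rough stand-in for the `:ascii-compatible-p` coding-system property: UTF-8 maps every ASCII character to the same single byte, UTF-16BE does not, so an ASCII header line printed by Git comes out garbled when decoded as UTF-16BE.

```python
# Illustration only (Python, not Emacs Lisp): UTF-16 is not ASCII-compatible,
# so ASCII text that Git itself prints cannot survive a UTF-16 decode.


def is_ascii_compatible(encoding: str) -> bool:
    """Hypothetical analogue of :ascii-compatible-p: every ASCII
    character must encode to exactly its own single byte."""
    sample = "".join(chr(c) for c in range(128))
    return sample.encode(encoding) == sample.encode("ascii")


# A line of Git's own output, as Git writes it: plain ASCII bytes.
git_header = b"diff --git a/test-chin-jap.tex b/test-chin-jap.tex\n"

# Decoding it as UTF-16BE pairs adjacent ASCII bytes into unrelated
# 16-bit code units, so the header no longer reads as text.
garbled = git_header.decode("utf-16-be", errors="replace")

print(is_ascii_compatible("utf-8"))           # True
print(is_ascii_compatible("utf-16-be"))       # False
print(garbled == git_header.decode("ascii"))  # False
```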

> Which is weird, considering both vc-diff-internal and 
> vc-coding-system-for-diff have been virtually untouched for the last 
> couple of years.

Not sure what you see as weird.

> But even if we figure out why it happens, you (Uwe) probably want Git, Hg, etc, 
> to treat this file as text, and not binary. Only then you'll be able to get 
> meaningful diffs. I don't have specific advice on that.

Why can't we invoke "git diff --text"?  That should fix the second
problem, I think.
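
A rough Python sketch of why UTF-16 files end up as "Binary files ... differ", assuming Git's default heuristic of calling a file binary when a NUL byte appears in roughly its first 8000 bytes (the name `looks_binary` is made up for illustration). In UTF-16 every ASCII character gains a 0x00 byte, so the check fires; the `--text` option (short form `-a`) tells git diff to skip it and treat all files as text.

```python
# Hypothetical approximation of Git's default binary check: a blob counts
# as binary if a NUL byte appears in its first FIRST_FEW_BYTES bytes.
FIRST_FEW_BYTES = 8000  # assumed limit, mirroring Git's heuristic


def looks_binary(data: bytes) -> bool:
    """Return True if a NUL byte occurs in the first FIRST_FEW_BYTES bytes."""
    return b"\x00" in data[:FIRST_FEW_BYTES]


sample = "\\documentclass{article}\n東京\n"

print(looks_binary(sample.encode("utf-8")))      # False: no NUL bytes
print(looks_binary(sample.encode("utf-16-be")))  # True: ASCII chars gain 0x00
```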

As for the first problem, we should probably refrain from binding
coding-system-for-read to a CODING-SYSTEM for which

   (coding-system-get CODING-SYSTEM :ascii-compatible-p)

returns nil.  We should instead bind it to no-conversion and decode
the file data parts by hand, skipping the parts that Git itself
outputs (yes, this is messy).  Patches to that effect are welcome.
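
The "read as no-conversion, then decode by hand" idea could look roughly like this hypothetical Python sketch; `split_keep_newlines` and `decode_diff` are invented names, and an Emacs version would presumably work on buffer text with `decode-coding-region` instead. It decodes the lines Git writes itself as ASCII/UTF-8 and decodes only the file bytes after each `+`, `-` or space prefix as UTF-16BE. It assumes Git split lines only at real newlines, which already fails for a character such as U+4E0A, whose UTF-16BE encoding contains a 0x0A byte, and for little-endian UTF-16.

```python
from typing import List


def split_keep_newlines(raw: bytes) -> List[bytes]:
    """Split on 0x0A bytes only, keeping each terminator.

    bytes.splitlines() is avoided on purpose: it also splits on 0x0D,
    which can occur inside UTF-16 code units (e.g. U+4E0D).
    """
    lines = []
    start = 0
    while start < len(raw):
        end = raw.find(b"\n", start)
        if end == -1:
            lines.append(raw[start:])
            break
        lines.append(raw[start:end + 1])
        start = end + 1
    return lines


def decode_diff(raw: bytes, file_encoding: str = "utf-16-be") -> str:
    """Decode raw (no-conversion) `git diff --text` output by hand."""
    out = []
    in_hunk = False
    for line in split_keep_newlines(raw):
        if line.startswith(b"diff --git "):
            in_hunk = False  # new file section: Git's headers follow
            out.append(line.decode("utf-8", errors="replace"))
        elif line.startswith(b"@@"):
            in_hunk = True   # hunk header, written by Git in ASCII
            out.append(line.decode("utf-8", errors="replace"))
        elif in_hunk and line[:1] in (b" ", b"-", b"+"):
            # One ASCII prefix byte from Git, then bytes from the file itself.
            body = line[1:].decode(file_encoding, errors="replace")
            if body.startswith("\ufeff"):
                body = body[1:]  # drop a byte-order mark, if present
            out.append(line[:1].decode("ascii") + body)
        else:
            # "index ...", "--- a/...", "\ No newline ..." etc. come from Git.
            out.append(line.decode("utf-8", errors="replace"))
    return "".join(out)


# Usage: a hand-built diff of a UTF-16BE file, as `git diff --text` might emit it.
raw = (b"diff --git a/t.tex b/t.tex\n"
       b"--- a/t.tex\n"
       b"+++ b/t.tex\n"
       b"@@ -1,2 +1,2 @@\n"
       + b" " + "東京\n".encode("utf-16-be")
       + b"-" + "海\n".encode("utf-16-be")
       + b"+" + "日本\n".encode("utf-16-be"))
print(decode_diff(raw))
```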

Bottom line: users who put UTF-16 encoded files into VCS are playing
with fire, and are best advised not to do that!
