From: Eli Zaretskii
Subject: bug#23595: 25.1.50; file with chinese/japanse chars, vc-diff fails (HG, Git, RCS)
Date: Wed, 25 May 2016 19:22:14 +0300

> Cc: address@hidden, address@hidden, address@hidden
> From: Dmitry Gutov <address@hidden>
> Date: Wed, 25 May 2016 03:09:27 +0300
>     Not sure it's a good idea: the solution we found is only known to work
>     with Git, whereas vc-coding-system-for-diff is for any VCS.  Mercurial
>     seems to have a similar encode/decode filter feature, but I'm not sure
>     using it means the diff results will be in UTF-8.
> Do we actually know that we'll need this behavior to be VCS-specific?

I think we can make an educated guess, see below.  My conclusion is
that this does need to be VCS-specific.

> So far, we've seen some pretty similar results with vc-diff using Git, Hg and
> RCS.

Not quite.  They all fail, but in different ways.  More importantly,
the solutions are most probably going to be different.

>     I think we should have a git-specific function that implements the
>     above idea, and then we should use it in vc-coding-system-for-diff.
> Git-specific or backend-specific?

Backend-specific, most probably.  Except that currently we only have a
good idea about the Git backend, for which it is explicitly documented
that the output will be in UTF-8 when content filters are used.
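
To illustrate what "backend-specific" means here, a minimal sketch
(in Python rather than Emacs Lisp, for brevity).  The table and the
function name are hypothetical and are not part of VC; the only entry
is the one case where the output encoding is documented.

```python
# Hypothetical table: backend name -> encoding of the diff output when
# content filters are in use.  Only Git is listed because it is the only
# backend for which this is explicitly documented.
KNOWN_DIFF_ENCODINGS = {"Git": "utf-8"}


def coding_system_for_diff(backend, default):
    """Return the known diff encoding for BACKEND, else DEFAULT."""
    return KNOWN_DIFF_ENCODINGS.get(backend, default)
```

Other backends fall through to whatever the generic default is, until
we know more about them.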

Mercurial and Bazaar both support similar filters, but I cannot find
any documentation on what encoding will be used for the output.  For
Bazaar, there's a general statement somewhere that it defaults to the
locale's encoding (there's a config variable to change that).

SVN doesn't seem to support filters at all, so with it, the user will
have to manually set the svn:mime-type property of the UTF-16 files to
a text type, and install a replacement Diff command that can produce
diffs from UTF-16 files (I believe GNU Diff cannot currently do that).
Since no canonical way exists, I don't see how we can know for sure
the encoding of the Diff output; my best guess is that it will also be
in UTF-16.  (Similar problems exist in SVN with other operations on
such files.)

For RCS and CVS, I don't see any solution at all, since AFAIK these
don't support any such features or anything similar.  These will
always treat UTF-16 files as binary, so no meaningful diffs can be
produced for them.
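
A rough sketch of why such files look binary, and of what a
conversion filter buys us (Python, for illustration only).  The
NUL-byte test below is a simplified stand-in for the kind of heuristic
these tools use, not the exact check of any particular program, and
both function names are made up.

```python
import difflib


def looks_binary(data, probe=8000):
    """Simplified heuristic: a NUL byte near the start means 'binary'."""
    return b"\0" in data[:probe]


def text_diff(old, new, encoding="utf-16"):
    """Decode both revisions first, then diff them line by line."""
    old_lines = old.decode(encoding).splitlines()
    new_lines = new.decode(encoding).splitlines()
    return list(
        difflib.unified_diff(old_lines, new_lines, "old", "new", lineterm="")
    )


# Two revisions of a small UTF-16 file with Japanese and ASCII text.
old = "日本語\nfirst line\n".encode("utf-16")
new = "日本語\nsecond line\n".encode("utf-16")

# Newlines and ASCII characters carry a zero byte in UTF-16,
# so the raw bytes trip the heuristic; the UTF-8 form does not.
utf16_is_binary = looks_binary(old)
utf8_is_binary = looks_binary(old.decode("utf-16").encode("utf-8"))

# Once decoded, an ordinary line-based diff works.
result = text_diff(old, new)
```

Without some hook that performs the decoding step, the backend never
gets as far as the line-based comparison.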

> I suppose we could add some new encoding-handling logic at the beginning of 
> vc-git-diff instead.
>     (I prefer a separate function because my gut feeling is that we will
>     need something like that in other Git operations, when UTF-16 files
>     are involved.)
> We can always extract a new function when it's needed, though.

True, but I think if we want to support UTF-16 files, the need is
already here.  vc-diff and its derivatives are just the tip of the
iceberg; we will need similar stuff for every command that includes
both text from the versioned file(s) and some text output by the VCS
program itself.
