[bug-diffutils] Bug#704182: diffutils: Diff -r will confusion between as


From: Santiago Vila
Subject: [bug-diffutils] Bug#704182: diffutils: Diff -r will confusion between asian characters in filenames, when locale are non asian - UTF-8. (fwd)
Date: Wed, 3 Apr 2013 13:27:36 +0200 (CEST)
User-agent: Alpine 2.02 (DEB 1266 2009-07-14)

Hello.

Received this report from the Debian bug system. I initially believed
this to be a duplicate of Debian Bug#633978, but it's not.

Here is a way to reproduce it, provided by the submitter after the
initial report:

--------------------------------------------------------
Here are a few commands you can use to reproduce the bug:

mkdir d1 d2
echo azerty > "d1/エンドカード1"
echo qsdfgh > "d2/ブックレット1"

If the bug is present, diff will return
> LANG=some_non_asian_LOCALE.utf8 diff -r d1 d2
1c1
< azerty
---
> qsdfgh

If the bug is not present, you will get something like:
> LANG=C diff -r d1 d2
Only in d1: エンドカード1
Only in d2: ブックレット1
--------------------------------------------------------

I can also reproduce it with diffutils 3.3; this is the output in that case:

diff 
"d1/\343\202\250\343\203\263\343\203\211\343\202\253\343\203\274\343\203\2111" 
"d2/\343\203\226\343\203\203\343\202\257\343\203\254\343\203\203\343\203\2101"
1c1
< azerty
---
> qsdfgh

The initial report follows:

---------- Forwarded message ----------
From: Philippe Errembault
To: Debian Bug Tracking System <address@hidden>
Date: Fri, 29 Mar 2013 03:10:46 +0100
Subject: Bug#704182: diffutils: Diff -r will confusion between asian characters
    in filenames, when locale are non asian - UTF-8.

Package: diffutils
Version: 1:3.0-1
Severity: normal


I don't know whether this bug is caused by diff or by strcoll.
When filenames are compared with strcoll under a non-Asian UTF-8 locale,
Chinese characters are considered identical, which leads to confusion
between files that are different.

E.g., suppose you diff -r two directories whose files are listed in
different orders, either because they were on different file systems
written by different operating systems (for example, I wanted to diff a
copy on a server of a directory from an NTFS disk), or simply because
the file lists are not the same and the sort comes out differently.
Then diff may treat two different files as being the same file, and
report differences because it compares different files. For example,
in my situation, it believed that "エンドカード1.jpg" and
"ブックレット1.jpg" were files with the same name and reported
differences between them.
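
The pairing problem described above can be illustrated with a minimal
sketch. This is not the diffutils source; it only assumes that a
recursive diff sorts both directory listings and then merges them with
a comparison function, treating a result of 0 as "same name". All
function names here are hypothetical, and collate_all_equal is a
stand-in that simulates a collation function reporting a tie for these
names. The byte-wise tie-break shown is one possible remedy, not
necessarily the one diffutils would adopt.

```python
from functools import cmp_to_key


def pair_entries(names1, names2, compare):
    """Merge two name lists, pairing names that compare as equal."""
    left = sorted(names1, key=cmp_to_key(compare))
    right = sorted(names2, key=cmp_to_key(compare))
    i = j = 0
    pairs = []
    while i < len(left) or j < len(right):
        if j >= len(right):
            order = -1          # only the left list has entries remaining
        elif i >= len(right) and i >= len(left):
            order = 1           # only the right list has entries remaining
        elif i >= len(left):
            order = 1
        else:
            order = compare(left[i], right[j])
        if order < 0:
            pairs.append((left[i], None))      # "Only in" the first dir
            i += 1
        elif order > 0:
            pairs.append((None, right[j]))     # "Only in" the second dir
            j += 1
        else:
            pairs.append((left[i], right[j]))  # treated as the same file
            i += 1
            j += 1
    return pairs


def collate_all_equal(a, b):
    """Stand-in (assumption): a collation that sees every name as a tie."""
    return 0


def bytewise(a, b):
    """Compare the UTF-8 encodings byte by byte, like strcmp()."""
    x, y = a.encode("utf-8"), b.encode("utf-8")
    return (x > y) - (x < y)


def collate_with_tiebreak(a, b):
    """Use the collation result, falling back to bytes on a tie."""
    return collate_all_equal(a, b) or bytewise(a, b)


d1 = ["エンドカード1"]
d2 = ["ブックレット1"]

# The two different names are paired, so their contents get compared.
print(pair_entries(d1, d2, collate_all_equal))
# Each name is reported as present on one side only.
print(pair_entries(d1, d2, collate_with_tiebreak))
```

With the tie-break, the two names are kept apart, which corresponds to
the "Only in d1" / "Only in d2" output seen with LANG=C.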

The point is that I don't know whether it is normal that
strcoll("エンドカード1.jpg", "ブックレット1.jpg"); returns 0 when the
locale is anything_non_asian.utf-8
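
One way to check the collation result independently of diff is a small
script like the sketch below. It uses Python's locale.strcoll, which
compares strings according to the current LC_COLLATE setting of the C
library. The helper name check_collation and the locale names in the
loop are assumptions chosen for illustration; the report does not name
a specific locale, and which locales are installed (and what they
return) depends on the system.

```python
import locale


def check_collation(a, b, locale_name):
    """Return strcoll(a, b) under locale_name, or None if it is unavailable."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None  # locale not installed on this system
    try:
        return locale.strcoll(a, b)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the old setting


name1 = "エンドカード1.jpg"
name2 = "ブックレット1.jpg"

# Example locale names (assumptions); substitute ones present on your system.
for loc in ("C", "en_US.UTF-8", "fr_FR.UTF-8"):
    print(loc, check_collation(name1, name2, loc))
```

A result of 0 for a UTF-8 locale, alongside a nonzero result for "C",
would match the behaviour described in this report.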

[...]


