Re: bad locale interaction makes find forget files

From: Andreas Metzler
Subject: Re: bad locale interaction makes find forget files
Date: Fri, 18 Jul 2003 13:56:49 +0200
User-agent: Mutt/1.3.28i

On Wed, May 21, 2003 at 01:26:57PM +0200, Michael Weber wrote:
> find behaves very strangely with non-ASCII named files, depending on
> what locale is set (test case below).

The simplest test case looks like this (\366 is o-umlaut in latin1):
*prompt* export LANG=de_DE.UTF-8
*prompt* touch  `printf '\366.foo'`
*prompt* find -name '*foo'

The reason is simple: interpreted as UTF-8, '\366.foo' is an invalid
byte sequence (in UTF-8, 1 byte != 1 character!) rather than
'something.foo', so fnmatch[1] returns an error[2]. I cannot see a way
to really fix this:

* find has to be locale-aware:
    UTF-8 terminal:  touch ö
    latin1 terminal: ls
  So you'd expect that in the latin1 terminal "find -name 'ä'"
  will succeed.

* The obvious answer does not work: you can't just run iconv and strip
  away nondisplayable characters, because there is no canonical
  representation.
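
To make the byte-level difference concrete: the same o-umlaut is one byte
(\366) in latin1 but two bytes (\303\266) in UTF-8, and find only ever
sees the bytes stored in the directory. A minimal sketch, assuming a
hypothetical helper latin1_to_utf8() (not part of findutils or libc):

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: re-encode n Latin-1 bytes
   as UTF-8. The caller must provide at least 2*n bytes in out.
   Returns the number of bytes written. */
static size_t latin1_to_utf8(const unsigned char *in, size_t n,
                             unsigned char *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] < 0x80) {
            out[j++] = in[i];                  /* ASCII is unchanged */
        } else {
            out[j++] = 0xC0 | (in[i] >> 6);    /* lead byte */
            out[j++] = 0x80 | (in[i] & 0x3F);  /* continuation byte */
        }
    }
    return j;
}
```

Fed the five bytes of "\366.foo", this yields the six bytes of
"\303\266.foo". It only illustrates the two encodings; it does not
address the objection above.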
             cu andreas
[1] pred_name() in find/pred.c
[2] example program:
#include <fnmatch.h>
#include <locale.h>
#include <stdlib.h>
int main(void)
{
        setlocale(LC_ALL, "");  /* use the locale from LANG/LC_* */
        exit(fnmatch ("*foo", "\366foo", FNM_PERIOD));
}
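
Why '\366.foo' is invalid as UTF-8 can also be checked without any
locale support. The following is a rough sketch of a well-formedness
check; is_valid_utf8() is a hypothetical helper written for this
illustration, not what fnmatch or find actually do internally:

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if the n bytes at s are well-formed
   UTF-8, else 0. */
static int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        unsigned long cp;

        if (c < 0x80) { i++; continue; }            /* plain ASCII */
        else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return 0;      /* stray continuation or invalid lead byte */

        if (n - i < len) return 0;                  /* truncated sequence */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0; /* not 10xxxxxx */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (len == 3 && cp < 0x800) return 0;       /* overlong encoding */
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogate range */
        i += len;
    }
    return 1;
}
```

The byte \366 (0xF6) can never appear in well-formed UTF-8, so the check
rejects "\366.foo" at the first byte, while "\303\266.foo" (the UTF-8
encoding of the same name) passes.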
