From: Eric Blake
Subject: Re: find: locale affects results incorrectly
Date: Fri, 7 Aug 2020 08:45:57 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0
On 8/7/20 8:06 AM, sunnycemetery@gmail.com wrote:
> ... possibly. Please see for yourself:
>
> $ LC_ALL=C ls -l
> total 1
> -rw-r--r-- 1 userx userx 0 Aug 7 08:35 ''$'\325\253\302\265\366''+'$'\325\361\275\322\374\253\322\342\203\322\351''+'$'\322\351\245\322\342\304\264''+'$'\364''rd'$'\264''+'$'\342''07.srt'
> $ echo $LANG
> ja_JP.utf8
> $ find -name '*.srt'
> $ LC_ALL=C find -name '*.srt'
> ./?????+???????????+???????+?rd?+?07.srt
>
> I have attached logs of the following debug command for either locale, with ‘ and ’ replaced with ' for quick diff comparison. Debug output does not elucidate much, but perhaps someone can shed light on how such a seemingly simple search could possibly fail (or even be affected by locale in the first place).
>
> find -D all -name '*.srt'
'find' is not part of coreutils. That said, you are correct that globbing is locale-sensitive. Your file name contains byte sequences that are encoding errors in some locales but valid characters in others: the name is not well-formed UTF-8, so under ja_JP.utf8 some of its bytes do not form characters at all, while under LC_ALL=C every byte counts as a character. But POSIX says that the '*' glob only has to match characters, not encoding errors. So your choice of locale (and thus which byte sequences are valid characters) indeed affects the result of the glob, and therefore what find is able to output.
I would argue that this is not a bug, but you may get other opinions if you ask on bug-findutils.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org