Hello,
I have spotted a bizarre bug in GNU find.
In some circumstances, the result of a '-regex' search depends heavily
on the locale settings.
I have attached a zip file with an example file tree. There are two
directories in it, one with its name encoded in UTF-8 and the other in
ISO-8859-2.
Now we run find, trying to find files matching the regex '.*\.exe$':
$ LANG=pl_PL.iso-8859-2 find htdocs -type f -regex '.*\.exe$' -ls
12845718 12 -rw-rw-r-- 1 gacek gacek 2 Dec 18 15:00 htdocs/Zielona\ G\363ra/hidden_malware.exe
12845721 12 -rw-rw-r-- 1 gacek gacek 2 Dec 18 15:00 htdocs/Zielona\ G\303\263ra/malware.exe
Never mind the output encoding, it's expected. Luckily, we have found both
.exe files.
But now, let's try changing the locale to something more modern:
$ LANG=pl_PL.utf-8 find htdocs -type f -regex '.*\.exe$' -ls
12845721 12 -rw-rw-r-- 1 gacek gacek 2 gru 18 15:00 htdocs/Zielona\ G\303\263ra/malware.exe
We have found only one of these files. The one with the ISO-encoded
filename is hidden!
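
For anyone who wants to try this without the attachment, here is a minimal
sketch that rebuilds a similar tree in a temporary directory. It assumes a
Linux filesystem that accepts arbitrary bytes in file names; the temporary
path and the variable names are made up for illustration. My guess (not
verified against the find sources) is that the lone \363 byte is not a valid
UTF-8 sequence, so '.' in the regex cannot match it under a UTF-8 locale.
Two workarounds that sidestep the problem are shown: forcing the C locale,
where every byte counts as one character, and using -name, which only looks
at the base name of each file.

```shell
#!/bin/sh
# Hypothetical reproduction tree; names follow the listing above.
tmp=$(mktemp -d)

# \363 is the single ISO-8859-2 byte for "o with acute";
# \303\263 is the UTF-8 encoding of the same letter.
iso_dir=$(printf 'Zielona G\363ra')
utf_dir=$(printf 'Zielona G\303\263ra')

mkdir -p "$tmp/htdocs/$iso_dir" "$tmp/htdocs/$utf_dir"
: > "$tmp/htdocs/$iso_dir/hidden_malware.exe"
: > "$tmp/htdocs/$utf_dir/malware.exe"

# Workaround 1: in the C locale the regex is matched byte by byte.
c_count=$(cd "$tmp" && LC_ALL=C find htdocs -type f -regex '.*\.exe$' | wc -l | tr -d ' ')

# Workaround 2: -name tests only the base name, which is plain ASCII here.
name_count=$(cd "$tmp" && find htdocs -type f -name '*.exe' | wc -l | tr -d ' ')

echo "C locale, -regex: $c_count"
echo "-name:            $name_count"

rm -rf "$tmp"
```

To see the original behaviour, replace LC_ALL=C with LC_ALL=pl_PL.utf-8 (or
any other installed UTF-8 locale); if that locale is not installed, find
falls back to the C locale and the difference will not show up.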