nmh-workers

Re: [Nmh-workers] Non-ASCII Characters in bodies and subjects


From: Ken Hornstein
Subject: Re: [Nmh-workers] Non-ASCII Characters in bodies and subjects
Date: Wed, 18 Jun 2014 09:22:28 -0400

>> That's not universally true anymore.  Some newer filesystems are
>> mandating that filenames are UTF-8 and enforcing normalization rules
>> (MacOS X and Solaris are two notable examples).
>
>Thanks, I didn't know.  Haven't used Solaris in years, and never bought
>Apple.

Let me amend this a bit; as I understand it, you have to enable that
behavior on Solaris.  It's the default behavior on MacOS X.

>> Solaris is better; the original bytes are preserved, but lookup is
>> done using normalized names so you can't have two filenames with the
>> same characters.
>
>What about globbing, especially on Mac OS X?  Given your two examples on
>Linux with bash,
>[...]

So, clearly we need some userspace support.  AFAIK, the globbing isn't
Unicode-aware; it's just matching on whatever readdir() returns.  Should
a ? match on a byte?  A Unicode codepoint?  An abstract character?  I am
not sure, and I am not sure if anyone has decided on this from a standards
point of view.
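
To make the three candidate meanings of ? concrete, here is a rough Python
sketch.  It is an illustration only, not how any shell globs: the helper
name question_mark_units is made up, and counting non-combining code points
is only an approximation of "abstract characters".  Python's fnmatch happens
to match ? per code point on str and per byte on bytes, which shows both
behaviors side by side.

```python
import fnmatch
import unicodedata


def question_mark_units(name: str) -> dict:
    """Count the units a lone `?` would have to cover under each reading."""
    return {
        "bytes": len(name.encode("utf-8")),
        "codepoints": len(name),
        # Approximation (assumption): treat every non-combining code point
        # as the start of one abstract character.
        "abstract": sum(1 for ch in name if unicodedata.combining(ch) == 0),
    }


# The same visible character, e with acute accent, in two encodings.
composed = unicodedata.normalize("NFC", "e\u0301")    # U+00E9
decomposed = unicodedata.normalize("NFD", "\u00e9")   # U+0065 U+0301

print(question_mark_units(composed))
print(question_mark_units(decomposed))

# On str, fnmatch treats `?` as one code point.
print(fnmatch.fnmatchcase(composed, "?"))
print(fnmatch.fnmatchcase(decomposed, "?"))

# On bytes, fnmatch treats `?` as one byte.
print(fnmatch.fnmatchcase(decomposed.encode("utf-8"), b"???"))
```

The two names look identical on screen, yet no single definition of ? in
this sketch matches both unless the names are normalized first.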

>Do you think NFKC would be better, so ? often matches what appears as a
>single rune and fi matches ligature ﬁ?

Hm.  I believe some network filesystems use NFKC, but I am neutral on
what should be done.  Should fi match ﬁ?  I cannot decide; I see
arguments for both.
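
For what it's worth, the trade-off can be sketched in Python.  This assumes
a hypothetical glob that normalizes both the name and the pattern before
matching (the helper glob_normalized is invented for illustration; no
existing shell or filesystem is claimed to work this way).

```python
import fnmatch
import unicodedata

LIGATURE_FI = "\ufb01"  # U+FB01 LATIN SMALL LIGATURE FI


def glob_normalized(name: str, pattern: str, form: str) -> bool:
    """Match after normalizing both sides to the given Unicode form.

    Hypothetical helper: the ASCII glob metacharacters are unchanged by
    normalization, so only the literal text is affected.
    """
    name_n = unicodedata.normalize(form, name)
    pattern_n = unicodedata.normalize(form, pattern)
    return fnmatch.fnmatchcase(name_n, pattern_n)


ligature_name = LIGATURE_FI + "le"  # displays like "file", but 3 code points

# NFC keeps the ligature, so the plain pattern does not match.
print(glob_normalized(ligature_name, "file", "NFC"))

# NFKC expands the ligature to "f" + "i", so the plain pattern matches.
print(glob_normalized(ligature_name, "file", "NFKC"))

# Side effect: under NFKC the ligature is two code points, so a single `?`
# no longer matches it.
print(glob_normalized(LIGATURE_FI, "?", "NFC"))
print(glob_normalized(LIGATURE_FI, "?", "NFKC"))
```

So in this sketch NFKC makes the plain letters match the ligature, at the
cost of ? no longer matching what is displayed as a single glyph.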

--Ken


