Re: [Nmh-workers] Non-ASCII Characters in bodies and subjects

From:	Ken Hornstein
Subject:	Re: [Nmh-workers] Non-ASCII Characters in bodies and subjects
Date:	Wed, 18 Jun 2014 09:22:28 -0400

>> That's not universally true anymore.  Some newer filesystems are
>> mandating that filenames are UTF-8 and enforcing normalization rules
>> (MacOS X and Solaris are two notable examples).
>
>Thanks, I didn't know.  Haven't used Solaris in years, and never bought
>Apple.

Let me amend this a bit; as I understand it, you have to enable that
behavior on Solaris.  It's the default behavior on MacOS X.

>> Solaris is better; the original bytes are preserved, but lookup is
>> done using normalized names so you can't have two filenames with the
>> same characters.
>
>What about globbing, especially on Mac OS X?  Given your two examples on
>Linux with bash,
>[...]

So, clearly we need some userspace support.  AFAIK, the globbing isn't
Unicode-aware; it's just matching on whatever readdir() returns.  Should
a ? match on a byte?  A Unicode codepoint?  An abstract character?  I am
not sure, and I don't know whether any standard has settled the question.
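
To make the three candidates concrete, here is a rough sketch in Python
(not anything nmh or a shell actually does).  It uses the standard
fnmatch and unicodedata modules; the name "café" and the helper sizes()
are made up for illustration.  The byte-wise case assumes a matcher with
no locale awareness at all.

```python
import fnmatch
import unicodedata

# "café" in two canonically equivalent spellings.
nfc = unicodedata.normalize("NFC", "caf\u00e9")  # precomposed é (U+00E9)
nfd = unicodedata.normalize("NFD", "caf\u00e9")  # e + combining acute (U+0301)

def sizes(name):
    """Return (UTF-8 byte count, code point count) for a name (hypothetical helper)."""
    return len(name.encode("utf-8")), len(name)

# Code point matching: on str, ? consumes exactly one code point.
cp_nfc = fnmatch.fnmatchcase(nfc, "caf?")
cp_nfd = fnmatch.fnmatchcase(nfd, "caf?")

# Byte matching: on bytes, ? consumes exactly one byte.
byte_one = fnmatch.fnmatchcase(nfc.encode("utf-8"), b"caf?")
byte_two = fnmatch.fnmatchcase(nfc.encode("utf-8"), b"caf??")

print(sizes(nfc), sizes(nfd))
print(cp_nfc, cp_nfd, byte_one, byte_two)
```

Both spellings display as four characters, yet "caf?" matches only the
NFC form by code point, and neither form by byte.  Matching on abstract
characters would need grapheme segmentation, which the Python standard
library does not provide.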

>Do you think NFKC would be better, so ? often matches what appears as a
>single rune and fi matches ligature ﬁ?

Hm.  I believe some network filesystems use NFKC, but I am neutral on
what should be done.  Should fi match ﬁ?  I cannot decide; I see
arguments for both.
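
For reference, a small sketch of what the two choices mean in practice,
using Python's standard unicodedata module.  The helper same_name() is
hypothetical; it only compares two names after normalizing both to the
given form, which is an assumption about how a lookup might work and not
a description of any particular filesystem.

```python
import unicodedata

def same_name(form, a, b):
    """True if a and b are equal after normalizing both to `form` (hypothetical helper)."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

ligature = "\ufb01le"  # starts with U+FB01 LATIN SMALL LIGATURE FI
plain = "file"

# NFC is a canonical form, so the ligature stays a distinct character.
nfc_equal = same_name("NFC", ligature, plain)

# NFKC also applies compatibility mappings, so the ligature becomes "fi".
nfkc_equal = same_name("NFKC", ligature, plain)

print(nfc_equal, nfkc_equal)
```

The cost of NFKC is that it folds more than ligatures; superscript two
(U+00B2), for example, becomes a plain "2".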

--Ken
