Re: [Nmh-workers] Non-ASCII Characters in bodies and subjects
From: Ralph Corderoy
Subject: Re: [Nmh-workers] Non-ASCII Characters in bodies and subjects
Date: Wed, 18 Jun 2014 12:01:37 +0100
Hello Ken,
> > The Unix kernel stores filenames as a run of bytes, not including
> > `/' and NUL.
>
> That's not universally true anymore. Some newer filesystems are
> mandating that filenames are UTF-8 and enforcing normalization rules
> (MacOS X and Solaris are two notable examples).
Thanks, I didn't know. Haven't used Solaris in years, and never bought
Apple.
> The only way of resolving this is to use the normalization rules for
> Unicode and do filename searching that way;
Sure.
> MacOS X actually rewrites all of the filenames using Normalization
> Form D (all characters in decomposed form, which means the regular
> character followed by the combining accents) and I think that sucks,
> but they didn't ask me.
I think I agree with you.
> Solaris is better; the original bytes are preserved, but lookup is
> done using normalized names so you can't have two filenames with the
> same characters.
What about globbing, especially on Mac OS X? Given your two examples on
Linux with bash,
$ touch résumé résumé
$ ls r?sum?
résumé
$ ls r?sum? | recode ..dump
UCS2 Mne Description
0072 r latin small letter r
00E9 e' latin small letter e with acute
0073 s latin small letter s
0075 u latin small letter u
006D m latin small letter m
00E9 e' latin small letter e with acute
000A LF line feed (lf)
$
$ ls r??sum??
résumé
$
Do you think NFKC would be better, so ? often matches what appears as a
single rune and "fi" matches the ligature ﬁ (U+FB01)?
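The globbing behaviour above can be reproduced outside the shell. What follows is a minimal sketch in Python, not anything nmh or a filesystem actually does: `glob_match` is a hypothetical helper that optionally normalizes a name before matching it with the standard library's `fnmatch.fnmatchcase`, whose `?` matches exactly one code point. It assumes the two files in the transcript differ only in normalization form (one NFC, one NFD).

```python
import fnmatch
import unicodedata

# The same visible name in two normalization forms.
nfc = unicodedata.normalize("NFC", "r\u00e9sum\u00e9")  # precomposed e-acute
nfd = unicodedata.normalize("NFD", nfc)                 # e + combining acute


def glob_match(name, pattern, form=None):
    """Hypothetical helper: match name against a glob pattern,
    normalizing the name to the given Unicode form first (if any)."""
    if form is not None:
        name = unicodedata.normalize(form, name)
    return fnmatch.fnmatchcase(name, pattern)


print(len(nfc), len(nfd))                       # 6 code points vs 8

# Without normalization, ? counts code points, not visible characters.
print(glob_match(nfc, "r?sum?"))                # True
print(glob_match(nfd, "r?sum?"))                # False
print(glob_match(nfd, "r??sum??"))              # True

# Normalizing to a composed form first lets ? match one visible character.
print(glob_match(nfd, "r?sum?", form="NFC"))    # True

# NFKC additionally folds compatibility characters such as the fi ligature.
print(glob_match("\ufb01le", "file", form="NFC"))   # False
print(glob_match("\ufb01le", "file", form="NFKC"))  # True
```

Only NFKC (or NFKD) folds the ligature, because U+FB01 has a compatibility decomposition rather than a canonical one; NFC leaves it untouched.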
Cheers, Ralph.