Re: bug in basename

From: Bruno Haible
Subject: Re: bug in basename
Date: Fri, 15 May 2009 09:11:28 +0200
User-agent: KMail/1.9.9

Ondrej Bilka wrote:
> For encodings like BIG5 if character contains / it could quit prematurely.

Eric Blake wrote:
> BIG5 is a lousy character encoding for the very
> reason that it confuses common ASCII bytes with encoded characters,
> depending on shift state.

BIG5 does not have a shift state. It is a stateless multibyte encoding,
composed of two character sets:

  first byte   second byte

  0x00..0x7F                                  (ASCII)
  0xA1..0xFE   0x40..0x7E,0xA1..0xFE          (BIG5)

The '/' byte (0x2F) is not within the range of allowed values for the
second byte. Therefore strchr(s,'/') and strrchr(s,'/') also work correctly
on BIG5 encoded strings.
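
As a minimal sketch of this argument, the byte ranges from the table above can
be written as two predicates, next to a plain strrchr-based lookup of the last
path component. The names big5_is_lead, big5_is_trail and last_component are
hypothetical and chosen for illustration only; this is not the gnulib code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if b can be the first byte of a two-byte BIG5 character,
   according to the table above (0xA1..0xFE). */
static bool big5_is_lead(unsigned char b)
{
    return b >= 0xA1 && b <= 0xFE;
}

/* True if b can be the second byte of a two-byte BIG5 character,
   according to the table above (0x40..0x7E or 0xA1..0xFE). */
static bool big5_is_trail(unsigned char b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0xA1 && b <= 0xFE);
}

/* Hypothetical helper: return a pointer to the part of path after the
   last '/', or path itself if it contains no '/'. It uses strrchr on the
   raw bytes, with no knowledge of the encoding. */
static const char *last_component(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash != NULL ? slash + 1 : path;
}
```

Since big5_is_lead('/') and big5_is_trail('/') are both false, a 0x2F byte in a
BIG5 string can only be an ASCII '/'. For example, with the path
"dir/\xA4\x40" (0xA4 0x40 being a valid two-byte sequence per the table),
last_component returns a pointer to the two bytes after the slash.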

> character encodings dependent on your locale (except on Mac, and look at the
> problems that caused)

The current problem with file names on Mac OS X is that the underlying
file system, HFS+, stores file names in decomposed Unicode. That is, when the
user creates a file whose name contains accents (in precomposed Unicode, as
usual), the file that gets created has a different name: its decomposed
Unicode form. This is quite annoying because
  - the file name that one can retrieve with "ls" is different from the
    specified file name,
  - it goes against the Character Model of the W3C [1], which recommends
    NFC (not NFD) normalization.
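
To illustrate the first point at the byte level, here is a small sketch,
assuming UTF-8 encoded file names. The name "café" and the identifiers
name_nfc, name_nfd and same_bytes are made up for this example; the code does
not call any file system API, it only compares the two spellings.

```c
#include <string.h>

/* "café" with a precomposed U+00E9 (NFC), encoded in UTF-8. */
static const char name_nfc[] = "caf\xC3\xA9";

/* "café" as "e" followed by U+0301 COMBINING ACUTE ACCENT (NFD),
   encoded in UTF-8. */
static const char name_nfd[] = "cafe\xCC\x81";

/* Hypothetical helper: byte-wise equality, as a program would test it
   when comparing a requested name with a name read back from a directory. */
static int same_bytes(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

The two strings display identically but have different lengths (5 and 6
bytes), so same_bytes(name_nfc, name_nfd) returns 0. A program that creates
a file under the first name and then looks for exactly those bytes in a
directory listing would not find them if the file system stored the second.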


[1] http://www.w3.org/TR/charmod-norm/
