Re: [bug-libunistring] toCasefold?
From: Bruno Haible
Subject: Re: [bug-libunistring] toCasefold?
Date: Fri, 27 May 2011 21:31:14 +0200
User-agent: KMail/1.9.9
Hi Simon,
> I'm looking for an implementation of the toCasefold(X) operation defined
> in Unicode 6.0 section 3.13 page 114 [1] like this:
>
> R4 toCasefold(X): Map each character C in X to Case_Folding(C).
>
> • Case_Folding(C) uses the mappings with the status field value “C” or
> “F” in the data file CaseFolding.txt in the Unicode Character
> Database.
This function maps a string X to a string.
> Reading the manual I found this function:
>
> -- Function: uint32_t * u32_casefold (const uint32_t *S, size_t N,
> const char *ISO639_LANGUAGE, uninorm_t NF, uint32_t
> *RESULTBUF, size_t *LENGTHP)
> Returns the case folded string.
>
> but I'm not sure what to use for ISO639_LANGUAGE
If you want locale-independent case folding, you can use the empty string
as ISO639_LANGUAGE.
> After reading the u32_casefold code, I found the seemingly appropriate
> function uc_tocasefold:
>
> /* Return the casefold mapping of a Unicode character. */
> extern ucs4_t
> uc_tocasefold (ucs4_t uc);
>
> However it doesn't seem to produce the right output, since
> uc_tocasefold(U+0130) returns U+0130.
No, this function is not appropriate, because it maps a single character
to a single character only. It cannot do the mapping
<U+0130> --> <U+0069><U+0307>
that you find in Unicode's CaseFolding.txt file.
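To illustrate why a character-to-character signature cannot express this
mapping, here is a minimal sketch of a string-to-string fold. It is not
libunistring code: the names fold_rule, demo_rules and demo_casefold are
hypothetical, and the three-entry table is only a hand-copied excerpt standing
in for the "C" and "F" lines of CaseFolding.txt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One folding rule: a source code point and its replacement sequence.
   Full ("F") mappings in CaseFolding.txt expand to at most 3 code points. */
struct fold_rule {
    uint32_t from;
    uint32_t to[3];
    size_t to_len;
};

/* Hypothetical excerpt for illustration only, NOT the complete table. */
static const struct fold_rule demo_rules[] = {
    { 0x0041, { 0x0061 }, 1 },          /* status C: U+0041 -> U+0061 */
    { 0x00DF, { 0x0073, 0x0073 }, 2 },  /* status F: U+00DF -> U+0073 U+0073 */
    { 0x0130, { 0x0069, 0x0307 }, 2 },  /* status F: U+0130 -> U+0069 U+0307 */
};

/* Fold the n code points of s into out and return the folded length.
   The caller must provide room for 3 * n code points in out. */
size_t demo_casefold(const uint32_t *s, size_t n, uint32_t *out)
{
    assert(s != NULL && out != NULL);
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        const struct fold_rule *rule = NULL;
        for (size_t r = 0; r < sizeof demo_rules / sizeof demo_rules[0]; r++) {
            if (demo_rules[r].from == s[i]) {
                rule = &demo_rules[r];
                break;
            }
        }
        if (rule == NULL) {
            /* No rule in the excerpt: the code point maps to itself. */
            out[len++] = s[i];
        } else {
            /* One input code point may yield several output code points. */
            for (size_t k = 0; k < rule->to_len; k++)
                out[len++] = rule->to[k];
        }
    }
    return len;
}
```

The output can be longer than the input, which is why the result has to be
a string and not a single ucs4_t.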
> looking at the
> implementation I'm not sure it really corresponds to the toCasefold
> algorithm since it seems quite complex whereas Unicode toCasefold seems
> just like a property lookup function.
The u32_casefold function also handles locale-dependent casing (file
SpecialCasing.txt), which toCasefold does not do. That explains the complexity.
Bruno