[bug-libunistring] toCasefold?
From: Simon Josefsson
Subject: [bug-libunistring] toCasefold?
Date: Fri, 27 May 2011 20:21:16 +0200
User-agent: Gnus/5.110018 (No Gnus v0.18) Emacs/23.2 (gnu/linux)
Hi Bruno,
I'm looking for an implementation of the toCasefold(X) operation defined
in Unicode 6.0 section 3.13 page 114 [1] like this:
R4 toCasefold(X): Map each character C in X to Case_Folding(C).
• Case_Folding(C) uses the mappings with the status field value “C” or
“F” in the data file CaseFolding.txt in the Unicode Character
Database.
Reading the manual I found this function:
-- Function: uint32_t * u32_casefold (const uint32_t *S, size_t N,
const char *ISO639_LANGUAGE, uninorm_t NF, uint32_t
*RESULTBUF, size_t *LENGTHP)
Returns the case folded string.
but I'm not sure what to pass for ISO639_LANGUAGE. Looking at the
implementation, I'm also not sure it really corresponds to the toCasefold
algorithm: it seems quite complex, whereas Unicode toCasefold looks like
a simple per-character property lookup.
After reading the u32_casefold code, I found the seemingly appropriate
function uc_tocasefold:
/* Return the casefold mapping of a Unicode character. */
extern ucs4_t
uc_tocasefold (ucs4_t uc);
However it doesn't seem to produce the right output, since
uc_tocasefold(U+0130) returns U+0130. And it is not declared in
"unicase.h" so it looks like an internal function.
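To make the expected behaviour concrete, here is a minimal sketch of the
toCasefold(X) rule quoted above, written as a plain table lookup. It does
not use libunistring. The names fold_entry, fold_table and to_casefold are
hypothetical, and the table is a three-entry excerpt hand-copied from the
C and F lines of CaseFolding.txt; a real implementation would need the
whole file.

```c
#include <stddef.h>
#include <stdint.h>

/* One CaseFolding.txt line with status C or F.
   A full (F) folding can expand to as many as three code points. */
struct fold_entry {
    uint32_t code;        /* source code point */
    size_t   len;         /* number of code points in the mapping */
    uint32_t mapping[3];  /* folded code points */
};

/* Hypothetical excerpt for illustration only, not the complete data. */
static const struct fold_entry fold_table[] = {
    { 0x0041, 1, { 0x0061 } },          /* 0041; C; 0061 */
    { 0x00DF, 2, { 0x0073, 0x0073 } },  /* 00DF; F; 0073 0073 */
    { 0x0130, 2, { 0x0069, 0x0307 } },  /* 0130; F; 0069 0307 */
};

/* Map each character of s[0..n-1] to its Case_Folding value.
   Characters without a table entry are copied unchanged.
   out must have room for 3 * n code points.
   Returns the number of code points written to out. */
static size_t
to_casefold (const uint32_t *s, size_t n, uint32_t *out)
{
    size_t count = sizeof fold_table / sizeof fold_table[0];
    size_t o = 0;

    for (size_t i = 0; i < n; i++) {
        const struct fold_entry *hit = NULL;

        /* Linear search is enough for a sketch; real data wants a
           sorted table or a multi-level lookup. */
        for (size_t k = 0; k < count; k++) {
            if (fold_table[k].code == s[i]) {
                hit = &fold_table[k];
                break;
            }
        }

        if (hit != NULL) {
            for (size_t j = 0; j < hit->len; j++)
                out[o++] = hit->mapping[j];
        } else {
            out[o++] = s[i];
        }
    }
    return o;
}
```

With this table, the input U+0041 U+00DF U+0130 folds to U+0061, U+0073
U+0073, U+0069 U+0307. In CaseFolding.txt, U+0130 has only F and T entries,
so a lookup restricted to single-character (C and S) mappings would leave
it unchanged. That might explain the uc_tocasefold result above, though it
is a guess rather than something confirmed from the libunistring source.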
Anyway, is it possible to use libunistring to get the toCasefold
operation somehow?
Thanks,
/Simon
[1] http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf