bug-libunistring

[bug-libunistring] toCasefold?


From: Simon Josefsson
Subject: [bug-libunistring] toCasefold?
Date: Fri, 27 May 2011 20:21:16 +0200
User-agent: Gnus/5.110018 (No Gnus v0.18) Emacs/23.2 (gnu/linux)

Hi Bruno,

I'm looking for an implementation of the toCasefold(X) operation defined
in Unicode 6.0 section 3.13 page 114 [1] like this:

  R4 toCasefold(X): Map each character C in X to Case_Folding(C).

  • Case_Folding(C) uses the mappings with the status field value “C” or
    “F” in the data file CaseFolding.txt in the Unicode Character
    Database.

Reading the manual I found this function:

 -- Function: uint32_t * u32_casefold (const uint32_t *S, size_t N,
          const char *ISO639_LANGUAGE, uninorm_t NF, uint32_t
          *RESULTBUF, size_t *LENGTHP)
     Returns the case folded string.

but I'm not sure what to pass as ISO639_LANGUAGE. Looking at the
implementation, I'm also not sure it really corresponds to the toCasefold
algorithm: it seems quite complex, whereas Unicode's toCasefold looks
like a simple per-character property lookup.

After reading the u32_casefold code, I found the seemingly appropriate
function uc_tocasefold:

/* Return the casefold mapping of a Unicode character.  */
extern ucs4_t
       uc_tocasefold (ucs4_t uc);

However, it doesn't seem to produce the right output:
uc_tocasefold(U+0130) returns U+0130, whereas the status "F" mapping for
U+0130 in CaseFolding.txt is U+0069 U+0307.  Also, it is not declared in
"unicase.h", so it looks like an internal function.

Anyway, is it possible to use libunistring to get the toCasefold
operation somehow?

Thanks,
/Simon

[1] http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf


