
[GNUnet-developers] encoding: normalization [Was: Re: Music insertion]

From: Christian Grothoff
Subject: [GNUnet-developers] encoding: normalization [Was: Re: Music insertion]
Date: Sun, 5 Dec 2004 15:18:17 -0500
User-agent: KMail/1.7.1

On Saturday 04 December 2004 17:20, Alexander Winston wrote:
> Unicode provides 4 normalization forms:
> * Normalization Form D (NFD)
> * Normalization Form C (NFC)
> * Normalization Form KD (NFKD)
> * Normalization Form KC (NFKC)
> Given the nature of GNUnet, I suggest normalizing all the proposed
> keywords using NFC and NFKC, removing the duplicate keywords, and then
> adding the remaining keywords.
> I still have little experience with normalization, however, so please
> take this advice with a grain of salt.
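
For illustration, the suggestion quoted above (normalize each proposed
keyword with NFC and NFKC, drop duplicates, keep the rest) could be sketched
as follows. This is only a sketch using Python's standard unicodedata module;
the function name normalized_keywords and the sample keywords are made up for
the example and are not part of GNUnet.

```python
import unicodedata

def normalized_keywords(keywords):
    """Return the NFC and NFKC forms of each keyword, without duplicates.

    Hypothetical helper: order of first appearance is preserved.
    """
    seen = set()
    result = []
    for keyword in keywords:
        for form in ("NFC", "NFKC"):
            normalized = unicodedata.normalize(form, keyword)
            if normalized not in seen:
                seen.add(normalized)
                result.append(normalized)
    return result

# "Cafe" + combining acute accent, precomposed "Café", and a word that
# starts with the "fi" ligature (U+FB01).
proposed = ["Cafe\u0301", "Caf\u00e9", "\ufb01le"]
print(normalized_keywords(proposed))
```

The two spellings of "Café" collapse into one keyword under NFC, while the
ligature survives NFC and is only folded to plain "file" by NFKC, which is
why using both forms can yield two keywords for one input.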

Right.  Even if we use UTF-8, we still have to think about normalization.  I
believe this issue fully applies to UTF-8 (after all, UTF-8 is just an
encoding of Unicode).  Actually, it might be worse: if I recall correctly,
there are different UTF-8 encodings for some Unicode characters, so we have
the normalization issue for Unicode *and* for UTF-8.  So if anyone has any
experience here, please speak up.  I was thinking of using libiconv to
convert to UTF-8.  Will this produce a canonical representation?  Which one?
If not, is there some free code available that will do the canonicalization?
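
To make the two layers concrete, here is a small sketch in Python (standard
library only; canonical_utf8 and is_wellformed_utf8 are hypothetical names
chosen for this example).  It rests on two assumptions worth checking against
the Unicode standard: encoding to UTF-8 by itself does not normalize, so
normalization has to happen as a separate step before encoding; and
well-formed UTF-8 has exactly one byte sequence per code point, with
"overlong" sequences rejected by strict decoders.

```python
import unicodedata

def canonical_utf8(text):
    """Normalize to NFC first, then encode; encoding alone does not normalize."""
    return unicodedata.normalize("NFC", text).encode("utf-8")

def is_wellformed_utf8(data):
    """Return True if a strict UTF-8 decoder accepts the byte string."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

composed = "\u00e9"      # precomposed e with acute accent
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# Same visible character, different bytes: the Unicode-level issue.
print(composed.encode("utf-8"))    # b'\xc3\xa9'
print(decomposed.encode("utf-8"))  # b'e\xcc\x81'

# After NFC normalization both spellings give the same bytes.
print(canonical_utf8(composed) == canonical_utf8(decomposed))  # True

# An overlong two-byte form of "/" is not accepted as UTF-8.
print(is_wellformed_utf8(b"\xc0\xaf"))  # False
```

Under these assumptions, a plain charset conversion would not be enough on
its own to get a canonical representation; a normalization pass would still
be needed before the bytes are compared or hashed.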

