Re: Internal visibility

From: Mike Gran
Subject: Re: Internal visibility
Date: Thu, 12 Jun 2008 20:45:25 +0000 (UTC)


Ludovic Courtès <ludo <at>> writes:

> Yes, that's probably a good idea.  At any rate, we only have
> `scm_to_locale_string ()' currently so it's not too late to add a single
> function with an encoding parameter in lieu of the proposed
> `scm_to_{utf8,utf16,utf32,ucs4,...}_string ()'.
> But first of all, one needs to implement Unicode support.  

FWIW, I have a complete Unicode support library for Guile called GuICU.  It 
lives at  It works for me, but it hasn't been 
widely tested.

It is built on the large and cumbersome IBM ICU library.  ICU encodes strings 
internally as UTF-16, which I always thought was a poor idea, since UTF-16 
neither allows O(1) indexing of individual code points nor works well 
alongside UTF-8.

Based on my experience with ICU and putting this library together, and looking 
at what R6RS claims should be the future for Unicode, I really do think that 
UTF-32 is the way to go. 
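If strings were stored as UTF-32 internally, external UTF-8 data would be decoded once on the way in, after which every code point is addressable by index. A minimal sketch of that decoding step, assuming a hypothetical `utf8_to_utf32` helper (not an existing Guile or ICU function) that rejects malformed input rather than substituting U+FFFD:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode UTF-8 bytes into UTF-32 code points.
   Writes at most `cap` code points to `out` and returns the number
   written, or -1 on malformed input or insufficient space. */
static ptrdiff_t utf8_to_utf32(const unsigned char *in, size_t len,
                               uint32_t *out, size_t cap)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = in[i];
        uint32_t cp;
        size_t extra;                 /* continuation bytes to read */
        if (b < 0x80)                     { cp = b;        extra = 0; }
        else if (b >= 0xC2 && b <= 0xDF)  { cp = b & 0x1F; extra = 1; }
        else if (b >= 0xE0 && b <= 0xEF)  { cp = b & 0x0F; extra = 2; }
        else if (b >= 0xF0 && b <= 0xF4)  { cp = b & 0x07; extra = 3; }
        else
            return -1;                /* invalid lead byte */
        if (extra > len - i - 1)
            return -1;                /* truncated sequence */
        for (size_t j = 1; j <= extra; j++) {
            if ((in[i + j] & 0xC0) != 0x80)
                return -1;            /* not a continuation byte */
            cp = (cp << 6) | (in[i + j] & 0x3F);
        }
        /* Reject overlong forms, UTF-16 surrogates, and values above
           U+10FFFF. */
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000)
            || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return -1;
        if (n == cap)
            return -1;                /* output buffer full */
        out[n++] = cp;
        i += extra + 1;
    }
    return (ptrdiff_t) n;
}
```

The cost is paid once at the boundary, and memory use is four bytes per code point; in exchange, indexing no longer depends on the contents of the string.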

Alternatively, one could build a string library where strings are represented 
as either u8 or u32 vectors.  If a string function is asked to operate on a u32 
vector, it will assume a UTF-32 encoding.  If a string function is asked to 
operate on a u8 vector, it will either require a locale or, as a fallback, 
treat the string as a raw byte vector.
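A rough sketch of that dual representation, using a hypothetical tagged struct (`dual_string` and its functions are illustrative names only). It covers just the raw-byte fallback for u8 vectors; a locale-aware path would have to decode the bytes instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tagged string: either a u8 vector or a u32 vector. */
enum str_kind { STR_BYTES, STR_UTF32 };

struct dual_string {
    enum str_kind kind;
    size_t len;                  /* number of elements, not bytes */
    union {
        const uint8_t  *u8;
        const uint32_t *u32;
    } data;
};

/* Return element k (caller ensures k < len).  For a u32 vector the
   element is a code point (UTF-32 assumed); for a u8 vector this
   sketch takes the raw-byte fallback and returns the byte value. */
static uint32_t dual_string_ref(const struct dual_string *s, size_t k)
{
    return s->kind == STR_UTF32 ? s->data.u32[k] : s->data.u8[k];
}

/* Element-wise operations can be written once on top of the accessor. */
static size_t dual_string_count(const struct dual_string *s, uint32_t c)
{
    size_t n = 0;
    for (size_t i = 0; i < s->len; i++)
        if (dual_string_ref(s, i) == c)
            n++;
    return n;
}
```

Simple element-wise operations like this can share code, but anything that depends on the encoding (case mapping, collation, decoding a u8 vector under a locale) presumably needs a separate path per representation, which is where the doubled effort comes from.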

This would be twice the work to implement, though.

