

Wide string strategies

From: Mike Gran
Subject: Wide string strategies
Date: Thu, 09 Apr 2009 08:00:12 -0700


I've been playing with the problem of wide strings in Guile.  I have
some observations.

First, I tried converting everything to UTF-8.  This strategy quickly
leads to complications.  Most importantly, scm_i_string_length() can
mean the string's length in characters, its size in memory in bytes,
or both.  I tried splitting that function into two functions,
scm_i_string_nchars() and scm_i_string_memsize(), but I didn't like how
that was going.  It was too easy to make a mistake.
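To make the pitfall concrete, here is a minimal C sketch, assuming a
NUL-terminated, well-formed UTF-8 buffer.  The names utf8_memsize,
utf8_nchars and utf8_char_offset are hypothetical and not part of
Guile; they only show that a byte count, a character count, and a
character-index-to-byte-offset conversion become three different
operations once a character can occupy more than one byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers, not Guile functions.  All three assume S is a
   NUL-terminated, well-formed UTF-8 string. */

/* Memory size: bytes used by the string, excluding the final NUL. */
size_t
utf8_memsize (const char *s)
{
  assert (s != NULL);
  return strlen (s);
}

/* Character count: count every byte that is not a continuation byte
   (continuation bytes have the bit pattern 10xxxxxx). */
size_t
utf8_nchars (const char *s)
{
  size_t n = 0;
  assert (s != NULL);
  for (; *s != '\0'; s++)
    if (((unsigned char) *s & 0xC0) != 0x80)
      n++;
  return n;
}

/* Byte offset of character K.  Unlike a fixed-width encoding, this
   needs a scan from the start.  If K >= the character count, the
   offset of the terminating NUL is returned. */
size_t
utf8_char_offset (const char *s, size_t k)
{
  size_t off = 0;
  assert (s != NULL);
  for (; s[off] != '\0'; off++)
    if (((unsigned char) s[off] & 0xC0) != 0x80)
      {
        if (k == 0)
          return off;
        k--;
      }
  return off;
}
```

For "naïve", written in C as "na\xc3\xafve", utf8_memsize gives 6 while
utf8_nchars gives 5, and the 'v' at character index 3 sits at byte
offset 4.  Any caller that passes one kind of number where the other is
expected silently reads the wrong byte.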

Second, I tried converting everything to UTF-32.  This strategy requires
too much effort: there are too many char* in the code to convert each
one.
For now, I think a good strategy is to make strings into a pseudo-class
whose internals are opaque to most of Guile, with strings accessed
through accessors and other methods.
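As a rough illustration of the pseudo-class idea, the sketch below hides
two possible storage layouts (one byte per character, or one 32-bit code
point per character) behind a length accessor and a character accessor.
Everything here (the wstring type and the wstring_* functions) is
hypothetical and only meant to show the shape of such an interface; it
is not Guile's string representation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not Guile's API.  In a real header only the
   incomplete type would be exported:
       typedef struct wstring wstring;
   so code outside the string module could not reach the buffer. */
typedef struct wstring
{
  int is_wide;              /* 0: one byte per char; 1: UTF-32 */
  size_t len;               /* length in characters, not bytes */
  union
  {
    unsigned char *narrow;  /* used when is_wide == 0 */
    uint32_t *wide;         /* used when is_wide == 1 */
  } buf;
} wstring;

/* Build a narrow string; each byte is taken as one Latin-1 char. */
wstring *
wstring_from_latin1 (const char *s)
{
  wstring *str = malloc (sizeof *str);
  if (str == NULL)
    return NULL;
  str->is_wide = 0;
  str->len = strlen (s);
  str->buf.narrow = malloc (str->len + 1);
  if (str->buf.narrow == NULL)
    {
      free (str);
      return NULL;
    }
  memcpy (str->buf.narrow, s, str->len + 1);
  return str;
}

/* Build a wide string from N code points. */
wstring *
wstring_from_ucs4 (const uint32_t *cps, size_t n)
{
  wstring *str = malloc (sizeof *str);
  if (str == NULL)
    return NULL;
  str->is_wide = 1;
  str->len = n;
  /* n + 1 so that an empty string never calls malloc (0). */
  str->buf.wide = malloc ((n + 1) * sizeof (uint32_t));
  if (str->buf.wide == NULL)
    {
      free (str);
      return NULL;
    }
  memcpy (str->buf.wide, cps, n * sizeof (uint32_t));
  return str;
}

/* Accessors: the only way other modules see the contents. */
size_t
wstring_length (const wstring *str)
{
  return str->len;
}

uint32_t
wstring_ref (const wstring *str, size_t i)
{
  assert (i < str->len);
  return str->is_wide ? str->buf.wide[i] : str->buf.narrow[i];
}

void
wstring_free (wstring *str)
{
  if (str == NULL)
    return;
  if (str->is_wide)
    free (str->buf.wide);
  else
    free (str->buf.narrow);
  free (str);
}
```

Because callers only use wstring_length and wstring_ref, the storage
could later be changed, or chosen per string, without touching them.
A char * handed out to the raw buffer would rule that out.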

This strategy was already begun with scm_to_locale_string, but the code
isn't fully committed to the idea.  The function scm_i_string_chars
exposes the internal representation of the string, and it is used
throughout the code.

The following patch demonstrates what it might look like if strings were
accessed through methods.  I've removed every instance of
scm_i_string_chars and associated functions from the non-string modules.

One possibly confusing function used in the patch is
scm_i_string_ref_to_char (str, x, sub).  It returns the Xth character of
STR as a C char, or the character SUB if the Xth character is not
ASCII.
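A minimal sketch of the behaviour described for scm_i_string_ref_to_char,
assuming a hypothetical string type that stores an array of 32-bit code
points.  The type ucs4_string and the function ucs4_string_ref_to_char
are illustrative names only; the function in the patch operates on
Guile strings rather than on this stand-in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for STR: code points plus a length. */
typedef struct
{
  const uint32_t *chars;
  size_t len;
} ucs4_string;

/* Mirrors the described semantics of scm_i_string_ref_to_char:
   return character X as a C char if it is ASCII (below 0x80),
   otherwise return the substitute SUB.  Sketch only. */
char
ucs4_string_ref_to_char (const ucs4_string *str, size_t x, char sub)
{
  uint32_t c;
  assert (x < str->len);
  c = str->chars[x];
  return c < 0x80 ? (char) c : sub;
}
```

With SUB set to '?', code that only looks for ASCII syntax characters
(say, '#' or a digit) can treat any non-ASCII character as a harmless
placeholder instead of having to handle wide characters itself.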


Mike Gran

Attachment: accessors.patch
Description: Text Data

