guile-devel



From: Dirk Herrmann
Subject: about strings, symbols and chars.
Date: Tue, 28 Nov 2000 19:23:56 +0100 (MET)

Hello everybody.

A couple of weeks ago, guile was in a state where SCM_CHARS and similar
macros were used for a variety of types:  symbols, strings, vectors,
continuations etc.  This had the undesired effect that it would have been
difficult to switch to a different implementation of any of those data
types.  For example, switching to a copy-on-write string implementation
would have required checking every call to SCM_CHARS in guile, determining
whether it was applied to a string object, and only then modifying that code
according to the requirements of the new implementation.
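To make the copy-on-write argument concrete, here is a minimal C sketch of a hypothetical COW string (cow_string, cow_share, cow_write_chars etc. are illustrative names, not part of guile). Writes must go through an accessor that un-shares the buffer first; any code that fetched a raw char* through a type-generic macro like SCM_CHARS would bypass that step, which is why every such call site would have needed auditing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical copy-on-write buffer: several handles may share it. */
typedef struct {
    size_t refcount;   /* number of handles sharing this buffer */
    size_t length;     /* number of characters, excluding the NUL */
    char data[];       /* flexible array member holding the characters */
} cow_buffer;

typedef struct {
    cow_buffer *buf;
} cow_string;

static cow_string cow_from_cstr(const char *s) {
    size_t n = strlen(s);
    cow_buffer *b = malloc(sizeof *b + n + 1);
    if (!b) abort();
    b->refcount = 1;
    b->length = n;
    memcpy(b->data, s, n + 1);
    return (cow_string){ b };
}

/* Sharing is cheap: only the reference count changes. */
static cow_string cow_share(cow_string s) {
    s.buf->refcount++;
    return s;
}

/* Read access never needs to copy. */
static const char *cow_read_chars(cow_string s) {
    return s.buf->data;
}

/* Write access must un-share first. Code holding a raw char* obtained
   through a generic accessor would silently skip this step. */
static char *cow_write_chars(cow_string *s) {
    if (s->buf->refcount > 1) {
        cow_buffer *copy = malloc(sizeof *copy + s->buf->length + 1);
        if (!copy) abort();
        copy->refcount = 1;
        copy->length = s->buf->length;
        memcpy(copy->data, s->buf->data, s->buf->length + 1);
        s->buf->refcount--;
        s->buf = copy;
    }
    return s->buf->data;
}

static void cow_release(cow_string s) {
    if (--s.buf->refcount == 0)
        free(s.buf);
}
```

For example, after b = cow_share(a), writing through cow_write_chars(&b) gives b a private copy while a still sees the original characters.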

As of today, SCM_CHARS and friends are deprecated and replaced with
SCM_<type>_CHARS and corresponding macros.  For example, among the
replacements for SCM_CHARS are SCM_SYMBOL_CHARS, SCM_STRING_CHARS,
SCM_VECTOR_BASE, etc.  This means that strings and symbols are now cleanly
separated from all the other types.  For the vector types this is
unfortunately not quite true yet, since the macro SCM_VELTS is used by all
the different vector types.
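The difference between one generic accessor and per-type accessors can be sketched with a toy tagged object. The tags, struct layout and OBJ_* macro names below are hypothetical and only mimic the idea behind SCM_STRING_CHARS, SCM_SYMBOL_CHARS and SCM_VECTOR_BASE; guile's real object representation is different. The point is that each per-type macro can check its tag and can later be reimplemented without touching the others.

```c
#include <assert.h>
#include <stddef.h>

/* Toy tagged object; tags and layout are hypothetical, not guile's. */
typedef enum { TAG_STRING, TAG_SYMBOL, TAG_VECTOR } obj_tag;

typedef struct {
    obj_tag tag;
    size_t length;
    void *base;   /* characters for strings/symbols, elements for vectors */
} obj;

/* Old style: one accessor for every type. Nothing at the call site says
   which type the caller expected. */
#define OBJ_CHARS(o) ((char *) (o)->base)

/* New style: one accessor per type. assert() expands to a void
   expression, so it can sit in a comma expression before the result. */
#define OBJ_STRING_CHARS(o) (assert((o)->tag == TAG_STRING), (char *) (o)->base)
#define OBJ_SYMBOL_CHARS(o) (assert((o)->tag == TAG_SYMBOL), (const char *) (o)->base)
#define OBJ_VECTOR_BASE(o)  (assert((o)->tag == TAG_VECTOR), (o)->base)
```

Switching strings to a new representation would then only mean changing OBJ_STRING_CHARS and the code that uses it, instead of inspecting every OBJ_CHARS call.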

What I noticed, however, is that guile is not consistent at all with
regard to the handling of character signedness:  Converting a character
object from scheme to C with SCM_CHAR delivers an unsigned integer.  A lot
of code, however, assigns the result of SCM_CHAR to a char variable
instead of an unsigned char.  Formerly there existed two macros, SCM_CHARS
and SCM_UCHARS, which for a string delivered a char* or an unsigned char*
to the string's characters; for strings these are now named
SCM_STRING_CHARS and SCM_STRING_UCHARS, respectively.
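Why assigning an unsigned character code to a plain char is risky: in C it is implementation-defined whether plain char is signed, and where it is, codes above 127 become negative. The sketch below (function names are illustrative, not guile API) shows the two usual symptoms, a failed range comparison and a negative array index, and the fix of going through unsigned char.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Testing for a non-ASCII character. With "char c = code;" the test
   c >= 0x80 can never be true where char is signed; narrowing
   explicitly to unsigned char keeps the value in 0..UCHAR_MAX. */
static int is_high_bit_char(unsigned int code) {
    unsigned char c = (unsigned char) code;
    return c >= 0x80;
}

/* Histogram of byte values in a char buffer. Indexing with s[i]
   directly would produce negative indices for bytes above 127 on
   signed-char platforms; the cast avoids that. */
static void byte_histogram(const char *s, size_t len,
                           size_t counts[UCHAR_MAX + 1]) {
    for (size_t i = 0; i <= UCHAR_MAX; i++)
        counts[i] = 0;
    for (size_t i = 0; i < len; i++)
        counts[(unsigned char) s[i]]++;   /* not counts[s[i]] */
}
```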

I'd like to get rid of the SCM_STRING_UCHARS macro and clean up the
handling of characters and strings with respect to signedness.  In other
words, it should be clearly defined what kind of characters are to be
found in a scheme string object.

With respect to the support of more general character sets it seems to
make sense to make unsigned characters the default.  Signed characters
should then rather be viewed as uniform arrays of signed 8-bit values, as
described in SRFI 4 (homogeneous numeric vectors).  The not-so-nice thing
about this approach is that all gh_ functions just use plain chars, without
any indication of signedness, and the same is true for many scm_ functions
as well.
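A possible shape for "unsigned by default" is sketched below, assuming a hypothetical string type (ustring, ustring_from_c, ustring_ref and s8_ref are invented for illustration). Storage is defined as unsigned 8-bit values, char-based interfaces like the gh_ functions are served by copying bytes unchanged at the boundary, and a signed 8-bit view is derived explicitly rather than depending on the platform's char.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string object: characters are defined as unsigned
   8-bit values. */
typedef struct {
    size_t length;
    unsigned char *chars;
} ustring;

/* Boundary shim for char-based APIs: copy the bytes unchanged, so the
   same bit patterns end up in unsigned storage. */
static ustring ustring_from_c(const char *s) {
    ustring u;
    u.length = strlen(s);
    u.chars = malloc(u.length + 1);
    if (!u.chars) abort();
    memcpy(u.chars, s, u.length + 1);
    return u;
}

/* Character reference: always 0..255, regardless of whether plain
   char is signed on this platform. */
static int ustring_ref(const ustring *u, size_t i) {
    assert(i < u->length);
    return u->chars[i];
}

/* Signed 8-bit view of the same bytes, in the spirit of an s8 uniform
   vector. The arithmetic avoids an implementation-defined conversion. */
static int8_t s8_ref(const ustring *u, size_t i) {
    assert(i < u->length);
    int v = u->chars[i];
    return (int8_t) (v > INT8_MAX ? v - 256 : v);
}
```

With this split, the byte 0xFF reads as 255 through ustring_ref and as -1 through s8_ref, and only the boundary shim has to know that callers hand in plain char.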

Any suggestions?

Dirk Herrmann



