GH replacement proposal (includes a bit of Unicode)
Wed, 07 Apr 2004 15:00:06 +0200
Gnus/5.1003 (Gnus v5.10.3) Emacs/21.3 (gnu/linux)
I have a partial proposal for making type conversions between Scheme
and C easier.
The first question is, do we need something different? Is the stuff
below better enough to be worth the trouble of making everyone switch?
I think it does improve upon the existing situation by being
- thread safe (allowing true concurrency)
- more consistent
- allowing sophisticated internal data representation (for example
for copy-on-write substrings, Unicode, etc.)
- macro-free so that we can maintain binary compatibility more easily
Then there is the error handling: the functions below do not take
"subr" or "pos" arguments to indicate where the error has happened. I
don't think they are really needed, and in any case they only provide
part of the backtrace.
One important part of the Guile API is concerned with the conversion
between Scheme values and C values. The functions that perform these
conversions follow a common pattern.
* Type predicates
Type predicates for C code are named like this
int scm_is_<type> (SCM val);
They return 0 or 1.
There are also the usual predicates that return a Scheme boolean, such
* Conversion from C to Scheme
For a C type <type>, the function that converts it into a Scheme value
SCM scm_from_<type> (<type> val, ...);
This function could be pronounced as "make Scheme from <type>" to
remember that the conversion is from <type> to a Scheme object.
No error will usually be signalled, except when not enough memory is
Sometimes a function named
SCM scm_take_<type> (<type> val, ...);
is provided. ("let Scheme take <type>".) This function works like
scm_from_<type> but the memory associated with VAL will be taken over
* Conversion from Scheme to C
<type> scm_to_<type> (SCM val, ...);
("convert Scheme to <type>".) When VAL is not representable as a
<type> or additional constraints are not satisfied, an error is
* Concrete functions
- int scm_is_bool (SCM val);
- SCM scm_from_bool (int val);
Return SCM_BOOL_T when val is non-zero, else return SCM_BOOL_F.
- int scm_to_bool (SCM);
- int scm_is_true (SCM);
Return 0 when SCM is SCM_BOOL_F, else return 1.
- int scm_is_integer (SCM val);
Determine whether VAL is an integer, exact or inexact. Note that
the number 3.0 is an inexact integer although it is stored as a
- SCM scm_from_signed_integer (scm_t_intmax val);
- SCM scm_from_unsigned_integer (scm_t_uintmax val);
Return the SCM value representing the integer <val>. The SCM
value will always be exact.
- scm_t_intmax scm_to_signed_integer (SCM val,
scm_t_intmax min, scm_t_intmax max);
- scm_t_uintmax scm_to_unsigned_integer (SCM val, scm_t_uintmax max);
Convert the SCM value VAL to a C integer when it is representable
and when it is between min and max inclusive, or between 0 and max
inclusive. Signal an error when it isn't. The SCM value can be
exact or inexact, but it must be an integer. That is,
scm_to_signed_integer (scm_from_double (3.0), -100, +100)
yields the C integer 3 while
scm_to_signed_integer (scm_from_double (3.5), -100, +100)
is an error.
- SCM scm_from_char (signed char);
- SCM scm_from_short (short);
- SCM scm_from_int (int val);
- SCM scm_from_long (long val);
- SCM scm_from_longlong (long long val);
- SCM scm_from_ssize (ssize_t val);
- SCM scm_from_uchar (unsigned char);
- SCM scm_from_ushort (unsigned short);
- SCM scm_from_uint (unsigned int val);
- SCM scm_from_ulong (unsigned long val);
- SCM scm_from_ulonglong (unsigned long long val);
- SCM scm_from_size (size_t val);
- signed char scm_to_char (SCM);
- short scm_to_short (SCM);
- int scm_to_int (SCM);
- long scm_to_long (SCM);
- long long scm_to_longlong (SCM);
- ssize_t scm_to_ssize (SCM);
- unsigned char scm_to_uchar (SCM);
- unsigned short scm_to_ushort (SCM);
- unsigned int scm_to_uint (SCM);
- unsigned long scm_to_ulong (SCM);
- unsigned long long scm_to_ulonglong (SCM);
- size_t scm_to_size (SCM);
Convert from/to the indicated integral types, signalling errors
when the SCM value can not be represented. For integer types that
are not provided for, you can use the general functions from
above. For example, scm_from_short (x) is the same as
and scm_to_short (x) is the same as
((short)(scm_to_signed_integer (x, SHRT_MIN, SHRT_MAX)))
Thus, these functions are merely a convenience.
Note that scm_to_char can not convert a Scheme character to a C
char integer. See below.
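
The checks described for scm_to_signed_integer above (the value must be
an integer, and it must lie within the requested range) can be sketched
in plain C. This is only an illustration and not Guile code: a double
stands in for the Scheme number, a return value of 0 stands in for
signalling an error, and the names sketch_to_signed_integer and
sketch_to_short are made up for this example.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for scm_to_signed_integer.  VAL plays the role
   of the Scheme number.  Returns 1 and stores the result in *OUT on
   success, 0 where Guile would signal an error.  */
int
sketch_to_signed_integer (double val, intmax_t min, intmax_t max,
                          intmax_t *out)
{
  /* 2^(N-1) for an N-bit intmax_t; exactly representable as a double.  */
  const double limit = -(double) INTMAX_MIN;
  intmax_t i;

  if (val != val)                     /* NaN is not an integer */
    return 0;
  if (val < -limit || val >= limit)   /* would overflow the cast below */
    return 0;
  i = (intmax_t) val;                 /* truncates toward zero */
  if ((double) i != val)              /* had a fractional part, e.g. 3.5 */
    return 0;
  if (i < min || i > max)             /* outside the requested range */
    return 0;
  *out = i;
  return 1;
}

/* A typed variant is then a thin wrapper around the general function.  */
int
sketch_to_short (double val, short *out)
{
  intmax_t i;

  if (!sketch_to_signed_integer (val, SHRT_MIN, SHRT_MAX, &i))
    return 0;
  *out = (short) i;
  return 1;
}
```

With this sketch, 3.0 converts to 3 for the range -100 to +100, while
3.5 and 200.0 are both rejected, the first for not being an integer and
the second for being out of range.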
** Floating point numbers
We don't go to such a great length to cover all possible types
here. "double" ought to be enough, no?
- int scm_is_real (SCM val);
Determine whether VAL is a real number, inexact or exact. Note that
a number such as 1/3 or 0 is real, although it is not stored as a
- SCM scm_from_double (double val);
Return the SCM value corresponding to VAL. The SCM value will be
'inexact' as far as scm_inexact_p is concerned but will be
exactly equal to VAL. When you want to have an exact SCM value,
scm_inexact_to_exact (scm_from_double (val))
this will yield an exact fraction.
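
Why the result is an exact fraction rather than a rounded one: every
finite double is an integer divided by a power of two. The plain C
sketch below makes that visible for small values. It is not how Guile
does the conversion; the name sketch_exact_fraction and the 0/1 return
convention are assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical helper: writes VAL as the exact fraction *NUM / *DEN.
   Returns 1 on success, 0 if VAL is not finite or the fraction does
   not fit into 64-bit integers.  */
int
sketch_exact_fraction (double val, int64_t *num, int64_t *den)
{
  const double limit = 9007199254740992.0;   /* 2^53 */
  int64_t d = 1;

  /* Rejects NaN, infinities and values too large for this sketch.  */
  if (val != val || val >= limit || val <= -limit)
    return 0;

  /* Doubling is exact in binary floating point, so nothing is lost.  */
  while (val != (double) (int64_t) val)
    {
      if (d > INT64_MAX / 2)
        return 0;
      val *= 2.0;
      d *= 2;
    }
  *num = (int64_t) val;
  *den = d;
  return 1;
}
```

For example, 3.5 comes out as 7/2 and 0.25 as 1/4. A value such as 0.1
also yields an exact fraction, just one with a large power of two as
its denominator, because 0.1 itself is not exactly representable as a
double.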
- double scm_to_double (SCM);
Convert VAL to the closest number representable as a double.
Numbers that are too large or too small are converted into +Inf or
** Complex numbers
- int scm_is_complex (SCM val);
Determine whether VAL is a complex number, inexact or exact. Note
that a number such as 1/3 is complex, although it is not stored as
Complex numbers can be regarded as a compound type and need no
dedicated conversion functions. For example, you can do
scm_make_rectangular (scm_from_double (0.0), scm_from_double (1.0))
double imag = scm_to_double (scm_imag_part (z));
but there are also convenience functions that are actually a bit more
- SCM scm_from_complex_double (double re, double im);
- double scm_to_real_part_double (SCM z);
- double scm_to_imag_part_double (SCM z);
But remember to use the generic functions scm_make_rectangular,
scm_real_part, etc if you don't care whether the parts of a complex
number are floating point numbers or not. For example, Guile might
someday offer complex numbers where the real part is a fraction
(currently it is always a double) and it is good to be prepared for
this by not treating the parts of a complex as doubles when it is not
A Scheme character in Guile is equivalent to a Unicode code point.
- int scm_is_character (SCM val);
- long scm_to_unicode (SCM ch);
- SCM scm_from_unicode (long code);
Strings present the new problem that memory needs to be allocated or
found for storing the result. Also, when new memory has been
allocated, one needs to make sure that it isn't leaked in the case of
non-local exits (like from errors in subsequent conversions). Such a
cleanup action can be registered with scm_frame_unwind_handler, which
- int scm_is_string (SCM val);
- SCM scm_from_locale_string (unsigned char *str, ssize_t len);
Return a new Scheme string initialized with STR, a string encoded
according to the current locale. When LEN is -1, STR must be
zero-terminated and its length is found that way. Otherwise LEN
gives the length of STR.
- SCM scm_from_utf8_string (unsigned char *str, ssize_t len);
Same as above, but STR is encoded in UTF-8. Future versions of
Guile will use UTF-8 internally and then this function will not need
to perform any conversions at all.
- SCM scm_take_utf8_string (unsigned char *str, ssize_t len);
Same as above, but the memory for STR is taken over by Guile. It
will eventually be freed using libc 'free'.
- unsigned char *scm_to_locale_string (SCM str, size_t *lenp);
Convert STR into a C string that is encoded as specified by the
current locale. Memory is allocated for the C string that can be
freed with 'free'.
When the current locale can not encode STR, an error is signalled.
When LENP is not NULL, the number of bytes contained in the returned
string is stored in *LENP. The string is zero-terminated, but it
might contain zero characters in the middle.
When LENP is NULL and the string does indeed contain a zero
character, it is not encodable and an error is signalled.
- unsigned char *scm_to_utf8_string (SCM str, size_t *lenp);
Same as above but returns a UTF-8 encoded string. This will always
work when LENP is non-NULL.
[ More encodings can be specified later, for example by just
referring to the character sets supported by 'iconv'. The above
two, locale and utf8, are needed for transitioning Guile to
Unicode. Right now, strings are in the locale encoding but in the
future they will be in UTF-8. ]
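
The LENP convention of scm_to_locale_string and scm_to_utf8_string can
be sketched in plain C as follows. This only illustrates the rule about
embedded zero characters; it performs no encoding conversion, the name
sketch_copy_out is made up, and returning NULL stands in for signalling
an error.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the copy-out step.  BYTES/LEN play the role
   of the already encoded string contents.  Returns malloc'ed,
   zero-terminated memory, or NULL where Guile would signal an error.  */
unsigned char *
sketch_copy_out (const unsigned char *bytes, size_t len, size_t *lenp)
{
  unsigned char *res;

  /* Without LENP the caller can only find the end via the terminator,
     so an embedded zero byte cannot be represented.  */
  if (lenp == NULL && memchr (bytes, 0, len) != NULL)
    return NULL;

  res = malloc (len + 1);
  if (res == NULL)
    return NULL;
  memcpy (res, bytes, len);
  res[len] = 0;                 /* always zero-terminated */
  if (lenp != NULL)
    *lenp = len;
  return res;
}
```

The caller owns the returned memory and releases it with 'free', as
described for the real functions above.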
The above functions always return newly allocated memory. When that
is deemed too expensive, the following functions can be used instead.
However, care must be taken to use them correctly and reasonably.
- scm_lock_heap ();
- scm_unlock_heap ();
These two functions lock and unlock all SCM objects (the heap). The
heap should not be locked for long periods of time and no calls to
'normal' libguile functions are allowed while it is locked. A
function is 'normal' unless it is specifically documented to be
useable with a locked heap. (Indeed, most 'unnormal' functions can
_only_ be used while the heap is locked.)
You can not lock the heap twice. Calling scm_lock_heap while the
heap is already locked results in undefined behavior. Likewise,
calling scm_unlock_heap when the heap is not locked is verboten.
- const unsigned char *scm_l_get_utf8_string_mem (SCM str);
Return a pointer to the internal UTF-8 bytes of STR. This function
can only be called while the heap is locked and the returned pointer
becomes invalid when the heap is unlocked later on. The string is
_not_ guaranteed to be zero-terminated; you _must_ use
scm_l_get_utf8_string_len (see below).
You are not allowed to modify the string contents.
(The "scm_l_" prefix denotes a function that must be called with a
- size_t scm_l_get_utf8_string_len (SCM str);
Return the length in bytes of STR. Heap must be locked.
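
A typical use of the locked-heap functions might look like the sketch
below: lock, copy the bytes out using the explicit length, unlock, and
only then add a terminator. So that the example compiles without
libguile, SCM and the four scm_ functions are replaced by trivial
stand-ins; those bodies are placeholders and say nothing about how
Guile would implement them. The helper sketch_copy_utf8 is likewise
made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins so that the sketch compiles on its own.  In the proposal
   these would be provided by Guile.  */
typedef struct { const unsigned char *bytes; size_t len; } sketch_str;
typedef const sketch_str *SCM;

static int sketch_heap_locked = 0;
void scm_lock_heap (void)   { sketch_heap_locked = 1; }
void scm_unlock_heap (void) { sketch_heap_locked = 0; }
const unsigned char *scm_l_get_utf8_string_mem (SCM str) { return str->bytes; }
size_t scm_l_get_utf8_string_len (SCM str) { return str->len; }

/* Copy the UTF-8 bytes of STR into BUF, which has room for CAP bytes
   including the terminator.  Returns the byte length, or (size_t) -1
   if BUF is too small.  */
size_t
sketch_copy_utf8 (SCM str, unsigned char *buf, size_t cap)
{
  size_t len;
  int ok;

  scm_lock_heap ();
  len = scm_l_get_utf8_string_len (str);
  ok = len < cap;
  if (ok)
    memcpy (buf, scm_l_get_utf8_string_mem (str), len);
  scm_unlock_heap ();        /* the internal pointer is invalid from here on */

  if (!ok)
    return (size_t) -1;
  buf[len] = 0;              /* add the terminator ourselves */
  return len;
}
```

Only memcpy is called while the heap is locked, which keeps the locked
region short and avoids calling any 'normal' libguile function inside
it.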
Symbols have strings as their names and you can get that name via
scm_symbol_to_string. However, it is more efficient to convert
to/from a symbol directly.
- int scm_is_symbol (SCM val);
- SCM scm_from_locale_symbol (unsigned char *str, ssize_t len);
- SCM scm_from_utf8_symbol (unsigned char *str, ssize_t len);
- SCM scm_take_utf8_symbol (unsigned char *str, ssize_t len);
- unsigned char *scm_to_locale_symbol (SCM str, size_t *lenp);
- unsigned char *scm_to_utf8_symbol (SCM str, size_t *lenp);
- const unsigned char *scm_l_get_utf8_symbol_mem (SCM str);
- size_t scm_l_get_utf8_symbol_len (SCM str);
** Uniform vectors
[ Uniform vectors should get the same kind of support as strings, but
without the encoding business of course. ]
- int scm_is_u8vector (SCM val);
- SCM scm_from_u8vector (unsigned char *vec, size_t len);
- SCM scm_take_u8vector (unsigned char *vec, size_t len);
- unsigned char *scm_to_u8vector (SCM vec, size_t *lenp);
- unsigned char *scm_l_get_u8vector_mem (SCM vec);
- size_t scm_l_get_u8vector_len (SCM vec);
** Compound types
- int scm_is_pair (SCM val);
- SCM scm_car (SCM pair);
- SCM scm_cdr (SCM pair);
- int scm_is_list (SCM val);
- SCM scm_c_list_ref (SCM list, int idx);
- SCM scm_c_list_set (SCM list, int idx, SCM val);
- int scm_c_list_length (SCM list);
- int scm_is_vector (SCM val);
- SCM scm_c_vector_ref (SCM vec, int idx);
- SCM scm_c_vector_set (SCM vec, int idx, SCM val);
- int scm_c_vector_length (SCM vec);
Additional types can be handled with code like
if (scm_is_true (scm_procedure_p (val)))
- GH replacement proposal (includes a bit of Unicode),
Marius Vollmer <=