Re: Unicode strings and symbols

From: Ludovic Courtès
Subject: Re: Unicode strings and symbols
Date: Mon, 10 Aug 2009 23:27:48 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)


Mike Gran <address@hidden> writes:


>> > +SCM_API void scm_charprint (scm_t_uint32 c, SCM port);
>> This ought to be internal, no?
> Could be.  A couple of the types are given their own print functions:
> scm_intprint and an scm_uintprint.  Most types don't have their own
> print functions.  Are int and uint given special treatment because of
> their radix term?

Dunno.  Anyway, they're not really meant to be public either.  Feel free
to make them internal as well, while you're at it.  ;-)

>> > +          (scm_t_wchar) (unsigned char) STRINGBUF_INLINE_CHARS (buf)[i];
>> Is the double cast needed?
> Sort of.  Unsigned char will successfully be implicitly cast to
> scm_t_wchar, so the scm_t_wchar term is just for clarity.  The unsigned
> char term is definitely needed. Negative 8-bit chars are the upper half
> of the 8-bit charset (128 - 255).  Casting them directly to scm_t_wchar
> may return 0xFFFFFF80 - 0xFFFFFFFF instead of 128-255.  I don't have any
> problem removing the scm_t_wchar cast.  Would you prefer that? 

How about:

  #define STRINGBUF_INLINE_CHARS(buf)                   \
    ((unsigned char *) SCM_CELL_OBJECT_LOC ((buf), 1))

and changing the caller to:

  for (i = 0; i < len; i++)
    mem[i] = (scm_t_wchar) STRINGBUF_INLINE_CHARS (buf)[i];


That would make the intent clearer to me.
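
To illustrate the sign-extension issue discussed above, here is a minimal
sketch.  It assumes a 32-bit unsigned stand-in type, `wide_char_t', in
place of scm_t_wchar, and the helper names are hypothetical; it is not
the actual Guile code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for scm_t_wchar; assumed to be 32 bits wide.  */
typedef uint32_t wide_char_t;

/* Widening a negative char directly sign-extends it.  */
static wide_char_t
widen_direct (signed char c)
{
  return (wide_char_t) c;
}

/* Going through unsigned char first keeps the value in 0..255.  */
static wide_char_t
widen_via_unsigned (signed char c)
{
  return (wide_char_t) (unsigned char) c;
}

/* Same idea as making the accessor macro return `unsigned char *':
   the buffer is read as unsigned bytes, so the loop needs one cast.  */
static void
widen_buffer (const char *src, wide_char_t *dst, size_t len)
{
  const unsigned char *bytes = (const unsigned char *) src;
  size_t i;

  for (i = 0; i < len; i++)
    dst[i] = (wide_char_t) bytes[i];
}

/* Small self-check of the two conversions.  */
static void
self_check (void)
{
  assert (widen_direct (-128) == 0xFFFFFF80u);
  assert (widen_via_unsigned (-128) == 128u);
}
```

With the pointer cast done once in the accessor, callers cannot forget
the `unsigned char' step, which is the point of the macro change.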

> I put it in because that information needs to be available in the
> bytecode compiler.  A slightly clearer name would probably be
> string-bytes-per-character, I suppose.

Agreed, let's take this name.

>> > +SCM_INTERNAL char *scm_to_stringn (SCM str, size_t *lenp, 
>> > +                                   const char *encoding,
>> > +                                   enum iconv_ilseq_handler handler);
>> I suppose this would eventually become public.  What do you think?
>> Should we use a different type for HANDLER before that happens?
> The simplest thing would be to make some constants like
> scm_c_define ("STRING_ESCAPE", scm_from_int(iconveh_escape_sequence))
> Something similar is done in the scm_seek function's constants, such as

It's a C API so Scheme-level constants don't matter.

I was wondering whether using `enum iconv_ilseq_handler' in the public
API would be a good idea because that means that public headers include
either the system's or GNU libiconv's <iconv.h> (or some libunistring
header), in which case `guile.pc' must include the right `-I' flag, etc.
This may slightly complicate compilation of Guile apps.  Another
downside is that Guile's API would be bound to the values and semantics
of `iconv_ilseq_handler', and bound to iconv.

One possibility to avoid this would be to define our own type similar to
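
As a rough sketch of what decoupling the public API from iconv could
look like: the public header declares its own enum, and only the
implementation file maps it to the backend's values.  All names below
are hypothetical, and `enum backend_ilseq_handler' merely stands in for
`enum iconv_ilseq_handler' so that the example compiles on its own.

```c
#include <assert.h>

/* Public header side: no <iconv.h> or libunistring include needed.
   Type and constant names are hypothetical.  */
typedef enum
{
  MY_CONVERSION_ERROR,
  MY_CONVERSION_QUESTION_MARK,
  MY_CONVERSION_ESCAPE_SEQUENCE
} my_conversion_handler;

/* Implementation side: stand-in for the backend's handler enum.
   In real code this would come from the backend's own header.  */
enum backend_ilseq_handler
{
  backend_error,
  backend_question_mark,
  backend_escape_sequence
};

/* Translate the public value to the backend value in one place, so
   the backend can change without touching the public API.  */
static enum backend_ilseq_handler
to_backend_handler (my_conversion_handler h)
{
  switch (h)
    {
    case MY_CONVERSION_ERROR:
      return backend_error;
    case MY_CONVERSION_QUESTION_MARK:
      return backend_question_mark;
    case MY_CONVERSION_ESCAPE_SEQUENCE:
      return backend_escape_sequence;
    }

  /* Not reached for valid enum values.  */
  assert (!"unknown conversion handler");
  return backend_error;
}
```

The cost is one small translation function; the benefit is that
applications including the public header need no extra `-I' flag.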

