Re: [PATCH] Wide characters

From: Mike Gran
Subject: Re: [PATCH] Wide characters
Date: Tue, 24 Feb 2009 23:39:26 -0800 (PST)



>Yes.  I think the best thing will be to let you experiment in a
>dedicated branch, so we can progressively see things take shape.

Works for me.


>>  SCM_DEFINE1 (scm_char_ci_eq_p, "char-ci=?", scm_tc7_rpsubr,
>>              (SCM x, SCM y),
>>          "Return @code{#t} iff @var{x} is the same character as @var{y} ignoring\n"
>> -        "case, else @code{#f}.")
>> +        "case, else @code{#f}.  Case is computed in the Unicode locale.")

>The phrase "Unicode locale" looks confusing to me.  This function is
>locale-independent, right?

It is locale-independent.  I've seen the phrase "Unicode locale" used
to mean that the uppercase and lowercase forms of letters are the
default mappings found in the Unicode Character Database, without any
language's special rules (for example, a Turkish locale lowercases
"I" to dotless "ı" rather than to "i").  I could have written something
like "the case transforms are the default Unicode case transforms, and
do not use any language-specific rules."

>> +  {
>> +    /* C0 controls */
>> +    "nul", "soh", "stx", "etx", "eot", "enq", "ack", "bel",
>> +    "bs",  "ht",  "newline",  "vt",  "np",  "cr",  "so",  "si",
>> +    "dle", "dc1", "dc2", "dc3", "dc4", "nak", "syn", "etb",
>> +    "can", "em",  "sub", "esc", "fs",  "gs",  "rs",  "us",
>> +    "del",
>> +    /* C1 controls */
>> +    "bph", "nbh", "ind", "nel", "ssa", "esa",
>> +    "hts", "htj", "vts", "pld", "plu", "ri" , "ss2", "ss3",
>> +    "dcs", "pu1", "pu2", "sts", "cch", "mw" , "spa", "epa",
>> +    "sos", "sci", "csi", "st",  "osc", "pm",  "apc"
>> +  };
>Are the new names standard?

They are.  They come from the Unicode standard, whose control names
derive from ECMA-48 (1991).  Actually, a couple of the C0 control
names currently in Guile differ from those standards (I didn't change
them): Unicode and ECMA-48 use "lf" for "newline" and "ff" for "np".
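The C1 list has 29 names for the 32 code points U+0080..U+009F because ECMA-48 leaves U+0080, U+0081 and U+0099 unnamed.  The sketch below pairs each name with its code point and does the lookup in both directions (code to name for printing, name to code for reading).  The struct and the functions `c1_name` and `c1_code` are hypothetical; the patch excerpt shows only the name array, not how it is indexed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pairing of the patch's C1 names with their code points. */
struct control_name
{
  uint32_t code;
  const char *name;
};

static const struct control_name c1_controls[] = {
  {0x82, "bph"}, {0x83, "nbh"}, {0x84, "ind"}, {0x85, "nel"},
  {0x86, "ssa"}, {0x87, "esa"}, {0x88, "hts"}, {0x89, "htj"},
  {0x8A, "vts"}, {0x8B, "pld"}, {0x8C, "plu"}, {0x8D, "ri"},
  {0x8E, "ss2"}, {0x8F, "ss3"}, {0x90, "dcs"}, {0x91, "pu1"},
  {0x92, "pu2"}, {0x93, "sts"}, {0x94, "cch"}, {0x95, "mw"},
  {0x96, "spa"}, {0x97, "epa"}, {0x98, "sos"}, {0x9A, "sci"},
  {0x9B, "csi"}, {0x9C, "st"},  {0x9D, "osc"}, {0x9E, "pm"},
  {0x9F, "apc"}
};

#define N_C1 (sizeof c1_controls / sizeof c1_controls[0])

/* Writer side: 0x85 -> "nel"; NULL for unnamed codes such as 0x99. */
static const char *
c1_name (uint32_t code)
{
  size_t i;
  for (i = 0; i < N_C1; i++)
    if (c1_controls[i].code == code)
      return c1_controls[i].name;
  return NULL;
}

/* Reader side: "csi" -> 0x9B; -1 if the name is not a C1 control. */
static long
c1_code (const char *name)
{
  size_t i;
  for (i = 0; i < N_C1; i++)
    if (strcmp (c1_controls[i].name, name) == 0)
      return (long) c1_controls[i].code;
  return -1;
}
```

A linear scan is enough for a table this small; keeping explicit code points (rather than relying on array position) makes the three gaps harmless.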

>> -      /* Dirk:FIXME::  This type of character syntax is not R5RS
>> -      * compliant.  Further, it should be verified that the constant
>> -      * does only consist of octal digits.  Finally, it should be
>> -      * checked whether the resulting fixnum is in the range of
>> -      * characters.  */
>> +      /* FIXME:: This type of character syntax is not R5RS
>> +      * compliant.  */
>I think this comment remains valid, doesn't it?

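In the code I sent, I did add checks for the other two conditions Dirk
mentioned (that the constant consists only of octal digits, and that
the result is in the range of characters), so only the R5RS remark
still applies.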
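A minimal sketch of those two checks, assuming the digits after `#\` have already been isolated and that "the range of characters" means Unicode scalar values (0 to 0x10FFFF, excluding the surrogates U+D800..U+DFFF).  `parse_octal_char` is a hypothetical name, not the function in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: parse the octal digits of a character constant such as
   the "101" in #\101.  Returns 1 and stores the code point in *out on
   success; returns 0 if a non-octal digit appears or the value is not a
   Unicode scalar value.  */
static int
parse_octal_char (const char *digits, size_t len, uint32_t *out)
{
  uint32_t value = 0;
  size_t i;

  if (len == 0)
    return 0;
  for (i = 0; i < len; i++)
    {
      if (digits[i] < '0' || digits[i] > '7')
        return 0;               /* check 1: octal digits only */
      value = value * 8 + (uint32_t) (digits[i] - '0');
      if (value > 0x10FFFF)
        return 0;               /* check 2: beyond the code space (also
                                   keeps VALUE from overflowing) */
    }
  if (value >= 0xD800 && value <= 0xDFFF)
    return 0;                   /* check 2: surrogates are not characters */
  *out = value;
  return 1;
}
```

For example, "101" yields 65 ('A'), "4177777" yields 0x10FFFF, while "8", "4200000" (0x110000) and "154000" (0xD800) are rejected.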

Anyway.  I'll keep playing with this as time permits.


