guile-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Guile-commits] GNU Guile branch, master, updated. release_1-9-2-105-g02


From: Michael Gran
Subject: [Guile-commits] GNU Guile branch, master, updated. release_1-9-2-105-g026ed23
Date: Thu, 27 Aug 2009 15:08:39 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU Guile".

http://git.savannah.gnu.org/cgit/guile.git/commit/?id=026ed23911032ed8880af97d993315615b9f5b07

The branch, master has been updated
       via  026ed23911032ed8880af97d993315615b9f5b07 (commit)
       via  930ddd34c32b2cad49ffb254951e3cac50c1b341 (commit)
       via  f49dbcadf3829fe6ca2e4815c2af772360d454e8 (commit)
      from  3bcf189ba0d903e42935154c499ad60e1bf9cb3b (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 026ed23911032ed8880af97d993315615b9f5b07
Author: Michael Gran <address@hidden>
Date:   Thu Aug 27 07:35:39 2009 -0700

    Always cast input to toupper as int
    
    * libguile/read.c (scm_scan_for_encoding): add cast to int

commit 930ddd34c32b2cad49ffb254951e3cac50c1b341
Author: Michael Gran <address@hidden>
Date:   Thu Aug 27 07:34:48 2009 -0700

    Segfault when writing non-Latin-1 characters under Latin-1 locale
    
    * libguile/print.c (iprin1): handle write of non-Latin-1 characters
      under the Latin-1 locale

commit f49dbcadf3829fe6ca2e4815c2af772360d454e8
Author: Michael Gran <address@hidden>
Date:   Thu Aug 27 07:32:50 2009 -0700

    Unicode-capable srfi-14 charsets
    
    * libguile/Makefile.am: distribute new files srfi-14.i.c and
      unidata_to_charset.pl
    
    * chars.c (scm_c_upcase, scm_c_downcase): use unicode-enable toupper
      and tolower
    
    * libguile/srfi-14.h (scm_t_char_range, scm_t_char_set): new structures
      to describe char-sets
      (scm_t_char_set_cursor): new structure to describe char-set-cursors
      (SCM_BITS_PER_LONG): removed
      (SCM_CHARSET_GET): calls function
      New declarations for scm_i_charset_get, scm_i_charset_set,
      scm_i_charset_unset, and scm_debug_char_set.
    
    * test-suite/tests/srfi-14.test: new tests
    
    * libguile/srfi-14.c (SCM_CHARSET_DATA): new macro
      (SCM_CHARSET_SET, SCM_CHARSET_UNSET): call function
      (BYTES_PER_CHARSET, LONGS_PER_CHARSET): removed
      (scm_i_charset_get, scm_i_charset_set, scm_i_charset_unset)
      (charsets_equal, charsets_leq, charsets_union)
      (charsets_intersection, charsets_complement, charsets_xor): new
      functions that are low-level charset operators
      (charset_print, charset_free): modified for new charset struct
      (charset_cursor_print, charset_cursor_free): new function
      (make_char_set, scm_char_set_p, scm_char_set_eq, scm_car_set_leq)
      (scm_char_set_hash, scm_char_set_cursor, scm_char_set_ref)
      (scm_char_set_cursor_next, scm_end_of_char_set_p, scm_char_set_fold)
      (scm_char_set_unfold, scm_char_set_unfold_x, scm_char_set_for_each)
      (scm_char_set_map, scm_char_set_copy, scm_char_set, scm_list_to_char_set)
      (scm_list_to_char_set_x, scm_string_to_char_set, scm_string_to_char_set_x)
      (scm_char_set_filter, scm_char_set_filter_x, scm_ucs_range_to_char_set)
      (scm_ucs_range_to_char_set_x, scm_to_char_set, scm_char_set_size)
      (scm_char_set_count, scm_char_set_to_list, scm_char_set_to_string)
      (scm_char_set_contains_p, scm_char_set_every, scm_char_set_any)
      (scm_char_set_adjoin, scm_char_set_delete, scm_char_set_adjoin_x)
      (scm_char_set_delete_x, scm_char_set_complement, scm_char_set_union)
      (scm_char_set_intersection, scm_char_set_difference, scm_char_set_xor)
      (scm_char_set_diff_plus_intersection, scm_char_set_complement_x)
      (scm_char_set_union_x, scm_char_set_intersection_x, 
scm_char_set_difference_x)
      (scm_char_set_xor_x, scm_char_set_diff_plus_intersection_x): modified
      to use new charset and charset-cursor data structures
      (CSET_BLANK_PRED, CSET_SYMBOL_PRED, CSET_PUNCT_PRED, CSET_LOWER_PRED)
      (CSET_UPPER_PRED, CSET_LETTER_PRED, CSET_DIGIT_PRED, CSET_WHITESPACE_PRED)
      (CSET_CONTROL_PRED, CSET_HEX_DIGIT_PRED, CSET_ASCII_PRED, 
CSET_LETTER_PRED)
      (CSET_LETTER_AND_DIGIT_PRED, CSET_PRINTING_PRED, CSET_TRUE_PRED)
      (CSET_FALSE_PRED): removed
      (scm_srfi_14_compute_char_sets): removed - too slow to iterate
      over all of unicode at startup
      (scm_debug_char_set) [SCM_CHARSET_DEBUG]: new function

-----------------------------------------------------------------------

Summary of changes:
 libguile/Makefile.am          |    4 +-
 libguile/chars.c              |   10 +-
 libguile/print.c              |   11 +-
 libguile/read.c               |    4 +-
 libguile/srfi-14.c            | 1531 ++++++++++++++++++++++++++---------------
 libguile/srfi-14.h            |   34 +-
 test-suite/tests/srfi-14.test |  297 +++++++--
 7 files changed, 1250 insertions(+), 641 deletions(-)

diff --git a/libguile/Makefile.am b/libguile/Makefile.am
index ab372b6..d4d1a54 100644
--- a/libguile/Makefile.am
+++ b/libguile/Makefile.am
@@ -446,7 +446,7 @@ install-exec-hook:
 ## working.
 noinst_HEADERS = conv-integer.i.c conv-uinteger.i.c            \
                  eval.i.c ieee-754.h                           \
-                 srfi-4.i.c                                    \
+                 srfi-4.i.c srfi-14.i.c                                \
                  quicksort.i.c                                  \
                  win32-uname.h win32-dirent.h win32-socket.h   \
                 private-gc.h private-options.h
@@ -597,7 +597,7 @@ EXTRA_DIST = ChangeLog-scm ChangeLog-threads                
\
     cpp_errno.c cpp_err_symbols.in cpp_err_symbols.c                   \
     cpp_sig_symbols.c cpp_sig_symbols.in cpp_cnvt.awk                  \
     c-tokenize.lex version.h.in                                                
\
-    scmconfig.h.top libgettext.h libguile.map
+    scmconfig.h.top libgettext.h unidata_to_charset.pl libguile.map
 #    $(DOT_DOC_FILES) $(EXTRA_DOT_DOC_FILES) \
 #    guile-procedures.txt guile.texi
 
diff --git a/libguile/chars.c b/libguile/chars.c
index 552a2d9..c7cb09c 100644
--- a/libguile/chars.c
+++ b/libguile/chars.c
@@ -296,20 +296,14 @@ TODO: change name  to scm_i_.. ? --hwn
 scm_t_wchar
 scm_c_upcase (scm_t_wchar c)
 {
-  if (c > 255)
-    return c;
-
-  return toupper ((int) c);
+  return uc_toupper ((int) c);
 }
 
 
 scm_t_wchar
 scm_c_downcase (scm_t_wchar c)
 {
-  if (c > 255)
-    return c;
-
-  return tolower ((int) c);
+  return uc_tolower ((int) c);
 }
 
 
diff --git a/libguile/print.c b/libguile/print.c
index 4d206eb..86d067b 100644
--- a/libguile/print.c
+++ b/libguile/print.c
@@ -470,10 +470,15 @@ iprin1 (SCM exp, SCM port, scm_print_state *pstate)
 
                   enc = scm_i_get_port_encoding (port);
                   wbuf[0] = i;
-                  if (enc == NULL && i <= 0xFF)
+                  if (enc == NULL)
                     {
-                      /* Character is graphic and Latin-1.  Print it  */
-                      scm_lfwrite_str (wstr, port);
+                      if (i <= 0xFF)
+                        /* Character is graphic and Latin-1.  Print it  */
+                        scm_lfwrite_str (wstr, port);
+                      else
+                        /* Character is graphic but unrepresentable in
+                           this port's encoding.  */
+                        scm_intprint (i, 8, port);
                     }
                   else
                     {
diff --git a/libguile/read.c b/libguile/read.c
index c36842a..d91c868 100644
--- a/libguile/read.c
+++ b/libguile/read.c
@@ -1446,8 +1446,8 @@ scm_scan_for_encoding (SCM port)
   encoding = scm_malloc (i+1);
   memcpy (encoding, pos, i);
   encoding[i] ='\0';
-  for (i = 0; i < strlen(encoding); i++)
-    encoding[i] = toupper(encoding[i]);
+  for (i = 0; i < strlen (encoding); i++)
+    encoding[i] = toupper ((int) encoding[i]);
 
   /* push backwards to make sure we were in a comment */
   in_comment = 0;
diff --git a/libguile/srfi-14.c b/libguile/srfi-14.c
index 3b4a5ff..fe56270 100644
--- a/libguile/srfi-14.c
+++ b/libguile/srfi-14.c
@@ -24,45 +24,476 @@
 
 
 #include <string.h>
-#include <ctype.h>
+#include <unictype.h>
 
 #include "libguile.h"
 #include "libguile/srfi-14.h"
+#include "libguile/strings.h"
 
+/* Include the pre-computed standard charset data.  */
+#include "libguile/srfi-14.i.c"
 
-#define SCM_CHARSET_SET(cs, idx)                               \
-  (((long *) SCM_SMOB_DATA (cs))[(idx) / SCM_BITS_PER_LONG] |= \
-    (1L << ((idx) % SCM_BITS_PER_LONG)))
+#define SCM_CHARSET_DATA(charset) ((scm_t_char_set *) SCM_SMOB_DATA (charset))
 
-#define SCM_CHARSET_UNSET(cs, idx)                             \
-  (((long *) SCM_SMOB_DATA (cs))[(idx) / SCM_BITS_PER_LONG] &= \
-    (~(1L << ((idx) % SCM_BITS_PER_LONG))))
-
-#define BYTES_PER_CHARSET (SCM_CHARSET_SIZE / 8)
-#define LONGS_PER_CHARSET (SCM_CHARSET_SIZE / SCM_BITS_PER_LONG)
+#define SCM_CHARSET_SET(cs, idx)                        \
+  scm_i_charset_set (SCM_CHARSET_DATA (cs), idx)
 
+#define SCM_CHARSET_UNSET(cs, idx)                      \
+  scm_i_charset_unset (SCM_CHARSET_DATA (cs), idx)
 
 /* Smob type code for character sets.  */
 int scm_tc16_charset = 0;
+int scm_tc16_charset_cursor = 0;
+
+/* True if N exists in charset CS.  */
+int
+scm_i_charset_get (scm_t_char_set *cs, scm_t_wchar n)
+{
+  size_t i;
+
+  i = 0;
+  while (i < cs->len)
+    {
+      if (cs->ranges[i].lo <= n && n <= cs->ranges[i].hi)
+        return 1;
+      i++;
+    }
+
+  return 0;
+}
+
+/* Put N into charset CS.  */
+void
+scm_i_charset_set (scm_t_char_set *cs, scm_t_wchar n)
+{
+  size_t i;
+  size_t len;
+
+  len = cs->len;
+
+  i = 0;
+  while (i < len)
+    {
+      /* Already in this range  */
+      if (cs->ranges[i].lo <= n && n <= cs->ranges[i].hi)
+        {
+          return;
+        }
+
+      if (n == cs->ranges[i].lo - 1)
+        {
+          /* This char is one below the current range. */
+          if (i > 0 && cs->ranges[i - 1].hi + 1 == n)
+            {
+              /* It is also one above the previous range, so combine them.  */
+              cs->ranges[i - 1].hi = cs->ranges[i].hi;
+              if (i < len - 1)
+                memmove (cs->ranges + i, cs->ranges + (i + 1),
+                         sizeof (scm_t_char_range) * (len - i - 1));
+              cs->ranges = scm_gc_realloc (cs->ranges,
+                                           sizeof (scm_t_char_range) * len,
+                                           sizeof (scm_t_char_range) * (len -
+                                                                        1),
+                                           "character-set");
+              cs->len = len - 1;
+              return;
+            }
+          else
+            {
+              /* Expand the range down by one.  */
+              cs->ranges[i].lo = n;
+              return;
+            }
+        }
+      else if (n == cs->ranges[i].hi + 1)
+        {
+          /* This char is one above the current range.  */
+          if (i < len - 1 && cs->ranges[i + 1].lo - 1 == n)
+            {
+              /* It is also one below the next range, so combine them.  */
+              cs->ranges[i].hi = cs->ranges[i + 1].hi;
+              if (i < len - 2)
+                memmove (cs->ranges + (i + 1), cs->ranges + (i + 2),
+                         sizeof (scm_t_char_range) * (len - i - 2));
+              cs->ranges = scm_gc_realloc (cs->ranges,
+                                           sizeof (scm_t_char_range) * len,
+                                           sizeof (scm_t_char_range) * (len -
+                                                                        1),
+                                           "character-set");
+              cs->len = len - 1;
+              return;
+            }
+          else
+            {
+              /* Expand the range up by one.  */
+              cs->ranges[i].hi = n;
+              return;
+            }
+        }
+      else if (n < cs->ranges[i].lo - 1)
+        {
+          /* This is a new range below the current one.  */
+          cs->ranges = scm_gc_realloc (cs->ranges,
+                                       sizeof (scm_t_char_range) * len,
+                                       sizeof (scm_t_char_range) * (len + 1),
+                                       "character-set");
+          memmove (cs->ranges + (i + 1), cs->ranges + i,
+                   sizeof (scm_t_char_range) * (len - i));
+          cs->ranges[i].lo = n;
+          cs->ranges[i].hi = n;
+          cs->len = len + 1;
+          return;
+        }
+
+      i++;
+    }
+
+  /* This is a new range above all previous ranges.  */
+  if (len == 0)
+    {
+      cs->ranges = scm_gc_malloc (sizeof (scm_t_char_range), "character-set");
+    }
+  else
+    {
+      cs->ranges = scm_gc_realloc (cs->ranges,
+                                   sizeof (scm_t_char_range) * len,
+                                   sizeof (scm_t_char_range) * (len + 1),
+                                   "character-set");
+    }
+  cs->ranges[len].lo = n;
+  cs->ranges[len].hi = n;
+  cs->len = len + 1;
+
+  return;
+}
+
+/* If N is in charset CS, remove it.  */
+void
+scm_i_charset_unset (scm_t_char_set *cs, scm_t_wchar n)
+{
+  size_t i;
+  size_t len;
+
+  len = cs->len;
+
+  i = 0;
+  while (i < len)
+    {
+      if (n < cs->ranges[i].lo)
+        /* Not in this set.  */
+        return;
+
+      if (n == cs->ranges[i].lo && n == cs->ranges[i].hi)
+        {
+          /* Remove this one-character range.  */
+          if (len == 1)
+            {
+              scm_gc_free (cs->ranges,
+                           sizeof (scm_t_char_range) * cs->len,
+                           "character-set");
+              cs->ranges = NULL;
+              cs->len = 0;
+              return;
+            }
+          else if (i < len - 1)
+            {
+              memmove (cs->ranges + i, cs->ranges + (i + 1),
+                       sizeof (scm_t_char_range) * (len - i - 1));
+              cs->ranges = scm_gc_realloc (cs->ranges,
+                                           sizeof (scm_t_char_range) * len,
+                                           sizeof (scm_t_char_range) * (len -
+                                                                        1),
+                                           "character-set");
+              cs->len = len - 1;
+              return;
+            }
+          else if (i == len - 1)
+            {
+              cs->ranges = scm_gc_realloc (cs->ranges,
+                                           sizeof (scm_t_char_range) * len,
+                                           sizeof (scm_t_char_range) * (len -
+                                                                        1),
+                                           "character-set");
+              cs->len = len - 1;
+              return;
+            }
+        }
+      else if (n == cs->ranges[i].lo)
+        {
+          /* Shrink this range from the left.  */
+          cs->ranges[i].lo = n + 1;
+          return;
+        }
+      else if (n == cs->ranges[i].hi)
+        {
+          /* Shrink this range from the right.  */
+          cs->ranges[i].hi = n - 1;
+          return;
+        }
+      else if (n > cs->ranges[i].lo && n < cs->ranges[i].hi)
+        {
+          /* Split this range into two pieces.  */
+          cs->ranges = scm_gc_realloc (cs->ranges,
+                                       sizeof (scm_t_char_range) * len,
+                                       sizeof (scm_t_char_range) * (len + 1),
+                                       "character-set");
+          if (i < len - 1)
+            memmove (cs->ranges + (i + 2), cs->ranges + (i + 1),
+                     sizeof (scm_t_char_range) * (len - i - 1));
+          cs->ranges[i + 1].hi = cs->ranges[i].hi;
+          cs->ranges[i + 1].lo = n + 1;
+          cs->ranges[i].hi = n - 1;
+          cs->len = len + 1;
+          return;
+        }
+
+      i++;
+    }
+
+  /* This value is above all ranges, so do nothing here.  */
+  return;
+}
+
+static int
+charsets_equal (scm_t_char_set *a, scm_t_char_set *b)
+{
+  if (a->len != b->len)
+    return 0;
+
+  if (memcmp (a->ranges, b->ranges, sizeof (scm_t_char_range) * a->len) != 0)
+    return 0;
+
+  return 1;
+}
+
+/* Return true if every character in A is also in B.  */
+static int
+charsets_leq (scm_t_char_set *a, scm_t_char_set *b)
+{
+  size_t i = 0, j = 0;
+  scm_t_wchar alo, ahi;
+
+  if (a->len == 0)
+    return 1;
+  if (b->len == 0)
+    return 0;
+  while (i < a->len)
+    {
+      alo = a->ranges[i].lo;
+      ahi = a->ranges[i].hi;
+      while (b->ranges[j].hi < alo)
+        {
+          if (j < b->len - 1)
+            j++;
+          else
+            return 0;
+        }
+      if (alo < b->ranges[j].lo || ahi > b->ranges[j].hi)
+        return 0;
+      i++;
+    }
+
+  return 1;
+}
+
+/* Merge B into A. */
+static void
+charsets_union (scm_t_char_set *a, scm_t_char_set *b)
+{
+  size_t i = 0;
+  scm_t_wchar blo, bhi, n;
+
+  if (b->len == 0)
+    return;
+
+  if (a->len == 0)
+    {
+      a->len = b->len;
+      a->ranges = scm_gc_malloc (sizeof (scm_t_char_range) * b->len,
+                                 "character-set");
+      memcpy (a->ranges, b->ranges, sizeof (scm_t_char_range) * b->len);
+      return;
+    }
+
+  /* This needs optimization.  */
+  while (i < b->len)
+    {
+      blo = b->ranges[i].lo;
+      bhi = b->ranges[i].hi;
+      for (n = blo; n <= bhi; n++)
+        scm_i_charset_set (a, n);
+
+      i++;
+    }
+
+  return;
+}
+
+/* Remove elements not both in A and B from A. */
+static void
+charsets_intersection (scm_t_char_set *a, scm_t_char_set *b)
+{
+  size_t i = 0;
+  scm_t_wchar blo, bhi, n;
+  scm_t_char_set *c;
+
+  if (a->len == 0)
+    return;
+
+  if (b->len == 0)
+    {
+      scm_gc_free (a->ranges, sizeof (scm_t_char_range) * a->len,
+                   "character-set");
+      a->len = 0;
+      return;
+    }
+
+  c = (scm_t_char_set *) scm_malloc (sizeof (scm_t_char_set));
+  c->len = 0;
+  c->ranges = NULL;
+
+  while (i < b->len)
+    {
+      blo = b->ranges[i].lo;
+      bhi = b->ranges[i].hi;
+      for (n = blo; n <= bhi; n++)
+        if (scm_i_charset_get (a, n))
+          scm_i_charset_set (c, n);
+      i++;
+    }
+  scm_gc_free (a->ranges, sizeof (scm_t_char_range) * a->len,
+               "character-set");
+
+  a->len = c->len;
+  if (c->len != 0)
+    a->ranges = c->ranges;
+  else
+    a->ranges = NULL;
+  free (c);
+  return;
+}
+
+/* Make P the compelement of Q.  */
+static void
+charsets_complement (scm_t_char_set *p, scm_t_char_set *q)
+{
+  int k, idx;
+
+  if (q->len == 0)
+    {
+      /* Fill with all valid codepoints.  */
+      p->len = 2;
+      p->ranges = scm_gc_malloc (sizeof (scm_t_char_range) * 2,
+                                 "character-set");
+      p->ranges[0].lo = 0;
+      p->ranges[0].hi = 0xd7ff;
+      p->ranges[1].lo = 0xe000;
+      p->ranges[1].hi = SCM_CODEPOINT_MAX;
+      return;
+    }
+
+  if (p->len > 0)
+    scm_gc_free (p->ranges, sizeof (scm_t_char_set) * p->len,
+                 "character-set");
+
+  p->len = 0;
+  if (q->ranges[0].lo > 0)
+    p->len++;
+  if (q->ranges[q->len - 1].hi < SCM_CODEPOINT_MAX)
+    p->len++;
+  p->len += q->len - 1;
+  p->ranges =
+    (scm_t_char_range *) scm_gc_malloc (sizeof (scm_t_char_range) * p->len,
+                                        "character-set");
+  idx = 0;
+  if (q->ranges[0].lo > 0)
+    {
+      p->ranges[idx].lo = 0;
+      p->ranges[idx++].hi = q->ranges[0].lo - 1;
+    }
+  for (k = 1; k < q->len; k++)
+    {
+      p->ranges[idx].lo = q->ranges[k - 1].hi + 1;
+      p->ranges[idx++].hi = q->ranges[k].lo - 1;
+    }
+  if (q->ranges[q->len - 1].hi < SCM_CODEPOINT_MAX)
+    {
+      p->ranges[idx].lo = q->ranges[q->len - 1].hi + 1;
+      p->ranges[idx].hi = SCM_CODEPOINT_MAX;
+    }
+  return;
+}
+
+/* Replace A with elements only found in one of A or B.  */
+static void
+charsets_xor (scm_t_char_set *a, scm_t_char_set *b)
+{
+  size_t i = 0;
+  scm_t_wchar blo, bhi, n;
+
+  if (b->len == 0)
+    {
+      return;
+    }
 
+  if (a->len == 0)
+    {
+      a->ranges =
+        (scm_t_char_range *) scm_gc_malloc (sizeof (scm_t_char_range) *
+                                            b->len, "character-set");
+      a->len = b->len;
+      memcpy (a->ranges, b->ranges, sizeof (scm_t_char_range) * a->len);
+      return;
+    }
+
+  while (i < b->len)
+    {
+      blo = b->ranges[i].lo;
+      bhi = b->ranges[i].hi;
+      for (n = blo; n <= bhi; n++)
+        {
+          if (scm_i_charset_get (a, n))
+            scm_i_charset_unset (a, n);
+          else
+            scm_i_charset_set (a, n);
+        }
+
+      i++;
+    }
+  return;
+}
 
 /* Smob print hook for character sets.  */
 static int
 charset_print (SCM charset, SCM port, scm_print_state *pstate SCM_UNUSED)
 {
-  int i;
+  size_t i;
   int first = 1;
+  scm_t_char_set *p;
+  const size_t max_ranges_to_print = 50;
+
+  p = SCM_CHARSET_DATA (charset);
 
   scm_puts ("#<charset {", port);
-  for (i = 0; i < SCM_CHARSET_SIZE; i++)
-    if (SCM_CHARSET_GET (charset, i))
-      {
-       if (first)
-         first = 0;
-       else
-         scm_puts (" ", port);
-       scm_write (SCM_MAKE_CHAR (i), port);
-      }
+  for (i = 0; i < p->len; i++)
+    {
+      if (first)
+        first = 0;
+      else
+        scm_puts (" ", port);
+      scm_write (SCM_MAKE_CHAR (p->ranges[i].lo), port);
+      if (p->ranges[i].lo != p->ranges[i].hi)
+        {
+          scm_puts ("..", port);
+          scm_write (SCM_MAKE_CHAR (p->ranges[i].hi), port);
+        }
+      if (i >= max_ranges_to_print)
+        {
+          /* Too many to print here.  Quit early.  */
+          scm_puts (" ...", port);
+          break;
+        }
+    }
   scm_puts ("}>", port);
   return 1;
 }
@@ -72,18 +503,71 @@ charset_print (SCM charset, SCM port, scm_print_state 
*pstate SCM_UNUSED)
 static size_t
 charset_free (SCM charset)
 {
-  return scm_smob_free (charset);
+  scm_t_char_set *cs;
+  size_t len = 0;
+
+  cs = SCM_CHARSET_DATA (charset);
+  if (cs != NULL)
+    len = cs->len;
+  if (len > 0)
+    scm_gc_free (cs->ranges, sizeof (scm_t_char_range) * len,
+                 "character-set");
+
+  cs->ranges = NULL;
+  cs->len = 0;
+
+  scm_gc_free (cs, sizeof (scm_t_char_set), "character-set");
+
+  scm_remember_upto_here_1 (charset);
+
+  return 0;
+}
+
+
+/* Smob print hook for character sets cursors.  */
+static int
+charset_cursor_print (SCM cursor, SCM port,
+                      scm_print_state *pstate SCM_UNUSED)
+{
+  scm_t_char_set_cursor *cur;
+
+  cur = (scm_t_char_set_cursor *) SCM_SMOB_DATA (cursor);
+
+  scm_puts ("#<charset-cursor ", port);
+  if (cur->range == (size_t) (-1))
+    scm_puts ("(empty)", port);
+  else
+    {
+      scm_write (scm_from_size_t (cur->range), port);
+      scm_puts (":", port);
+      scm_write (scm_from_int32 (cur->n), port);
+    }
+  scm_puts (">", port);
+  return 1;
+}
+
+/* Smob free hook for character sets. */
+static size_t
+charset_cursor_free (SCM charset)
+{
+  scm_t_char_set_cursor *cur;
+
+  cur = (scm_t_char_set_cursor *) SCM_SMOB_DATA (charset);
+  scm_gc_free (cur, sizeof (scm_t_char_set_cursor), "charset-cursor");
+  scm_remember_upto_here_1 (charset);
+
+  return 0;
 }
 
 
 /* Create a new, empty character set.  */
 static SCM
-make_char_set (const char * func_name)
+make_char_set (const char *func_name)
 {
-  long * p;
+  scm_t_char_set *p;
 
-  p = scm_gc_malloc (BYTES_PER_CHARSET, "character-set");
-  memset (p, 0, BYTES_PER_CHARSET);
+  p = scm_gc_malloc (sizeof (scm_t_char_set), "character-set");
+  memset (p, 0, sizeof (scm_t_char_set));
   SCM_RETURN_NEWSMOB (scm_tc16_charset, p);
 }
 
@@ -105,22 +589,22 @@ SCM_DEFINE (scm_char_set_eq, "char-set=", 0, 0, 1,
 #define FUNC_NAME s_scm_char_set_eq
 {
   int argnum = 1;
-  long *cs1_data = NULL;
+  scm_t_char_set *cs1_data = NULL;
 
   SCM_VALIDATE_REST_ARGUMENT (char_sets);
 
   while (!scm_is_null (char_sets))
     {
       SCM csi = SCM_CAR (char_sets);
-      long *csi_data;
+      scm_t_char_set *csi_data;
 
       SCM_VALIDATE_SMOB (argnum, csi, charset);
       argnum++;
-      csi_data = (long *) SCM_SMOB_DATA (csi);
+      csi_data = SCM_CHARSET_DATA (csi);
       if (cs1_data == NULL)
-       cs1_data = csi_data;
-      else if (memcmp (cs1_data, csi_data, BYTES_PER_CHARSET) != 0)
-       return SCM_BOOL_F;
+        cs1_data = csi_data;
+      else if (!charsets_equal (cs1_data, csi_data))
+        return SCM_BOOL_F;
       char_sets = SCM_CDR (char_sets);
     }
   return SCM_BOOL_T;
@@ -135,28 +619,23 @@ SCM_DEFINE (scm_char_set_leq, "char-set<=", 0, 0, 1,
 #define FUNC_NAME s_scm_char_set_leq
 {
   int argnum = 1;
-  long *prev_data = NULL;
+  scm_t_char_set *prev_data = NULL;
 
   SCM_VALIDATE_REST_ARGUMENT (char_sets);
 
   while (!scm_is_null (char_sets))
     {
       SCM csi = SCM_CAR (char_sets);
-      long *csi_data;
+      scm_t_char_set *csi_data;
 
       SCM_VALIDATE_SMOB (argnum, csi, charset);
       argnum++;
-      csi_data = (long *) SCM_SMOB_DATA (csi);
+      csi_data = SCM_CHARSET_DATA (csi);
       if (prev_data)
-       {
-         int k;
-
-         for (k = 0; k < LONGS_PER_CHARSET; k++)
-           {
-             if ((prev_data[k] & csi_data[k]) != prev_data[k])
-               return SCM_BOOL_F;
-           }
-       }
+        {
+          if (!charsets_leq (prev_data, csi_data))
+            return SCM_BOOL_F;
+        }
       prev_data = csi_data;
       char_sets = SCM_CDR (char_sets);
     }
@@ -174,9 +653,10 @@ SCM_DEFINE (scm_char_set_hash, "char-set-hash", 1, 1, 0,
 {
   const unsigned long default_bnd = 871;
   unsigned long bnd;
-  long * p;
+  scm_t_char_set *p;
   unsigned long val = 0;
   int k;
+  scm_t_wchar c;
 
   SCM_VALIDATE_SMOB (1, cs, charset);
 
@@ -186,14 +666,14 @@ SCM_DEFINE (scm_char_set_hash, "char-set-hash", 1, 1, 0,
     {
       bnd = scm_to_ulong (bound);
       if (bnd == 0)
-       bnd = default_bnd;
+        bnd = default_bnd;
     }
 
-  p = (long *) SCM_SMOB_DATA (cs);
-  for (k = 0; k < LONGS_PER_CHARSET; k++)
+  p = SCM_CHARSET_DATA (cs);
+  for (k = 0; k < p->len; k++)
     {
-      if (p[k] != 0)
-        val = p[k] + (val << 1);
+      for (c = p->ranges[k].lo; c <= p->ranges[k].hi; c++)
+        val = c + (val << 1);
     }
   return scm_from_ulong (val % bnd);
 }
@@ -201,89 +681,154 @@ SCM_DEFINE (scm_char_set_hash, "char-set-hash", 1, 1, 0,
 
 
 SCM_DEFINE (scm_char_set_cursor, "char-set-cursor", 1, 0, 0,
-           (SCM cs),
-           "Return a cursor into the character set @var{cs}.")
+            (SCM cs), "Return a cursor into the character set @var{cs}.")
 #define FUNC_NAME s_scm_char_set_cursor
 {
-  int idx;
+  scm_t_char_set *cs_data;
+  scm_t_char_set_cursor *cur_data;
 
   SCM_VALIDATE_SMOB (1, cs, charset);
-  for (idx = 0; idx < SCM_CHARSET_SIZE; idx++)
+  cs_data = SCM_CHARSET_DATA (cs);
+  cur_data =
+    (scm_t_char_set_cursor *) scm_gc_malloc (sizeof (scm_t_char_set_cursor),
+                                             "charset-cursor");
+  if (cs_data->len == 0)
     {
-      if (SCM_CHARSET_GET (cs, idx))
-       break;
+      cur_data->range = (size_t) (-1);
+      cur_data->n = 0;
     }
-  return SCM_I_MAKINUM (idx);
+  else
+    {
+      cur_data->range = 0;
+      cur_data->n = cs_data->ranges[0].lo;
+    }
+  SCM_RETURN_NEWSMOB (scm_tc16_charset_cursor, cur_data);
 }
+
 #undef FUNC_NAME
 
 
 SCM_DEFINE (scm_char_set_ref, "char-set-ref", 2, 0, 0,
-           (SCM cs, SCM cursor),
-           "Return the character at the current cursor position\n"
-           "@var{cursor} in the character set @var{cs}.  It is an error to\n"
-           "pass a cursor for which @code{end-of-char-set?} returns true.")
+            (SCM cs, SCM cursor),
+            "Return the character at the current cursor position\n"
+            "@var{cursor} in the character set @var{cs}.  It is an error to\n"
+            "pass a cursor for which @code{end-of-char-set?} returns true.")
 #define FUNC_NAME s_scm_char_set_ref
 {
-  size_t ccursor = scm_to_size_t (cursor);
+  scm_t_char_set *cs_data;
+  scm_t_char_set_cursor *cur_data;
+  size_t i;
+
   SCM_VALIDATE_SMOB (1, cs, charset);
+  SCM_VALIDATE_SMOB (2, cursor, charset_cursor);
+
+  cs_data = SCM_CHARSET_DATA (cs);
+  cur_data = (scm_t_char_set_cursor *) SCM_SMOB_DATA (cursor);
 
-  if (ccursor >= SCM_CHARSET_SIZE || !SCM_CHARSET_GET (cs, ccursor))
+  /* Validate that this cursor is still true.  */
+  i = cur_data->range;
+  if (i == (size_t) (-1)
+      || i >= cs_data->len
+      || cur_data->n < cs_data->ranges[i].lo
+      || cur_data->n > cs_data->ranges[i].hi)
     SCM_MISC_ERROR ("invalid character set cursor: ~A", scm_list_1 (cursor));
-  return SCM_MAKE_CHAR (ccursor);
+  return SCM_MAKE_CHAR (cur_data->n);
 }
+
 #undef FUNC_NAME
 
 
 SCM_DEFINE (scm_char_set_cursor_next, "char-set-cursor-next", 2, 0, 0,
-           (SCM cs, SCM cursor),
-           "Advance the character set cursor @var{cursor} to the next\n"
-           "character in the character set @var{cs}.  It is an error if the\n"
-           "cursor given satisfies @code{end-of-char-set?}.")
+            (SCM cs, SCM cursor),
+            "Advance the character set cursor @var{cursor} to the next\n"
+            "character in the character set @var{cs}.  It is an error if the\n"
+            "cursor given satisfies @code{end-of-char-set?}.")
 #define FUNC_NAME s_scm_char_set_cursor_next
 {
-  size_t ccursor = scm_to_size_t (cursor);
+  scm_t_char_set *cs_data;
+  scm_t_char_set_cursor *cur_data;
+  size_t i;
+
   SCM_VALIDATE_SMOB (1, cs, charset);
+  SCM_VALIDATE_SMOB (2, cursor, charset_cursor);
+
+  cs_data = SCM_CHARSET_DATA (cs);
+  cur_data = (scm_t_char_set_cursor *) SCM_SMOB_DATA (cursor);
 
-  if (ccursor >= SCM_CHARSET_SIZE || !SCM_CHARSET_GET (cs, ccursor))
+  /* Validate that this cursor is still true.  */
+  i = cur_data->range;
+  if (i == (size_t) (-1)
+      || i >= cs_data->len
+      || cur_data->n < cs_data->ranges[i].lo
+      || cur_data->n > cs_data->ranges[i].hi)
     SCM_MISC_ERROR ("invalid character set cursor: ~A", scm_list_1 (cursor));
-  for (ccursor++; ccursor < SCM_CHARSET_SIZE; ccursor++)
+  /* Increment the cursor.  */
+  if (cur_data->n == cs_data->ranges[i].hi)
     {
-      if (SCM_CHARSET_GET (cs, ccursor))
-       break;
+      if (i + 1 < cs_data->len)
+        {
+          cur_data->range = i + 1;
+          cur_data->n = cs_data->ranges[i + 1].lo;
+        }
+      else
+        {
+          /* This is the end of the road.  */
+          cur_data->range = (size_t) (-1);
+          cur_data->n = 0;
+        }
     }
-  return SCM_I_MAKINUM (ccursor);
+  else
+    {
+      cur_data->n = cur_data->n + 1;
+    }
+
+  return cursor;
 }
+
 #undef FUNC_NAME
 
 
 SCM_DEFINE (scm_end_of_char_set_p, "end-of-char-set?", 1, 0, 0,
-           (SCM cursor),
-           "Return @code{#t} if @var{cursor} has reached the end of a\n"
-           "character set, @code{#f} otherwise.")
+            (SCM cursor),
+            "Return @code{#t} if @var{cursor} has reached the end of a\n"
+            "character set, @code{#f} otherwise.")
 #define FUNC_NAME s_scm_end_of_char_set_p
 {
-  size_t ccursor = scm_to_size_t (cursor);
-  return scm_from_bool (ccursor >= SCM_CHARSET_SIZE);
+  scm_t_char_set_cursor *cur_data;
+  SCM_VALIDATE_SMOB (1, cursor, charset_cursor);
+
+  cur_data = (scm_t_char_set_cursor *) SCM_SMOB_DATA (cursor);
+  if (cur_data->range == (size_t) (-1))
+    return SCM_BOOL_T;
+
+  return SCM_BOOL_F;
 }
+
 #undef FUNC_NAME
 
 
 SCM_DEFINE (scm_char_set_fold, "char-set-fold", 3, 0, 0,
-           (SCM kons, SCM knil, SCM cs),
-           "Fold the procedure @var{kons} over the character set @var{cs},\n"
-           "initializing it with @var{knil}.")
+            (SCM kons, SCM knil, SCM cs),
+            "Fold the procedure @var{kons} over the character set @var{cs},\n"
+            "initializing it with @var{knil}.")
 #define FUNC_NAME s_scm_char_set_fold
 {
+  scm_t_char_set *cs_data;
   int k;
+  scm_t_wchar n;
 
   SCM_VALIDATE_PROC (1, kons);
   SCM_VALIDATE_SMOB (3, cs, charset);
 
-  for (k = 0; k < SCM_CHARSET_SIZE; k++)
-    if (SCM_CHARSET_GET (cs, k))
+  cs_data = SCM_CHARSET_DATA (cs);
+
+  if (cs_data->len == 0)
+    return knil;
+
+  for (k = 0; k < cs_data->len; k++)
+    for (n = cs_data->ranges[k].lo; n <= cs_data->ranges[k].hi; n++)
       {
-       knil = scm_call_2 (kons, SCM_MAKE_CHAR (k), knil);
+        knil = scm_call_2 (kons, SCM_MAKE_CHAR (n), knil);
       }
   return knil;
 }
@@ -373,21 +918,32 @@ SCM_DEFINE (scm_char_set_unfold_x, "char-set-unfold!", 5, 
0, 0,
 
 
 SCM_DEFINE (scm_char_set_for_each, "char-set-for-each", 2, 0, 0,
-           (SCM proc, SCM cs),
-           "Apply @var{proc} to every character in the character set\n"
-           "@var{cs}.  The return value is not specified.")
+            (SCM proc, SCM cs),
+            "Apply @var{proc} to every character in the character set\n"
+            "@var{cs}.  The return value is not specified.")
 #define FUNC_NAME s_scm_char_set_for_each
 {
+  scm_t_char_set *cs_data;
   int k;
+  scm_t_wchar n;
 
   SCM_VALIDATE_PROC (1, proc);
   SCM_VALIDATE_SMOB (2, cs, charset);
 
-  for (k = 0; k < SCM_CHARSET_SIZE; k++)
-    if (SCM_CHARSET_GET (cs, k))
-      scm_call_1 (proc, SCM_MAKE_CHAR (k));
+  cs_data = SCM_CHARSET_DATA (cs);
+
+  if (cs_data->len == 0)
+    return SCM_UNSPECIFIED;
+
+  for (k = 0; k < cs_data->len; k++)
+    for (n = cs_data->ranges[k].lo; n <= cs_data->ranges[k].hi; n++)
+      {
+        scm_call_1 (proc, SCM_MAKE_CHAR (n));
+      }
+
   return SCM_UNSPECIFIED;
 }
+
 #undef FUNC_NAME
 
 
@@ -399,18 +955,26 @@ SCM_DEFINE (scm_char_set_map, "char-set-map", 2, 0, 0,
 {
   SCM result;
   int k;
+  scm_t_char_set *cs_data;
+  scm_t_wchar n;
 
   SCM_VALIDATE_PROC (1, proc);
   SCM_VALIDATE_SMOB (2, cs, charset);
 
   result = make_char_set (FUNC_NAME);
-  for (k = 0; k < SCM_CHARSET_SIZE; k++)
-    if (SCM_CHARSET_GET (cs, k))
+  cs_data = SCM_CHARSET_DATA (cs);
+
+  if (cs_data->len == 0)
+    return result;
+
+  for (k = 0; k < cs_data->len; k++)
+    for (n = cs_data->ranges[k].lo; n <= cs_data->ranges[k].hi; n++)
       {
-       SCM ch = scm_call_1 (proc, SCM_MAKE_CHAR (k));
-       if (!SCM_CHARP (ch))
-         SCM_MISC_ERROR ("procedure ~S returned non-char", scm_list_1 (proc));
-       SCM_CHARSET_SET (result, SCM_CHAR (ch));
+        SCM ch = scm_call_1 (proc, SCM_MAKE_CHAR (n));
+        if (!SCM_CHARP (ch))
+          SCM_MISC_ERROR ("procedure ~S returned non-char",
+                          scm_list_1 (proc));
+        SCM_CHARSET_SET (result, SCM_CHAR (ch));
       }
   return result;
 }
@@ -424,17 +988,26 @@ SCM_DEFINE (scm_char_set_copy, "char-set-copy", 1, 0, 0,
 #define FUNC_NAME s_scm_char_set_copy
 {
   SCM ret;
-  long * p1, * p2;
-  int k;
+  scm_t_char_set *p1, *p2;
 
   SCM_VALIDATE_SMOB (1, cs, charset);
   ret = make_char_set (FUNC_NAME);
-  p1 = (long *) SCM_SMOB_DATA (cs);
-  p2 = (long *) SCM_SMOB_DATA (ret);
-  for (k = 0; k < LONGS_PER_CHARSET; k++)
-    p2[k] = p1[k];
+  p1 = SCM_CHARSET_DATA (cs);
+  p2 = SCM_CHARSET_DATA (ret);
+  p2->len = p1->len;
+
+  if (p1->len == 0)
+    p2->ranges = NULL;
+  else
+    {
+      p2->ranges = scm_gc_malloc (sizeof (scm_t_char_range) * p1->len,
+                                  "character-set");
+      memcpy (p2->ranges, p1->ranges, sizeof (scm_t_char_range) * p1->len);
+    }
+
   return ret;
 }
+
 #undef FUNC_NAME
 
 
@@ -444,20 +1017,18 @@ SCM_DEFINE (scm_char_set, "char-set", 0, 0, 1,
 #define FUNC_NAME s_scm_char_set
 {
   SCM cs;
-  long * p;
   int argnum = 1;
 
   SCM_VALIDATE_REST_ARGUMENT (rest);
   cs = make_char_set (FUNC_NAME);
-  p = (long *) SCM_SMOB_DATA (cs);
   while (!scm_is_null (rest))
     {
-      int c;
+      scm_t_wchar c;
 
       SCM_VALIDATE_CHAR_COPY (argnum, SCM_CAR (rest), c);
       argnum++;
       rest = SCM_CDR (rest);
-      p[c / SCM_BITS_PER_LONG] |= 1L << (c % SCM_BITS_PER_LONG);
+      SCM_CHARSET_SET (cs, c);
     }
   return cs;
 }
@@ -472,7 +1043,6 @@ SCM_DEFINE (scm_list_to_char_set, "list->char-set", 1, 1, 
0,
 #define FUNC_NAME s_scm_list_to_char_set
 {
   SCM cs;
-  long * p;
 
   SCM_VALIDATE_LIST (1, list);
   if (SCM_UNBNDP (base_cs))
@@ -482,16 +1052,16 @@ SCM_DEFINE (scm_list_to_char_set, "list->char-set", 1, 
1, 0,
       SCM_VALIDATE_SMOB (2, base_cs, charset);
       cs = scm_char_set_copy (base_cs);
     }
-  p = (long *) SCM_SMOB_DATA (cs);
   while (!scm_is_null (list))
     {
       SCM chr = SCM_CAR (list);
-      int c;
+      scm_t_wchar c;
 
       SCM_VALIDATE_CHAR_COPY (0, chr, c);
       list = SCM_CDR (list);
 
-      p[c / SCM_BITS_PER_LONG] |= 1L << (c % SCM_BITS_PER_LONG);
+
+      SCM_CHARSET_SET (cs, c);
     }
   return cs;
 }
@@ -499,26 +1069,23 @@ SCM_DEFINE (scm_list_to_char_set, "list->char-set", 1, 
1, 0,
 
 
 SCM_DEFINE (scm_list_to_char_set_x, "list->char-set!", 2, 0, 0,
-           (SCM list, SCM base_cs),
-           "Convert the character list @var{list} to a character set.  The\n"
-           "characters are added to @var{base_cs} and @var{base_cs} is\n"
-           "returned.")
+            (SCM list, SCM base_cs),
+            "Convert the character list @var{list} to a character set.  The\n"
+            "characters are added to @var{base_cs} and @var{base_cs} is\n"
+            "returned.")
 #define FUNC_NAME s_scm_list_to_char_set_x
 {
-  long * p;
-
   SCM_VALIDATE_LIST (1, list);
   SCM_VALIDATE_SMOB (2, base_cs, charset);
-  p = (long *) SCM_SMOB_DATA (base_cs);
   while (!scm_is_null (list))
     {
       SCM chr = SCM_CAR (list);
-      int c;
+      scm_t_wchar c;
 
       SCM_VALIDATE_CHAR_COPY (0, chr, c);
       list = SCM_CDR (list);
 
-      p[c / SCM_BITS_PER_LONG] |= 1L << (c % SCM_BITS_PER_LONG);
+      SCM_CHARSET_SET (base_cs, c);
     }
   return base_cs;
 }
@@ -533,8 +1100,6 @@ SCM_DEFINE (scm_string_to_char_set, "string->char-set", 1, 
1, 0,
 #define FUNC_NAME s_scm_string_to_char_set
 {
   SCM cs;
-  long * p;
-  const char * s;
   size_t k = 0, len;
 
   SCM_VALIDATE_STRING (1, str);
@@ -545,13 +1110,11 @@ SCM_DEFINE (scm_string_to_char_set, "string->char-set", 
1, 1, 0,
       SCM_VALIDATE_SMOB (2, base_cs, charset);
       cs = scm_char_set_copy (base_cs);
     }
-  p = (long *) SCM_SMOB_DATA (cs);
-  s = scm_i_string_chars (str);
   len = scm_i_string_length (str);
   while (k < len)
     {
-      int c = s[k++];
-      p[c / SCM_BITS_PER_LONG] |= 1L << (c % SCM_BITS_PER_LONG);
+      scm_t_wchar c = scm_i_string_ref (str, k++);
+      SCM_CHARSET_SET (cs, c);
     }
   scm_remember_upto_here_1 (str);
   return cs;
@@ -560,25 +1123,21 @@ SCM_DEFINE (scm_string_to_char_set, "string->char-set", 
1, 1, 0,
 
 
 SCM_DEFINE (scm_string_to_char_set_x, "string->char-set!", 2, 0, 0,
-           (SCM str, SCM base_cs),
-           "Convert the string @var{str} to a character set.  The\n"
-           "characters from the string are added to @var{base_cs}, and\n"
-           "@var{base_cs} is returned.")
+            (SCM str, SCM base_cs),
+            "Convert the string @var{str} to a character set.  The\n"
+            "characters from the string are added to @var{base_cs}, and\n"
+            "@var{base_cs} is returned.")
 #define FUNC_NAME s_scm_string_to_char_set_x
 {
-  long * p;
-  const char * s;
   size_t k = 0, len;
 
   SCM_VALIDATE_STRING (1, str);
   SCM_VALIDATE_SMOB (2, base_cs, charset);
-  p = (long *) SCM_SMOB_DATA (base_cs);
-  s = scm_i_string_chars (str);
   len = scm_i_string_length (str);
   while (k < len)
     {
-      int c = s[k++];
-      p[c / SCM_BITS_PER_LONG] |= 1L << (c % SCM_BITS_PER_LONG);
+      scm_t_wchar c = scm_i_string_ref (str, k++);
+      SCM_CHARSET_SET (base_cs, c);
     }
   scm_remember_upto_here_1 (str);
   return base_cs;
@@ -595,7 +1154,8 @@ SCM_DEFINE (scm_char_set_filter, "char-set-filter", 2, 1, 
0,
 {
   SCM ret;
   int k;
-  long * p;
+  scm_t_wchar n;
+  scm_t_char_set *p;
 
   SCM_VALIDATE_PROC (1, pred);
   SCM_VALIDATE_SMOB (2, cs, charset);
@@ -606,19 +1166,23 @@ SCM_DEFINE (scm_char_set_filter, "char-set-filter", 2, 
1, 0,
     }
   else
     ret = make_char_set (FUNC_NAME);
-  p = (long *) SCM_SMOB_DATA (ret);
-  for (k = 0; k < SCM_CHARSET_SIZE; k++)
-    {
-      if (SCM_CHARSET_GET (cs, k))
-       {
-         SCM res = scm_call_1 (pred, SCM_MAKE_CHAR (k));
 
-         if (scm_is_true (res))
-           p[k / SCM_BITS_PER_LONG] |= 1L << (k % SCM_BITS_PER_LONG);
-       }
-    }
+  p = SCM_CHARSET_DATA (cs);
+
+  if (p->len == 0)
+    return ret;
+
+  for (k = 0; k < p->len; k++)
+    for (n = p->ranges[k].lo; n <= p->ranges[k].hi; n++)
+      {
+        SCM res = scm_call_1 (pred, SCM_MAKE_CHAR (n));
+
+        if (scm_is_true (res))
+          SCM_CHARSET_SET (ret, n);
+      }
   return ret;
 }
+
 #undef FUNC_NAME
 
 
@@ -630,24 +1194,27 @@ SCM_DEFINE (scm_char_set_filter_x, "char-set-filter!", 
3, 0, 0,
 #define FUNC_NAME s_scm_char_set_filter_x
 {
   int k;
-  long * p;
+  scm_t_wchar n;
+  scm_t_char_set *p;
 
   SCM_VALIDATE_PROC (1, pred);
   SCM_VALIDATE_SMOB (2, cs, charset);
   SCM_VALIDATE_SMOB (3, base_cs, charset);
-  p = (long *) SCM_SMOB_DATA (base_cs);
-  for (k = 0; k < SCM_CHARSET_SIZE; k++)
-    {
-      if (SCM_CHARSET_GET (cs, k))
-       {
-         SCM res = scm_call_1 (pred, SCM_MAKE_CHAR (k));
+  p = SCM_CHARSET_DATA (cs);
+  if (p->len == 0)
+    return base_cs;
 
-         if (scm_is_true (res))
-           p[k / SCM_BITS_PER_LONG] |= 1L << (k % SCM_BITS_PER_LONG);
-       }
-    }
+  for (k = 0; k < p->len; k++)
+    for (n = p->ranges[k].lo; n <= p->ranges[k].hi; n++)
+      {
+        SCM res = scm_call_1 (pred, SCM_MAKE_CHAR (k));
+
+        if (scm_is_true (res))
+          SCM_CHARSET_SET (base_cs, n);
+      }
   return base_cs;
 }
+
 #undef FUNC_NAME
 
 
@@ -669,7 +1236,6 @@ SCM_DEFINE (scm_ucs_range_to_char_set, 
"ucs-range->char-set", 2, 2, 0,
 {
   SCM cs;
   size_t clower, cupper;
-  long * p;
 
   clower = scm_to_size_t (lower);
   cupper = scm_to_size_t (upper);
@@ -677,15 +1243,15 @@ SCM_DEFINE (scm_ucs_range_to_char_set, 
"ucs-range->char-set", 2, 2, 0,
   if (!SCM_UNBNDP (error))
     {
       if (scm_is_true (error))
-       {
-         SCM_ASSERT_RANGE (1, lower, clower <= SCM_CHARSET_SIZE);
-         SCM_ASSERT_RANGE (2, upper, cupper <= SCM_CHARSET_SIZE);
-       }
+        {
+          SCM_ASSERT_RANGE (1, lower, SCM_IS_UNICODE_CHAR (clower));
+          SCM_ASSERT_RANGE (2, upper, SCM_IS_UNICODE_CHAR (cupper));
+        }
     }
-  if (clower > SCM_CHARSET_SIZE)
-    clower = SCM_CHARSET_SIZE;
-  if (cupper > SCM_CHARSET_SIZE)
-    cupper = SCM_CHARSET_SIZE;
+  if (clower > 0x10FFFF)
+    clower = 0x10FFFF;
+  if (cupper > 0x10FFFF)
+    cupper = 0x10FFFF;
   if (SCM_UNBNDP (base_cs))
     cs = make_char_set (FUNC_NAME);
   else
@@ -693,10 +1259,11 @@ SCM_DEFINE (scm_ucs_range_to_char_set, 
"ucs-range->char-set", 2, 2, 0,
       SCM_VALIDATE_SMOB (4, base_cs, charset);
       cs = scm_char_set_copy (base_cs);
     }
-  p = (long *) SCM_SMOB_DATA (cs);
+  /* It not be difficult to write a more optimized version of the
+     following.  */
   while (clower < cupper)
     {
-      p[clower / SCM_BITS_PER_LONG] |= 1L << (clower % SCM_BITS_PER_LONG);
+      SCM_CHARSET_SET (cs, clower);
       clower++;
     }
   return cs;
@@ -721,24 +1288,24 @@ SCM_DEFINE (scm_ucs_range_to_char_set_x, 
"ucs-range->char-set!", 4, 0, 0,
 #define FUNC_NAME s_scm_ucs_range_to_char_set_x
 {
   size_t clower, cupper;
-  long * p;
 
   clower = scm_to_size_t (lower);
   cupper = scm_to_size_t (upper);
   SCM_ASSERT_RANGE (2, upper, cupper >= clower);
   if (scm_is_true (error))
     {
-      SCM_ASSERT_RANGE (1, lower, clower <= SCM_CHARSET_SIZE);
-      SCM_ASSERT_RANGE (2, upper, cupper <= SCM_CHARSET_SIZE);
+      SCM_ASSERT_RANGE (1, lower, SCM_IS_UNICODE_CHAR (clower));
+      SCM_ASSERT_RANGE (2, upper, SCM_IS_UNICODE_CHAR (cupper));
     }
-  if (clower > SCM_CHARSET_SIZE)
-    clower = SCM_CHARSET_SIZE;
-  if (cupper > SCM_CHARSET_SIZE)
-    cupper = SCM_CHARSET_SIZE;
-  p = (long *) SCM_SMOB_DATA (base_cs);
+  if (clower > SCM_CODEPOINT_MAX)
+    clower = SCM_CODEPOINT_MAX;
+  if (cupper > SCM_CODEPOINT_MAX)
+    cupper = SCM_CODEPOINT_MAX;
+
   while (clower < cupper)
     {
-      p[clower / SCM_BITS_PER_LONG] |= 1L << (clower % SCM_BITS_PER_LONG);
+      if (SCM_IS_UNICODE_CHAR (clower))
+        SCM_CHARSET_SET (base_cs, clower);
       clower++;
     }
   return base_cs;
@@ -767,13 +1334,20 @@ SCM_DEFINE (scm_char_set_size, "char-set-size", 1, 0, 0,
 #define FUNC_NAME s_scm_char_set_size
 {
   int k, count = 0;
+  scm_t_char_set *cs_data;
 
   SCM_VALIDATE_SMOB (1, cs, charset);
-  for (k = 0; k < SCM_CHARSET_SIZE; k++)
-    if (SCM_CHARSET_GET (cs, k))
-      count++;
-  return SCM_I_MAKINUM (count);
+  cs_data = SCM_CHARSET_DATA (cs);
+
+  if (cs_data->len == 0)
+    return scm_from_int (0);
+
+  for (k = 0; k < cs_data->len; k++)
+    count += cs_data->ranges[k].hi - cs_data->ranges[k].lo + 1;
+
+  return scm_from_int (count);
 }
+
 #undef FUNC_NAME
 
 
@@ -784,16 +1358,21 @@ SCM_DEFINE (scm_char_set_count, "char-set-count", 2, 0, 
0,
 #define FUNC_NAME s_scm_char_set_count
 {
   int k, count = 0;
+  scm_t_wchar n;
+  scm_t_char_set *cs_data;
 
   SCM_VALIDATE_PROC (1, pred);
   SCM_VALIDATE_SMOB (2, cs, charset);
+  cs_data = SCM_CHARSET_DATA (cs);
+  if (cs_data->len == 0)
+    return scm_from_int (0);
 
-  for (k = 0; k < SCM_CHARSET_SIZE; k++)
-    if (SCM_CHARSET_GET (cs, k))
+  for (k = 0; k < cs_data->len; k++)
+    for (n = cs_data->ranges[k].lo; n <= cs_data->ranges[k].hi; n++)
       {
-       SCM res = scm_call_1 (pred, SCM_MAKE_CHAR (k));
-       if (scm_is_true (res))
-         count++;
+        SCM res = scm_call_1 (pred, SCM_MAKE_CHAR (n));
+        if (scm_is_true (res))
+          count++;
       }
   return SCM_I_MAKINUM (count);
 }
@@ -807,14 +1386,21 @@ SCM_DEFINE (scm_char_set_to_list, "char-set->list", 1, 
0, 0,
 #define FUNC_NAME s_scm_char_set_to_list
 {
   int k;
+  scm_t_wchar n;
   SCM result = SCM_EOL;
+  scm_t_char_set *p;
 
   SCM_VALIDATE_SMOB (1, cs, charset);
-  for (k = SCM_CHARSET_SIZE; k > 0; k--)
-    if (SCM_CHARSET_GET (cs, k - 1))
-      result = scm_cons (SCM_MAKE_CHAR (k - 1), result);
+  p = SCM_CHARSET_DATA (cs);
+  if (p->len == 0)
+    return SCM_EOL;
+
+  for (k = p->len - 1; k >= 0; k--)
+    for (n = p->ranges[k].hi; n >= p->ranges[k].lo; n--)
+      result = scm_cons (SCM_MAKE_CHAR (n), result);
   return result;
 }
+
 #undef FUNC_NAME
 
 
@@ -828,19 +1414,38 @@ SCM_DEFINE (scm_char_set_to_string, "char-set->string", 
1, 0, 0,
   int k;
   int count = 0;
   int idx = 0;
+  int wide = 0;
   SCM result;
-  char * p;
+  scm_t_wchar n;
+  scm_t_char_set *cs_data;
+  char *buf;
+  scm_t_wchar *wbuf;
 
   SCM_VALIDATE_SMOB (1, cs, charset);
-  for (k = 0; k < SCM_CHARSET_SIZE; k++)
-    if (SCM_CHARSET_GET (cs, k))
-      count++;
-  result = scm_i_make_string (count, &p);
-  for (k = 0; k < SCM_CHARSET_SIZE; k++)
-    if (SCM_CHARSET_GET (cs, k))
-      p[idx++] = k;
+  cs_data = SCM_CHARSET_DATA (cs);
+  if (cs_data->len == 0)
+    return scm_nullstr;
+
+  if (cs_data->ranges[cs_data->len - 1].hi > 255)
+    wide = 1;
+
+  count = scm_to_int (scm_char_set_size (cs));
+  if (wide)
+    result = scm_i_make_wide_string (count, &wbuf);
+  else
+    result = scm_i_make_string (count, &buf);
+
+  for (k = 0; k < cs_data->len; k++)
+    for (n = cs_data->ranges[k].lo; n <= cs_data->ranges[k].hi; n++)
+      {
+        if (wide)
+          wbuf[idx++] = n;
+        else
+          buf[idx++] = n;
+      }
   return result;
 }
+
 #undef FUNC_NAME
 
 
@@ -864,20 +1469,27 @@ SCM_DEFINE (scm_char_set_every, "char-set-every", 2, 0, 
0,
 #define FUNC_NAME s_scm_char_set_every
 {
   int k;
+  scm_t_wchar n;
   SCM res = SCM_BOOL_T;
+  scm_t_char_set *cs_data;
 
   SCM_VALIDATE_PROC (1, pred);
   SCM_VALIDATE_SMOB (2, cs, charset);
 
-  for (k = 0; k < SCM_CHARSET_SIZE; k++)
-    if (SCM_CHARSET_GET (cs, k))
+  cs_data = SCM_CHARSET_DATA (cs);
+  if (cs_data->len == 0)
+    return SCM_BOOL_T;
+
+  for (k = 0; k < cs_data->len; k++)
+    for (n = cs_data->ranges[k].lo; n <= cs_data->ranges[k].hi; n++)
       {
-       res = scm_call_1 (pred, SCM_MAKE_CHAR (k));
-       if (scm_is_false (res))
-         return res;
+        res = scm_call_1 (pred, SCM_MAKE_CHAR (n));
+        if (scm_is_false (res))
+          return res;
       }
-  return res;
+  return SCM_BOOL_T;
 }
+
 #undef FUNC_NAME
 
 
@@ -888,16 +1500,20 @@ SCM_DEFINE (scm_char_set_any, "char-set-any", 2, 0, 0,
 #define FUNC_NAME s_scm_char_set_any
 {
   int k;
+  scm_t_wchar n;
+  scm_t_char_set *cs_data;
 
   SCM_VALIDATE_PROC (1, pred);
   SCM_VALIDATE_SMOB (2, cs, charset);
 
-  for (k = 0; k < SCM_CHARSET_SIZE; k++)
-    if (SCM_CHARSET_GET (cs, k))
+  cs_data = (scm_t_char_set *) cs;
+
+  for (k = 0; k < cs_data->len; k++)
+    for (n = cs_data->ranges[k].lo; n <= cs_data->ranges[k].hi; n++)
       {
-       SCM res = scm_call_1 (pred, SCM_MAKE_CHAR (k));
-       if (scm_is_true (res))
-         return res;
+        SCM res = scm_call_1 (pred, SCM_MAKE_CHAR (n));
+        if (scm_is_true (res))
+          return res;
       }
   return SCM_BOOL_F;
 }
@@ -905,27 +1521,24 @@ SCM_DEFINE (scm_char_set_any, "char-set-any", 2, 0, 0,
 
 
 SCM_DEFINE (scm_char_set_adjoin, "char-set-adjoin", 1, 0, 1,
-           (SCM cs, SCM rest),
-           "Add all character arguments to the first argument, which must\n"
-           "be a character set.")
+            (SCM cs, SCM rest),
+            "Add all character arguments to the first argument, which must\n"
+            "be a character set.")
 #define FUNC_NAME s_scm_char_set_adjoin
 {
-  long * p;
-
   SCM_VALIDATE_SMOB (1, cs, charset);
   SCM_VALIDATE_REST_ARGUMENT (rest);
   cs = scm_char_set_copy (cs);
 
-  p = (long *) SCM_SMOB_DATA (cs);
   while (!scm_is_null (rest))
     {
       SCM chr = SCM_CAR (rest);
-      int c;
+      scm_t_wchar c;
 
       SCM_VALIDATE_CHAR_COPY (1, chr, c);
       rest = SCM_CDR (rest);
 
-      p[c / SCM_BITS_PER_LONG] |= 1L << (c % SCM_BITS_PER_LONG);
+      SCM_CHARSET_SET (cs, c);
     }
   return cs;
 }
@@ -933,27 +1546,24 @@ SCM_DEFINE (scm_char_set_adjoin, "char-set-adjoin", 1, 
0, 1,
 
 
 SCM_DEFINE (scm_char_set_delete, "char-set-delete", 1, 0, 1,
-           (SCM cs, SCM rest),
-           "Delete all character arguments from the first argument, which\n"
-           "must be a character set.")
+            (SCM cs, SCM rest),
+            "Delete all character arguments from the first argument, which\n"
+            "must be a character set.")
 #define FUNC_NAME s_scm_char_set_delete
 {
-  long * p;
-
   SCM_VALIDATE_SMOB (1, cs, charset);
   SCM_VALIDATE_REST_ARGUMENT (rest);
   cs = scm_char_set_copy (cs);
 
-  p = (long *) SCM_SMOB_DATA (cs);
   while (!scm_is_null (rest))
     {
       SCM chr = SCM_CAR (rest);
-      int c;
+      scm_t_wchar c;
 
       SCM_VALIDATE_CHAR_COPY (1, chr, c);
       rest = SCM_CDR (rest);
 
-      p[c / SCM_BITS_PER_LONG] &= ~(1L << (c % SCM_BITS_PER_LONG));
+      SCM_CHARSET_UNSET (cs, c);
     }
   return cs;
 }
@@ -961,26 +1571,23 @@ SCM_DEFINE (scm_char_set_delete, "char-set-delete", 1, 
0, 1,
 
 
 SCM_DEFINE (scm_char_set_adjoin_x, "char-set-adjoin!", 1, 0, 1,
-           (SCM cs, SCM rest),
-           "Add all character arguments to the first argument, which must\n"
-           "be a character set.")
+            (SCM cs, SCM rest),
+            "Add all character arguments to the first argument, which must\n"
+            "be a character set.")
 #define FUNC_NAME s_scm_char_set_adjoin_x
 {
-  long * p;
-
   SCM_VALIDATE_SMOB (1, cs, charset);
   SCM_VALIDATE_REST_ARGUMENT (rest);
 
-  p = (long *) SCM_SMOB_DATA (cs);
   while (!scm_is_null (rest))
     {
       SCM chr = SCM_CAR (rest);
-      int c;
+      scm_t_wchar c;
 
       SCM_VALIDATE_CHAR_COPY (1, chr, c);
       rest = SCM_CDR (rest);
 
-      p[c / SCM_BITS_PER_LONG] |= 1L << (c % SCM_BITS_PER_LONG);
+      SCM_CHARSET_SET (cs, c);
     }
   return cs;
 }
@@ -988,26 +1595,23 @@ SCM_DEFINE (scm_char_set_adjoin_x, "char-set-adjoin!", 
1, 0, 1,
 
 
 SCM_DEFINE (scm_char_set_delete_x, "char-set-delete!", 1, 0, 1,
-           (SCM cs, SCM rest),
-           "Delete all character arguments from the first argument, which\n"
-           "must be a character set.")
+            (SCM cs, SCM rest),
+            "Delete all character arguments from the first argument, which\n"
+            "must be a character set.")
 #define FUNC_NAME s_scm_char_set_delete_x
 {
-  long * p;
-
   SCM_VALIDATE_SMOB (1, cs, charset);
   SCM_VALIDATE_REST_ARGUMENT (rest);
 
-  p = (long *) SCM_SMOB_DATA (cs);
   while (!scm_is_null (rest))
     {
       SCM chr = SCM_CAR (rest);
-      int c;
+      scm_t_wchar c;
 
       SCM_VALIDATE_CHAR_COPY (1, chr, c);
       rest = SCM_CDR (rest);
 
-      p[c / SCM_BITS_PER_LONG] &= ~(1L << (c % SCM_BITS_PER_LONG));
+      SCM_CHARSET_UNSET (cs, c);
     }
   return cs;
 }
@@ -1015,23 +1619,22 @@ SCM_DEFINE (scm_char_set_delete_x, "char-set-delete!", 
1, 0, 1,
 
 
 SCM_DEFINE (scm_char_set_complement, "char-set-complement", 1, 0, 0,
-           (SCM cs),
-           "Return the complement of the character set @var{cs}.")
+            (SCM cs), "Return the complement of the character set @var{cs}.")
 #define FUNC_NAME s_scm_char_set_complement
 {
-  int k;
   SCM res;
-  long * p, * q;
+  scm_t_char_set *p, *q;
 
   SCM_VALIDATE_SMOB (1, cs, charset);
 
   res = make_char_set (FUNC_NAME);
-  p = (long *) SCM_SMOB_DATA (res);
-  q = (long *) SCM_SMOB_DATA (cs);
-  for (k = 0; k < LONGS_PER_CHARSET; k++)
-    p[k] = ~q[k];
+  p = SCM_CHARSET_DATA (res);
+  q = SCM_CHARSET_DATA (cs);
+
+  charsets_complement (p, q);
   return res;
 }
+
 #undef FUNC_NAME
 
 
@@ -1042,22 +1645,21 @@ SCM_DEFINE (scm_char_set_union, "char-set-union", 0, 0, 
1,
 {
   int c = 1;
   SCM res;
-  long * p;
+  scm_t_char_set *p;
 
   SCM_VALIDATE_REST_ARGUMENT (rest);
 
   res = make_char_set (FUNC_NAME);
-  p = (long *) SCM_SMOB_DATA (res);
+  p = SCM_CHARSET_DATA (res);
   while (!scm_is_null (rest))
     {
-      int k;
       SCM cs = SCM_CAR (rest);
       SCM_VALIDATE_SMOB (c, cs, charset);
       c++;
       rest = SCM_CDR (rest);
 
-      for (k = 0; k < LONGS_PER_CHARSET; k++)
-       p[k] |= ((long *) SCM_SMOB_DATA (cs))[k];
+
+      charsets_union (p, (scm_t_char_set *) SCM_SMOB_DATA (cs));
     }
   return res;
 }
@@ -1077,26 +1679,24 @@ SCM_DEFINE (scm_char_set_intersection, 
"char-set-intersection", 0, 0, 1,
     res = make_char_set (FUNC_NAME);
   else
     {
-      long *p;
+      scm_t_char_set *p;
       int argnum = 2;
 
       res = scm_char_set_copy (SCM_CAR (rest));
-      p = (long *) SCM_SMOB_DATA (res);
+      p = SCM_CHARSET_DATA (res);
       rest = SCM_CDR (rest);
 
       while (scm_is_pair (rest))
-       {
-         int k;
-         SCM cs = SCM_CAR (rest);
-         long *cs_data;
-
-         SCM_VALIDATE_SMOB (argnum, cs, charset);
-         argnum++;
-         cs_data = (long *) SCM_SMOB_DATA (cs);
-         rest = SCM_CDR (rest);
-         for (k = 0; k < LONGS_PER_CHARSET; k++)
-           p[k] &= cs_data[k];
-       }
+        {
+          SCM cs = SCM_CAR (rest);
+          scm_t_char_set *cs_data;
+
+          SCM_VALIDATE_SMOB (argnum, cs, charset);
+          argnum++;
+          cs_data = SCM_CHARSET_DATA (cs);
+          rest = SCM_CDR (rest);
+          charsets_intersection (p, cs_data);
+        }
     }
 
   return res;
@@ -1110,24 +1710,25 @@ SCM_DEFINE (scm_char_set_difference, 
"char-set-difference", 1, 0, 1,
 #define FUNC_NAME s_scm_char_set_difference
 {
   int c = 2;
-  SCM res;
-  long * p;
+  SCM res, compl;
+  scm_t_char_set *p, *q;
 
   SCM_VALIDATE_SMOB (1, cs1, charset);
   SCM_VALIDATE_REST_ARGUMENT (rest);
 
   res = scm_char_set_copy (cs1);
-  p = (long *) SCM_SMOB_DATA (res);
+  p = SCM_CHARSET_DATA (res);
+  compl = make_char_set (FUNC_NAME);
+  q = SCM_CHARSET_DATA (compl);
   while (!scm_is_null (rest))
     {
-      int k;
       SCM cs = SCM_CAR (rest);
       SCM_VALIDATE_SMOB (c, cs, charset);
       c++;
       rest = SCM_CDR (rest);
 
-      for (k = 0; k < LONGS_PER_CHARSET; k++)
-       p[k] &= ~((long *) SCM_SMOB_DATA (cs))[k];
+      charsets_complement (q, SCM_CHARSET_DATA (cs));
+      charsets_intersection (p, q);
     }
   return res;
 }
@@ -1148,26 +1749,24 @@ SCM_DEFINE (scm_char_set_xor, "char-set-xor", 0, 0, 1,
   else
     {
       int argnum = 2;
-      long * p;
+      scm_t_char_set *p;
 
       res = scm_char_set_copy (SCM_CAR (rest));
-      p = (long *) SCM_SMOB_DATA (res);
+      p = SCM_CHARSET_DATA (res);
       rest = SCM_CDR (rest);
 
       while (scm_is_pair (rest))
-       {
-         SCM cs = SCM_CAR (rest);
-         long *cs_data;
-         int k;
-
-         SCM_VALIDATE_SMOB (argnum, cs, charset);
-         argnum++;
-         cs_data = (long *) SCM_SMOB_DATA (cs);
-         rest = SCM_CDR (rest);
-
-         for (k = 0; k < LONGS_PER_CHARSET; k++)
-           p[k] ^= cs_data[k];
-       }
+        {
+          SCM cs = SCM_CAR (rest);
+          scm_t_char_set *cs_data;
+
+          SCM_VALIDATE_SMOB (argnum, cs, charset);
+          argnum++;
+          cs_data = SCM_CHARSET_DATA (cs);
+          rest = SCM_CDR (rest);
+
+          charsets_xor (p, cs_data);
+        }
     }
   return res;
 }
@@ -1182,30 +1781,26 @@ SCM_DEFINE (scm_char_set_diff_plus_intersection, 
"char-set-diff+intersection", 1
 {
   int c = 2;
   SCM res1, res2;
-  long * p, * q;
+  scm_t_char_set *p, *q;
 
   SCM_VALIDATE_SMOB (1, cs1, charset);
   SCM_VALIDATE_REST_ARGUMENT (rest);
 
   res1 = scm_char_set_copy (cs1);
   res2 = make_char_set (FUNC_NAME);
-  p = (long *) SCM_SMOB_DATA (res1);
-  q = (long *) SCM_SMOB_DATA (res2);
+  p = SCM_CHARSET_DATA (res1);
+  q = SCM_CHARSET_DATA (res2);
   while (!scm_is_null (rest))
     {
-      int k;
       SCM cs = SCM_CAR (rest);
-      long *r;
+      scm_t_char_set *r;
 
       SCM_VALIDATE_SMOB (c, cs, charset);
       c++;
-      r = (long *) SCM_SMOB_DATA (cs);
+      r = SCM_CHARSET_DATA (cs);
 
-      for (k = 0; k < LONGS_PER_CHARSET; k++)
-       {
-         q[k] |= p[k] & r[k];
-         p[k] &= ~r[k];
-       }
+      charsets_union (q, r);
+      charsets_intersection (p, r);
       rest = SCM_CDR (rest);
     }
   return scm_values (scm_list_2 (res1, res2));
@@ -1214,103 +1809,59 @@ SCM_DEFINE (scm_char_set_diff_plus_intersection, 
"char-set-diff+intersection", 1
 
 
 SCM_DEFINE (scm_char_set_complement_x, "char-set-complement!", 1, 0, 0,
-           (SCM cs),
-           "Return the complement of the character set @var{cs}.")
+            (SCM cs), "Return the complement of the character set @var{cs}.")
 #define FUNC_NAME s_scm_char_set_complement_x
 {
-  int k;
-  long * p;
-
   SCM_VALIDATE_SMOB (1, cs, charset);
-  p = (long *) SCM_SMOB_DATA (cs);
-  for (k = 0; k < LONGS_PER_CHARSET; k++)
-    p[k] = ~p[k];
+  cs = scm_char_set_complement (cs);
   return cs;
 }
+
 #undef FUNC_NAME
 
 
 SCM_DEFINE (scm_char_set_union_x, "char-set-union!", 1, 0, 1,
-           (SCM cs1, SCM rest),
-           "Return the union of all argument character sets.")
+            (SCM cs1, SCM rest),
+            "Return the union of all argument character sets.")
 #define FUNC_NAME s_scm_char_set_union_x
 {
-  int c = 2;
-  long * p;
-
   SCM_VALIDATE_SMOB (1, cs1, charset);
   SCM_VALIDATE_REST_ARGUMENT (rest);
 
-  p = (long *) SCM_SMOB_DATA (cs1);
-  while (!scm_is_null (rest))
-    {
-      int k;
-      SCM cs = SCM_CAR (rest);
-      SCM_VALIDATE_SMOB (c, cs, charset);
-      c++;
-      rest = SCM_CDR (rest);
-
-      for (k = 0; k < LONGS_PER_CHARSET; k++)
-       p[k] |= ((long *) SCM_SMOB_DATA (cs))[k];
-    }
+  cs1 = scm_char_set_union (scm_cons (cs1, rest));
   return cs1;
 }
+
 #undef FUNC_NAME
 
 
 SCM_DEFINE (scm_char_set_intersection_x, "char-set-intersection!", 1, 0, 1,
-           (SCM cs1, SCM rest),
-           "Return the intersection of all argument character sets.")
+            (SCM cs1, SCM rest),
+            "Return the intersection of all argument character sets.")
 #define FUNC_NAME s_scm_char_set_intersection_x
 {
-  int c = 2;
-  long * p;
-
   SCM_VALIDATE_SMOB (1, cs1, charset);
   SCM_VALIDATE_REST_ARGUMENT (rest);
 
-  p = (long *) SCM_SMOB_DATA (cs1);
-  while (!scm_is_null (rest))
-    {
-      int k;
-      SCM cs = SCM_CAR (rest);
-      SCM_VALIDATE_SMOB (c, cs, charset);
-      c++;
-      rest = SCM_CDR (rest);
-
-      for (k = 0; k < LONGS_PER_CHARSET; k++)
-       p[k] &= ((long *) SCM_SMOB_DATA (cs))[k];
-    }
+  cs1 = scm_char_set_intersection (scm_cons (cs1, rest));
   return cs1;
 }
+
 #undef FUNC_NAME
 
 
 SCM_DEFINE (scm_char_set_difference_x, "char-set-difference!", 1, 0, 1,
-           (SCM cs1, SCM rest),
-           "Return the difference of all argument character sets.")
+            (SCM cs1, SCM rest),
+            "Return the difference of all argument character sets.")
 #define FUNC_NAME s_scm_char_set_difference_x
 {
-  int c = 2;
-  long * p;
-
   SCM_VALIDATE_SMOB (1, cs1, charset);
   SCM_VALIDATE_REST_ARGUMENT (rest);
 
-  p = (long *) SCM_SMOB_DATA (cs1);
-  while (!scm_is_null (rest))
-    {
-      int k;
-      SCM cs = SCM_CAR (rest);
-      SCM_VALIDATE_SMOB (c, cs, charset);
-      c++;
-      rest = SCM_CDR (rest);
-
-      for (k = 0; k < LONGS_PER_CHARSET; k++)
-       p[k] &= ~((long *) SCM_SMOB_DATA (cs))[k];
-    }
+  cs1 = scm_char_set_difference (cs1, rest);
   return cs1;
 }
+
 #undef FUNC_NAME
 
 
@@ -1323,86 +1874,33 @@ SCM_DEFINE (scm_char_set_xor_x, "char-set-xor!", 1, 0, 
1,
      (define a (char-set #\a))
      (char-set-xor a a a) -> char set #\a
      (char-set-xor! a a a) -> char set #\a
-  */
+   */
   return scm_char_set_xor (scm_cons (cs1, rest));
-
-#if 0
-  /* this would give (char-set-xor! a a a) -> empty char set.  */
-  int c = 2;
-  long * p;
-
-  SCM_VALIDATE_SMOB (1, cs1, charset);
-  SCM_VALIDATE_REST_ARGUMENT (rest);
-
-  p = (long *) SCM_SMOB_DATA (cs1);
-  while (!scm_is_null (rest))
-    {
-      int k;
-      SCM cs = SCM_CAR (rest);
-      SCM_VALIDATE_SMOB (c, cs, charset);
-      c++;
-      rest = SCM_CDR (rest);
-
-      for (k = 0; k < LONGS_PER_CHARSET; k++)
-       p[k] ^= ((long *) SCM_SMOB_DATA (cs))[k];
-    }
-  return cs1;
-#endif
 }
+
 #undef FUNC_NAME
 
 
-SCM_DEFINE (scm_char_set_diff_plus_intersection_x, 
"char-set-diff+intersection!", 2, 0, 1,
-           (SCM cs1, SCM cs2, SCM rest),
-           "Return the difference and the intersection of all argument\n"
-           "character sets.")
+SCM_DEFINE (scm_char_set_diff_plus_intersection_x,
+            "char-set-diff+intersection!", 2, 0, 1, (SCM cs1, SCM cs2,
+                                                     SCM rest),
+            "Return the difference and the intersection of all argument\n"
+            "character sets.")
 #define FUNC_NAME s_scm_char_set_diff_plus_intersection_x
 {
-  int c = 3;
-  long * p, * q;
-  int k;
-
-  SCM_VALIDATE_SMOB (1, cs1, charset);
-  SCM_VALIDATE_SMOB (2, cs2, charset);
-  SCM_VALIDATE_REST_ARGUMENT (rest);
-
-  p = (long *) SCM_SMOB_DATA (cs1);
-  q = (long *) SCM_SMOB_DATA (cs2);
-  if (p == q)
-    {
-      /* (char-set-diff+intersection! a a ...): can't share storage,
-        but we know the answer without checking for further
-        arguments.  */
-      return scm_values (scm_list_2 (make_char_set (FUNC_NAME), cs1));
-    }
-  for (k = 0; k < LONGS_PER_CHARSET; k++)
-    {
-      long t = p[k];
-
-      p[k] &= ~q[k];
-      q[k] = t & q[k];
-    }
-  while (!scm_is_null (rest))
-    {
-      SCM cs = SCM_CAR (rest);
-      long *r;
-
-      SCM_VALIDATE_SMOB (c, cs, charset);
-      c++;
-      r = (long *) SCM_SMOB_DATA (cs);
+  SCM diff, intersect;
 
-      for (k = 0; k < LONGS_PER_CHARSET; k++)
-       {
-         q[k] |= p[k] & r[k];
-         p[k] &= ~r[k];
-       }
-      rest = SCM_CDR (rest);
-    }
+  diff = scm_char_set_difference (cs1, scm_cons (cs2, rest));
+  intersect =
+    scm_char_set_intersection (scm_cons (cs1, scm_cons (cs2, rest)));
+  cs1 = diff;
+  cs2 = intersect;
   return scm_values (scm_list_2 (cs1, cs2));
 }
-#undef FUNC_NAME
 
+#undef FUNC_NAME
 
+
 /* Standard character sets.  */
 
 SCM scm_char_set_lower_case;
@@ -1426,147 +1924,80 @@ SCM scm_char_set_full;
 
 /* Create an empty character set and return it after binding it to NAME.  */
 static inline SCM
-define_charset (const char *name)
+define_charset (const char *name, const scm_t_char_set *p)
 {
-  SCM cs = make_char_set (NULL);
+  SCM cs;
+
+  SCM_NEWSMOB (cs, scm_tc16_charset, p);
   scm_c_define (name, cs);
   return scm_permanent_object (cs);
 }
 
-/* Membership predicates for the various char sets.
-
-   XXX: The `punctuation' and `symbol' char sets have no direct equivalent in
-   <ctype.h>.  Thus, the predicates below yield correct results for ASCII,
-   but they do not provide the result described by the SRFI for Latin-1.  The
-   correct Latin-1 result could only be obtained by hard-coding the
-   characters listed by the SRFI, but the problem would remain for other
-   8-bit charsets.
-
-   Similarly, character 0xA0 in Latin-1 (unbreakable space, `#\0240') should
-   be part of `char-set:blank'.  However, glibc's current (2006/09) Latin-1
-   locales (which use the ISO 14652 "i18n" FDCC-set) do not consider it
-   `blank' so it ends up in `char-set:punctuation'.  */
-#ifdef HAVE_ISBLANK
-# define CSET_BLANK_PRED(c)  (isblank (c))
-#else
-# define CSET_BLANK_PRED(c)                    \
-   (((c) == ' ') || ((c) == '\t'))
-#endif
-
-#define CSET_SYMBOL_PRED(c)                                    \
-  (((c) != '\0') && (strchr ("$+<=>^`|~", (c)) != NULL))
-#define CSET_PUNCT_PRED(c)                                     \
-  ((ispunct (c)) && (!CSET_SYMBOL_PRED (c)))
-
-#define CSET_LOWER_PRED(c)       (islower (c))
-#define CSET_UPPER_PRED(c)       (isupper (c))
-#define CSET_LETTER_PRED(c)      (isalpha (c))
-#define CSET_DIGIT_PRED(c)       (isdigit (c))
-#define CSET_WHITESPACE_PRED(c)  (isspace (c))
-#define CSET_CONTROL_PRED(c)     (iscntrl (c))
-#define CSET_HEX_DIGIT_PRED(c)   (isxdigit (c))
-#define CSET_ASCII_PRED(c)       (isascii (c))
-
-/* Some char sets are explicitly defined by the SRFI as a union of other char
-   sets so we try to follow this closely.  */
-
-#define CSET_LETTER_AND_DIGIT_PRED(c)          \
-  (CSET_LETTER_PRED (c) || CSET_DIGIT_PRED (c))
-
-#define CSET_GRAPHIC_PRED(c)                           \
-  (CSET_LETTER_PRED (c) || CSET_DIGIT_PRED (c)         \
-   || CSET_PUNCT_PRED (c) || CSET_SYMBOL_PRED (c))
-
-#define CSET_PRINTING_PRED(c)                          \
-  (CSET_GRAPHIC_PRED (c) || CSET_WHITESPACE_PRED (c))
-
-/* False and true predicates.  */
-#define CSET_TRUE_PRED(c)    (1)
-#define CSET_FALSE_PRED(c)   (0)
-
-
-/* Compute the contents of all the standard character sets.  Computation may
-   need to be re-done at `setlocale'-time because some char sets (e.g.,
-   `char-set:letter') need to reflect the character set supported by Guile.
-
-   For instance, at startup time, the "C" locale is used, thus Guile supports
-   only ASCII; therefore, `char-set:letter' only contains English letters.
-   The user can change this by invoking `setlocale' and specifying a locale
-   with an 8-bit charset, thereby augmenting some of the SRFI-14 standard
-   character sets.
-
-   This works because some of the predicates used below to construct
-   character sets (e.g., `isalpha(3)') are locale-dependent (so
-   charset-dependent, though generally not language-dependent).  For details,
-   please see the `guile-devel' mailing list archive of September 2006.  */
-void
-scm_srfi_14_compute_char_sets (void)
+#ifdef SCM_CHARSET_DEBUG
+SCM_DEFINE (scm_debug_char_set, "debug-char-set", 1, 0, 0,
+            (SCM charset),
+            "Print out the internal C structure of @var{charset}.\n")
+#define FUNC_NAME s_debug_char_set
 {
-#define UPDATE_CSET(c, cset, pred)             \
-  do                                           \
-    {                                          \
-      if (pred (c))                            \
-       SCM_CHARSET_SET ((cset), (c));          \
-      else                                     \
-       SCM_CHARSET_UNSET ((cset), (c));        \
-    }                                          \
-  while (0)
-
-  register int ch;
-
-  for (ch = 0; ch < 256; ch++)
+  int i;
+  scm_t_char_set *cs = SCM_CHARSET_DATA (charset);
+  fprintf (stderr, "cs %p\n", cs);
+  fprintf (stderr, "len %d\n", cs->len);
+  fprintf (stderr, "arr %p\n", cs->ranges);
+  for (i = 0; i < cs->len; i++)
     {
-      UPDATE_CSET (ch, scm_char_set_upper_case, CSET_UPPER_PRED);
-      UPDATE_CSET (ch, scm_char_set_lower_case, CSET_LOWER_PRED);
-      UPDATE_CSET (ch, scm_char_set_title_case, CSET_FALSE_PRED);
-      UPDATE_CSET (ch, scm_char_set_letter, CSET_LETTER_PRED);
-      UPDATE_CSET (ch, scm_char_set_digit, CSET_DIGIT_PRED);
-      UPDATE_CSET (ch, scm_char_set_letter_and_digit,
-                  CSET_LETTER_AND_DIGIT_PRED);
-      UPDATE_CSET (ch, scm_char_set_graphic, CSET_GRAPHIC_PRED);
-      UPDATE_CSET (ch, scm_char_set_printing, CSET_PRINTING_PRED);
-      UPDATE_CSET (ch, scm_char_set_whitespace, CSET_WHITESPACE_PRED);
-      UPDATE_CSET (ch, scm_char_set_iso_control, CSET_CONTROL_PRED);
-      UPDATE_CSET (ch, scm_char_set_punctuation, CSET_PUNCT_PRED);
-      UPDATE_CSET (ch, scm_char_set_symbol, CSET_SYMBOL_PRED);
-      UPDATE_CSET (ch, scm_char_set_hex_digit, CSET_HEX_DIGIT_PRED);
-      UPDATE_CSET (ch, scm_char_set_blank, CSET_BLANK_PRED);
-      UPDATE_CSET (ch, scm_char_set_ascii, CSET_ASCII_PRED);
-      UPDATE_CSET (ch, scm_char_set_empty, CSET_FALSE_PRED);
-      UPDATE_CSET (ch, scm_char_set_full, CSET_TRUE_PRED);
+      if (cs->ranges[i].lo == cs->ranges[i].hi)
+        fprintf (stderr, "%04x\n", cs->ranges[i].lo);
+      else
+        fprintf (stderr, "%04x..%04x\t[%d]\n",
+                 cs->ranges[i].lo,
+                 cs->ranges[i].hi, cs->ranges[i].hi - cs->ranges[i].lo + 1);
     }
-
-#undef UPDATE_CSET
+  printf ("\n");
+  return SCM_UNSPECIFIED;
 }
 
+#undef FUNC_NAME
+#endif
 
+
+
 void
 scm_init_srfi_14 (void)
 {
-  scm_tc16_charset = scm_make_smob_type ("character-set",
-                                        BYTES_PER_CHARSET);
+  scm_tc16_charset = scm_make_smob_type ("character-set", 0);
   scm_set_smob_free (scm_tc16_charset, charset_free);
   scm_set_smob_print (scm_tc16_charset, charset_print);
 
-  scm_char_set_upper_case = define_charset ("char-set:upper-case");
-  scm_char_set_lower_case = define_charset ("char-set:lower-case");
-  scm_char_set_title_case = define_charset ("char-set:title-case");
-  scm_char_set_letter = define_charset ("char-set:letter");
-  scm_char_set_digit = define_charset ("char-set:digit");
-  scm_char_set_letter_and_digit = define_charset ("char-set:letter+digit");
-  scm_char_set_graphic = define_charset ("char-set:graphic");
-  scm_char_set_printing = define_charset ("char-set:printing");
-  scm_char_set_whitespace = define_charset ("char-set:whitespace");
-  scm_char_set_iso_control = define_charset ("char-set:iso-control");
-  scm_char_set_punctuation = define_charset ("char-set:punctuation");
-  scm_char_set_symbol = define_charset ("char-set:symbol");
-  scm_char_set_hex_digit = define_charset ("char-set:hex-digit");
-  scm_char_set_blank = define_charset ("char-set:blank");
-  scm_char_set_ascii = define_charset ("char-set:ascii");
-  scm_char_set_empty = define_charset ("char-set:empty");
-  scm_char_set_full = define_charset ("char-set:full");
-
-  scm_srfi_14_compute_char_sets ();
+  scm_tc16_charset_cursor = scm_make_smob_type ("char-set-cursor", 0);
+  scm_set_smob_free (scm_tc16_charset_cursor, charset_cursor_free);
+  scm_set_smob_print (scm_tc16_charset_cursor, charset_cursor_print);
+
+  scm_char_set_upper_case =
+    define_charset ("char-set:upper-case", &cs_upper_case);
+  scm_char_set_lower_case =
+    define_charset ("char-set:lower-case", &cs_lower_case);
+  scm_char_set_title_case =
+    define_charset ("char-set:title-case", &cs_title_case);
+  scm_char_set_letter = define_charset ("char-set:letter", &cs_letter);
+  scm_char_set_digit = define_charset ("char-set:digit", &cs_digit);
+  scm_char_set_letter_and_digit =
+    define_charset ("char-set:letter+digit", &cs_letter_plus_digit);
+  scm_char_set_graphic = define_charset ("char-set:graphic", &cs_graphic);
+  scm_char_set_printing = define_charset ("char-set:printing", &cs_printing);
+  scm_char_set_whitespace =
+    define_charset ("char-set:whitespace", &cs_whitespace);
+  scm_char_set_iso_control =
+    define_charset ("char-set:iso-control", &cs_iso_control);
+  scm_char_set_punctuation =
+    define_charset ("char-set:punctuation", &cs_punctuation);
+  scm_char_set_symbol = define_charset ("char-set:symbol", &cs_symbol);
+  scm_char_set_hex_digit =
+    define_charset ("char-set:hex-digit", &cs_hex_digit);
+  scm_char_set_blank = define_charset ("char-set:blank", &cs_blank);
+  scm_char_set_ascii = define_charset ("char-set:ascii", &cs_ascii);
+  scm_char_set_empty = define_charset ("char-set:empty", &cs_empty);
+  scm_char_set_full = define_charset ("char-set:full", &cs_full);
 
 #include "libguile/srfi-14.x"
 }
diff --git a/libguile/srfi-14.h b/libguile/srfi-14.h
index 54e0d32..bdcdd08 100644
--- a/libguile/srfi-14.h
+++ b/libguile/srfi-14.h
@@ -24,22 +24,34 @@
 
 #include "libguile/__scm.h"
 
-#define SCM_CHARSET_SIZE 256
+typedef struct
+{
+  scm_t_wchar lo;
+  scm_t_wchar hi;
+} scm_t_char_range;
 
-/* We expect 8-bit bytes here.  Should be no problem in the year
-   2001.  */
-#ifndef SCM_BITS_PER_LONG
-# define SCM_BITS_PER_LONG (sizeof (long) * 8)
-#endif
+typedef struct 
+{
+  size_t len;
+  scm_t_char_range *ranges;
+} scm_t_char_set;
+
+typedef struct
+{
+  size_t range;
+  scm_t_wchar n;
+} scm_t_char_set_cursor;
 
-#define SCM_CHARSET_GET(cs, idx) (((long *) SCM_SMOB_DATA (cs))\
-                                  [((unsigned char) (idx)) / 
SCM_BITS_PER_LONG] &\
-                                  (1L << (((unsigned char) (idx)) % 
SCM_BITS_PER_LONG)))
+#define SCM_CHARSET_GET(cs,idx)                                 \
+  scm_i_charset_get((scm_t_char_set *)SCM_SMOB_DATA(cs),idx)
 
 #define SCM_CHARSETP(x) (!SCM_IMP (x) && (SCM_TYP16 (x) == scm_tc16_charset))
 
 /* Smob type code for character sets.  */
 SCM_API int scm_tc16_charset;
+SCM_INTERNAL int scm_i_charset_get (scm_t_char_set *cs, scm_t_wchar n);
+SCM_INTERNAL void scm_i_charset_set (scm_t_char_set *cs, scm_t_wchar n);
+SCM_INTERNAL void scm_i_charset_unset (scm_t_char_set *cs, scm_t_wchar n);
 
 SCM_API SCM scm_char_set_p (SCM obj);
 SCM_API SCM scm_char_set_eq (SCM char_sets);
@@ -88,6 +100,9 @@ SCM_API SCM scm_char_set_intersection_x (SCM cs1, SCM rest);
 SCM_API SCM scm_char_set_difference_x (SCM cs1, SCM rest);
 SCM_API SCM scm_char_set_xor_x (SCM cs1, SCM rest);
 SCM_API SCM scm_char_set_diff_plus_intersection_x (SCM cs1, SCM cs2, SCM rest);
+#if SCM_CHARSET_DEBUG
+SCM_API SCM scm_debug_char_set (SCM cs);
+#endif
 
 SCM_API SCM scm_char_set_lower_case;
 SCM_API SCM scm_char_set_upper_case;
@@ -107,7 +122,6 @@ SCM_API SCM scm_char_set_ascii;
 SCM_API SCM scm_char_set_empty;
 SCM_API SCM scm_char_set_full;
 
-SCM_INTERNAL void scm_srfi_14_compute_char_sets (void);
 SCM_INTERNAL void scm_init_srfi_14 (void);
 
 #endif /* SCM_SRFI_14_H */
diff --git a/test-suite/tests/srfi-14.test b/test-suite/tests/srfi-14.test
index 8c678cd..2bb934a 100644
--- a/test-suite/tests/srfi-14.test
+++ b/test-suite/tests/srfi-14.test
@@ -1,4 +1,5 @@
-;;;; srfi-14.test --- Test suite for Guile's SRFI-14 functions.
+;;;; srfi-14.test          -*- mode:scheme; coding: iso-8859-1 -*-
+;;;; --- Test suite for Guile's SRFI-14 functions.
 ;;;; Martin Grabmueller, 2001-07-16
 ;;;;
 ;;;; Copyright (C) 2001, 2006 Free Software Foundation, Inc.
@@ -29,6 +30,30 @@
 (define exception:non-char-return
   (cons 'misc-error "returned non-char"))
 
+
+(with-test-prefix "char set contents"
+
+  (pass-if "empty set"
+    (list= eqv? 
+           (char-set->list (char-set))
+           '()))
+
+  (pass-if "single char"
+    (list= eqv?
+           (char-set->list (char-set #\a))
+           (list #\a)))
+
+  (pass-if "contiguous chars"
+    (list= eqv?
+           (char-set->list (char-set #\a #\b #\c))
+           (list #\a #\b #\c))) 
+
+  (pass-if "discontiguous chars"
+    (list= eqv?
+           (char-set->list (char-set #\a #\c #\e))
+           (list #\a #\c #\e))))
+          
+
 (with-test-prefix "char-set?"
 
   (pass-if "success on empty set"
@@ -113,7 +138,7 @@
 (with-test-prefix "char-set cursor"
 
   (pass-if-exception "invalid character cursor" 
-     exception:invalid-char-set-cursor
+     exception:wrong-type-arg
      (let* ((cs (char-set #\B #\r #\a #\z))
            (cc (char-set-cursor cs)))
        (char-set-ref cs 1000)))
@@ -148,30 +173,33 @@
      (= (char-set-size (char-set-fold (lambda (c cs) (char-set-adjoin cs c)) 
                                      (char-set) (char-set #\a #\b))) 2)))
 
+(define char-set:256 
+  (string->char-set (apply string (map integer->char (iota 256)))))
+
 (with-test-prefix "char-set-unfold"
 
   (pass-if "create char set"
-     (char-set= char-set:full
+     (char-set= char-set:256
                (char-set-unfold (lambda (s) (= s 256)) integer->char
                                 (lambda (s) (+ s 1)) 0)))
   (pass-if "create char set (base set)"
-     (char-set= char-set:full
+     (char-set= char-set:256
                (char-set-unfold (lambda (s) (= s 256)) integer->char
                                 (lambda (s) (+ s 1)) 0 char-set:empty))))
 
 (with-test-prefix "char-set-unfold!"
 
   (pass-if "create char set"
-     (char-set= char-set:full
+     (char-set= char-set:256
                (char-set-unfold! (lambda (s) (= s 256)) integer->char
                                 (lambda (s) (+ s 1)) 0
                                 (char-set-copy char-set:empty))))
 
   (pass-if "create char set"
-     (char-set= char-set:full
+     (char-set= char-set:256
                (char-set-unfold! (lambda (s) (= s 32)) integer->char
                                 (lambda (s) (+ s 1)) 0
-                                (char-set-copy char-set:full)))))
+                                (char-set-copy char-set:256)))))
 
 
 (with-test-prefix "char-set-for-each"
@@ -186,9 +214,15 @@
 
 (with-test-prefix "char-set-map"
 
-  (pass-if "upper case char set"
-     (char-set= (char-set-map char-upcase char-set:lower-case)
-               char-set:upper-case)))
+  (pass-if "upper case char set 1"
+     (char-set= (char-set-map char-upcase 
+                              (string->char-set "abcdefghijklmnopqrstuvwxyz"))
+                (string->char-set "ABCDEFGHIJKLMNOPQRSTUVWXYZ")))
+
+  (pass-if "upper case char set 2"
+     (char-set= (char-set-map char-upcase 
+                              (string->char-set 
"àáâãäåæçèéêëìíîïñòóôõöøùúûüýþ"))
+                (string->char-set "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝÞ"))))
 
 (with-test-prefix "string->char-set"
 
@@ -197,42 +231,107 @@
        (char-set= (list->char-set chars)
                  (string->char-set (apply string chars))))))
 
+(with-test-prefix "char-set->string"
+
+  (pass-if "some char set"
+     (let ((cs (char-set #\g #\u #\i #\l #\e)))
+       (string=? (char-set->string cs)
+                 "egilu"))))
+
 ;; Make sure we get an ASCII charset and character classification.
 (if (defined? 'setlocale) (setlocale LC_CTYPE "C"))
 
 (with-test-prefix "standard char sets (ASCII)"
 
+  (pass-if "char-set:lower-case"
+     (char-set<= (string->char-set "abcdefghijklmnopqrstuvwxyz")
+                 char-set:lower-case))
+
+  (pass-if "char-set:upper-case"
+     (char-set<= (string->char-set "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
+                 char-set:upper-case))
+
+  (pass-if "char-set:title-case"
+     (char-set<= (string->char-set "")
+                 char-set:title-case))
+
   (pass-if "char-set:letter"
-     (char-set= (string->char-set
-                (string-append "abcdefghijklmnopqrstuvwxyz"
-                               "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
-               char-set:letter))
+     (char-set<= (char-set-union
+                  (string->char-set "abcdefghijklmnopqrstuvwxyz")
+                  (string->char-set "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
+                 char-set:letter))
 
-  (pass-if "char-set:punctuation"
-     (char-set= (string->char-set "!\"#%&'()*,-./:;address@hidden")
-               char-set:punctuation))
+  (pass-if "char-set:digit"
+     (char-set<= (string->char-set "0123456789")
+                 char-set:digit))
 
-  (pass-if "char-set:symbol"
-     (char-set= (string->char-set "$+<=>^`|~")
-               char-set:symbol))
+  (pass-if "char-set:hex-digit"
+     (char-set<= (string->char-set "0123456789abcdefABCDEF")
+                 char-set:hex-digit))
 
   (pass-if "char-set:letter+digit"
-     (char-set= char-set:letter+digit
-                (char-set-union char-set:letter char-set:digit)))
+     (char-set<= (char-set-union
+                  (string->char-set "abcdefghijklmnopqrstuvwxyz")
+                  (string->char-set "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
+                  (string->char-set "0123456789"))
+                 char-set:letter+digit))
 
-  (pass-if "char-set:graphic"
-     (char-set= char-set:graphic
-                (char-set-union char-set:letter char-set:digit
-                                char-set:punctuation char-set:symbol)))
+  (pass-if "char-set:punctuation"
+     (char-set<= (string->char-set "!\"#%&'()*,-./:;address@hidden")
+                 char-set:punctuation))
 
-  (pass-if "char-set:printing"
-      (char-set= char-set:printing
-                 (char-set-union char-set:whitespace char-set:graphic))))
+  (pass-if "char-set:symbol"
+     (char-set<= (string->char-set "$+<=>^`|~")
+                 char-set:symbol))
 
+  (pass-if "char-set:graphic"
+     (char-set<= (char-set-union
+                  (string->char-set "abcdefghijklmnopqrstuvwxyz")
+                  (string->char-set "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
+                  (string->char-set "0123456789")
+                  (string->char-set "!\"#%&'()*,-./:;address@hidden")
+                  (string->char-set "$+<=>^`|~"))
+                 char-set:graphic))
+
+  (pass-if "char-set:whitespace"
+     (char-set<= (string->char-set 
+                  (string
+                   (integer->char #x09)
+                   (integer->char #x0a)
+                   (integer->char #x0b)
+                   (integer->char #x0c)
+                   (integer->char #x0d)
+                   (integer->char #x20)))
+                 char-set:whitespace))
+                                  
+  (pass-if "char-set:printing"
+     (char-set<= (char-set-union
+                  (string->char-set "abcdefghijklmnopqrstuvwxyz")
+                  (string->char-set "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
+                  (string->char-set "0123456789")
+                  (string->char-set "!\"#%&'()*,-./:;address@hidden")
+                  (string->char-set "$+<=>^`|~")
+                  (string->char-set (string
+                                     (integer->char #x09)
+                                     (integer->char #x0a)
+                                     (integer->char #x0b)
+                                     (integer->char #x0c)
+                                     (integer->char #x0d)
+                                     (integer->char #x20))))
+                 char-set:printing))
+
+  (pass-if "char-set:iso-control"
+     (char-set<= (string->char-set 
+                  (apply string 
+                         (map integer->char (append 
+                                             ;; U+0000 to U+001F
+                                             (iota #x20)
+                                             (list #x7f)))))
+                 char-set:iso-control)))
 
 
 ;;;
-;;; 8-bit charsets.
+;;; Non-ASCII codepoints
 ;;;
 ;;; Here, we only test ISO-8859-1 (Latin-1), notably because behavior of
 ;;; SRFI-14 for implementations supporting this charset is well-defined.
@@ -260,57 +359,123 @@
 
 (with-test-prefix "Latin-1 (8-bit charset)"
 
-  ;; Note: the membership tests below are not exhaustive.
-
-  (pass-if "char-set:letter (membership)"
+  (pass-if "char-set:lower-case"
      (if (not %latin1)
         (throw 'unresolved)
-        (let ((letters (char-set->list char-set:letter)))
-          (every? (lambda (8-bit-char)
-                    (memq 8-bit-char letters))
-                  (append '(#\a #\b #\c)             ;; ASCII
-                          (string->list "çéèâùÉÀÈÊ") ;; French
-                          (string->list "øñÑíßåæðþ"))))))
-
-  (pass-if "char-set:letter (size)"
+         (char-set<= (string->char-set
+                      (string-append "abcdefghijklmnopqrstuvwxyz"
+                                     "µßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ")
+                      char-set:lower-case))))
+
+  (pass-if "char-set:upper-case"
      (if (not %latin1)
-        (throw 'unresolved)
-        (= (char-set-size char-set:letter) 117)))
+         (throw 'unresolved)
+         (char-set<= (string->char-set
+                      (string-append "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+                                     "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ")
+                      char-set:lower-case))))
 
-  (pass-if "char-set:lower-case (size)"
+  (pass-if "char-set:title-case"
      (if (not %latin1)
         (throw 'unresolved)
-        (= (char-set-size char-set:lower-case) (+ 26 33))))
+         (char-set<= (string->char-set "")
+                     char-set:title-case)))
 
-  (pass-if "char-set:upper-case (size)"
+  (pass-if "char-set:letter"
+     (if (not %latin1)
+        (throw 'unresolved)
+         (char-set<= (string->char-set
+                      (string-append 
+                       ;; Lowercase
+                       "abcdefghijklmnopqrstuvwxyz" 
+                       "µßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ"
+                       ;; Uppercase
+                       "ABCDEFGHIJKLMNOPQRSTUVWXYZ" 
+                       "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ"
+                       ;; Uncased
+                       "ªº")) 
+                     char-set:letter)))
+  
+  (pass-if "char-set:digit"
      (if (not %latin1)
         (throw 'unresolved)
-        (= (char-set-size char-set:upper-case) (+ 26 30))))
+         (char-set<= (string->char-set "0123456789")
+                     char-set:digit)))
 
-  (pass-if "char-set:punctuation (membership)"
+  (pass-if "char-set:hex-digit"
      (if (not %latin1)
         (throw 'unresolved)
-        (let ((punctuation (char-set->list char-set:punctuation)))
-          (every? (lambda (8-bit-char)
-                    (memq 8-bit-char punctuation))
-                  (append '(#\! #\. #\?)            ;; ASCII
-                          (string->list "¡¿")       ;; Castellano
-                          (string->list "«»"))))))  ;; French
+         (char-set<= (string->char-set "0123456789abcdefABCDEF")
+                     char-set:hex-digit)))
 
   (pass-if "char-set:letter+digit"
-     (char-set= char-set:letter+digit
-                (char-set-union char-set:letter char-set:digit)))
+     (if (not %latin1)
+        (throw 'unresolved)
+         (char-set<= (char-set-union
+                      char-set:letter
+                      char-set:digit)
+                     char-set:letter+digit)))
 
-  (pass-if "char-set:graphic"
-     (char-set= char-set:graphic
-                (char-set-union char-set:letter char-set:digit
-                                char-set:punctuation char-set:symbol)))
+  (pass-if "char-set:punctuation"
+     (if (not %latin1)
+        (throw 'unresolved)
+         (char-set<= (string->char-set 
+                      (string-append "!\"#%&'()*,-./:;address@hidden"
+                                     "¡«·»¿"))
+                     char-set:punctuation)))
 
+  (pass-if "char-set:symbol"
+     (if (not %latin1)
+        (throw 'unresolved)
+         (char-set<= (string->char-set 
+                      (string-append "$+<=>^`|~"
+                                     "¢£¤¥¦§¨©¬®¯°±´¶¸×÷"))
+                     char-set:symbol)))
+
+  ;; Note that SRFI-14 itself is inconsistent here.  Characters that
+  ;; are non-digit numbers (such as category No) are clearly 'graphic'
+  ;; but don't occur in the letter, digit, punct, or symbol charsets.
+  (pass-if "char-set:graphic"
+     (if (not %latin1)
+        (throw 'unresolved)
+         (char-set<= (char-set-union
+                      char-set:letter
+                      char-set:digit
+                      char-set:punctuation
+                      char-set:symbol)
+                     char-set:graphic)))
+
+  (pass-if "char-set:whitespace"
+     (if (not %latin1)
+        (throw 'unresolved)
+         (char-set<= (string->char-set 
+                      (string
+                       (integer->char #x09)
+                       (integer->char #x0a)
+                       (integer->char #x0b)
+                       (integer->char #x0c)
+                       (integer->char #x0d)
+                       (integer->char #x20)
+                       (integer->char #xa0)))
+                     char-set:whitespace)))
+                                  
   (pass-if "char-set:printing"
-     (char-set= char-set:printing
-                (char-set-union char-set:whitespace char-set:graphic))))
+     (if (not %latin1)
+        (throw 'unresolved)
+         (char-set<= (char-set-union char-set:graphic char-set:whitespace)
+                     char-set:printing)))
+
+  (pass-if "char-set:iso-control"
+     (if (not %latin1)
+        (throw 'unresolved)
+         (char-set<= (string->char-set 
+                      (apply string 
+                             (map integer->char (append 
+                                                 ;; U+0000 to U+001F
+                                                 (iota #x20)
+                                                 (list #x7f)
+                                                 ;; U+007F to U+009F
+                                                 (map (lambda (x) (+ #x80 x))
+                                                      (iota #x20))))))
+                     char-set:iso-control))))
 
-;; Local Variables:
-;; mode: scheme
-;; coding: latin-1
-;; End:


hooks/post-receive
-- 
GNU Guile




reply via email to

[Prev in Thread] Current Thread [Next in Thread]