Re: [Patch] SRFI-13 string-tokenize is wrong
From: Matthias Koeppe
Subject: Re: [Patch] SRFI-13 string-tokenize is wrong
Date: Mon, 29 Apr 2002 11:21:14 +0200
User-agent: Gnus/5.090004 (Oort Gnus v0.04) Emacs/21.1.80 (sparc-sun-solaris2.7)
Marius Vollmer <address@hidden> writes:
> Thanks; and sorry for being nitpicky: can we be sure that isgraphic is
> the same as charset:graphic?
We can't. That's why I wrote that TOKEN_SET defaults to "an
equivalent" of CHAR-SET:GRAPHIC.
The whole internationalization stuff is, of course, broken. Some
Guile functions depend on the current locale setting; others depend on
the locale setting at load time; others silently do ASCII only. This
clearly needs to be worked on, but I don't think STRING-TOKENIZE would
be the place to start.
BTW, when I tried to make an example of the described behavior, I got
a segmentation fault caused by an array being indexed by a signed
char (on Solaris 2.7 with the Forte compiler):
(use-modules (srfi srfi-13) (srfi srfi-14))
(string-tokenize "charsetsäareäfun" char-set:graphic)
==> segfault
Here is a fix:
--- srfi-14.h.~1.3.2.6.~ Tue Sep 25 13:00:41 2001
+++ srfi-14.h Mon Apr 29 11:13:03 2002
@@ -48,15 +48,15 @@
#define SCM_CHARSET_SIZE 256
-/* We expect 8-bit bytes here. Shoule be no problem in the year
+/* We expect 8-bit bytes here. Should be no problem in the year
2001. */
#ifndef SCM_BITS_PER_LONG
# define SCM_BITS_PER_LONG (sizeof (long) * 8)
#endif
#define SCM_CHARSET_GET(cs, idx) (((long *) SCM_SMOB_DATA (cs))\
- [(idx) / SCM_BITS_PER_LONG] &\
- (1L << ((idx) % SCM_BITS_PER_LONG)))
+ [((unsigned char) (idx)) / SCM_BITS_PER_LONG] &\
+ (1L << (((unsigned char) (idx)) % SCM_BITS_PER_LONG)))
#define SCM_CHARSETP(x) (!SCM_IMP (x) && (SCM_TYP16 (x) == scm_tc16_charset))
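For readers who want to see the effect of the cast in isolation, here is a minimal standalone sketch. It is not Guile code: the `charset` struct and the `charset_index`, `charset_add`, and `charset_contains` helpers are hypothetical stand-ins for the smob data and the SCM_CHARSET_GET macro. It assumes the byte in question is 0xE4 (the ISO 8859-1 code for "ä"), which is -28 when held in a signed char. Without the cast, that negative value is either converted to a huge unsigned index or used as a negative shift count, depending on how the divisor is typed; with the cast it becomes 228, which fits the 256-entry set.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define CHARSET_SIZE 256
#define BITS_PER_LONG (sizeof (long) * CHAR_BIT)
#define CHARSET_WORDS ((CHARSET_SIZE + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical stand-in for the charset smob data: a plain 256-bit set. */
typedef struct { unsigned long bits[CHARSET_WORDS]; } charset;

/* Map a char to a bit index in 0..255.  The cast through unsigned char
   is the same idea as in the patch: it keeps bytes above 0x7F from
   turning into negative indices where plain char is signed. */
static size_t
charset_index (char c)
{
  assert (CHAR_BIT == 8);   /* same 8-bit-byte assumption as srfi-14.h */
  return (size_t) (unsigned char) c;
}

/* Set the bit for character C. */
static void
charset_add (charset *cs, char c)
{
  size_t idx = charset_index (c);
  cs->bits[idx / BITS_PER_LONG] |= 1UL << (idx % BITS_PER_LONG);
}

/* Return nonzero if the bit for character C is set. */
static int
charset_contains (const charset *cs, char c)
{
  size_t idx = charset_index (c);
  return (int) ((cs->bits[idx / BITS_PER_LONG] >> (idx % BITS_PER_LONG)) & 1UL);
}
```

The sketch uses unsigned long words and 1UL rather than the long and 1L of the macro, only so that setting the top bit of a word stays well defined; that is a choice made for this illustration, not something the patch changes.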
--
Matthias Köppe -- http://www.math.uni-magdeburg.de/~mkoeppe